>From Ian Maxon <[email protected]>:
Ian Maxon has uploaded this change for review. (
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18243 )
Change subject: [ASTERIXDB-3384][DOC] Document COPY
......................................................................
[ASTERIXDB-3384][DOC] Document COPY
Details:
Add some brief documentation about each COPY
statement. Also add the simplified BNF for railroad
diagrams for each.
Change-Id: Ibdacf4e6b156a3b6ef15b1420a4102c122f8bf1c
---
M asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf
M asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md
2 files changed, 88 insertions(+), 0 deletions(-)
git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb
refs/changes/43/18243/1
diff --git a/asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf
b/asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf
index af67e33..31df730 100644
--- a/asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf
+++ b/asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf
@@ -258,4 +258,8 @@
UpsertStmnt ::= "UPSERT" "INTO" QualifiedName ("AS" Variable)? Query
("RETURNING" Expr)?
+CopyStmnt ::= "COPY" "INTO"? QualifiedName ("AS" Variable)? "FROM" Identifier
"AT" QualifiedName ("PATH" StringLiteral)? (WITH ObjectConstructor)?
+
+CopyToStmnt ::= "COPY" ( QualifiedName | "(" Query ")" ) "TO" AdapterName
"PATH" ParenthesizedArrayConstructor ("OVER" "(" ("PARTITION" "BY" Expr ("AS"
Variable)? ("," Expr ("AS" Variable)? )? )? OrderbyClause ")" )?
+
DeleteStmnt ::= "DELETE" "FROM" QualifiedName (("AS")? Variable)? ("WHERE"
Expr)?
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md
b/asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md
index 4ec5033..2222a7a 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md
@@ -697,3 +697,73 @@
DELETE FROM customers WHERE custid = "C47";
+### <a id="Copy">Copy Statement</a>
+
+##### CopyStmnt
+
+
+The `COPY` statement is used to load data in bulk from an external source into
a dataset. It differs from `LOAD` in that
+it can be performed multiple times, to upsert new data into the dataset that
may have changed in the source.
+
+For example, this statement would copy the contents of the JSON file
`customerData.json` residing on the NC named `asterix_nc1`
+into the Customers dataset.
+
+##### Example
+
+ COPY Customers
+ FROM localfs
+ PATH ("asterix_nc1://data/nontagged/customerData.json")
+ WITH {
+ "format": "json"
+ };
+
+
+### <a id="CopyTo">Copy To Statement</a>
+
+##### CopyToStmnt
+
+
+The `COPY TO` statement allows easy export of the result of a query, or an
entire dataset, into a file or set of files on
+an external source. This can be any source that has an adapter.
+
+For example, this statement would create a copy of `ColumnDataset` on each
node as a single JSON file
+
+##### Example
+
+ COPY ColumnDataset
+ TO localfs
+ PATH("localhost:///media/backup/CopyToResult")
+ WITH {
+ "format" : "json"
+ };
+
+The statement allows for much more precise exports than this, however. A
typical pattern of data accessed via object stores
+like S3 is for it to be partitioned into files, with each folder containing
some of those files representing a key. The use
+of `OVER` and `PARTITION BY` allow exports to match this. For example:
+
+##### Example
+
+ COPY (SELECT cd.uid uid,
+ cd.sensor_info.name name,
+ to_bigint(cd.sensor_info.battery_status) battery_status
+ FROM ColumnDataset cd
+ ) toWrite
+ TO s3
+ PATH("CopyToResult/" || to_string(b))
+ OVER (
+ PARTITION BY toWrite.battery_status b
+ ORDER BY toWrite.name
+ )
+ WITH {
+ "format" : "json",
+ "compression": "gzip",
+ "max-objects-per-file": 100,
+ "container": "myBucket",
+ "accessKeyId": "<access-key>",
+ "secretAccessKey": "<secret-key>",
+ "region": "us-west-2"
+ };
+
+This query will be exported as partitions into a set of folders, with one
folder for each value of `battery_status`.
+Each partition itself will also be sorted by the `name` field, and compressed
with `gzip` and divided into files of 100
+objects or fewer per file.
--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18243
To unsubscribe, or for help writing mail filters, visit
https://asterix-gerrit.ics.uci.edu/settings
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: Ibdacf4e6b156a3b6ef15b1420a4102c122f8bf1c
Gerrit-Change-Number: 18243
Gerrit-PatchSet: 1
Gerrit-Owner: Ian Maxon <[email protected]>
Gerrit-MessageType: newchange