>From Ian Maxon <[email protected]>: Ian Maxon has submitted this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18243 )
Change subject: [ASTERIXDB-3384][DOC] Document COPY ...................................................................... [ASTERIXDB-3384][DOC] Document COPY Details: Add some brief documentation about each COPY statement. Also add the simplified BNF for railroad diagrams for each. Change-Id: Ibdacf4e6b156a3b6ef15b1420a4102c122f8bf1c Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18243 Reviewed-by: Ian Maxon <[email protected]> Reviewed-by: Wail Alkowaileet <[email protected]> Integration-Tests: Jenkins <[email protected]> Tested-by: Jenkins <[email protected]> --- M asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf M asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md 2 files changed, 98 insertions(+), 0 deletions(-) Approvals: Ian Maxon: Looks good to me, but someone else must approve Wail Alkowaileet: Looks good to me, approved Jenkins: Verified; Verified diff --git a/asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf b/asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf index af67e33..cbfa92e 100644 --- a/asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf +++ b/asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf @@ -258,4 +258,9 @@ UpsertStmnt ::= "UPSERT" "INTO" QualifiedName ("AS" Variable)? Query ("RETURNING" Expr)? + +CopyStmnt ::= "COPY" "INTO"? QualifiedName ("AS" Variable)? "FROM" Identifier "AT" QualifiedName ("PATH" StringLiteral)? (WITH ObjectConstructor)? + +CopyToStmnt ::= "COPY" ( QualifiedName | "(" Query ")" ) "TO" AdapterName "PATH" ParenthesizedArrayConstructor ("OVER" "(" ("PARTITION" "BY" Expr ("AS" Variable)? ("," Expr ("AS" Variable)? )? )? OrderbyClause ")" )? WITH ObjectConstructor + DeleteStmnt ::= "DELETE" "FROM" QualifiedName (("AS")? Variable)? ("WHERE" Expr)? diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md index 4ec5033..71e6af5 100644 --- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md +++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md @@ -697,3 +697,77 @@ DELETE FROM customers WHERE custid = "C47"; +### <a id="Copy">Copy Statement</a> + +##### CopyStmnt + + +The `COPY` statement is used to load data in bulk from an external source into a dataset. It differs from `LOAD` in that +it can be performed multiple times, to upsert new data into the dataset that may have changed in the source. + +For example, this statement would copy the contents of the JSON file `customerData.json` residing on the NC named `asterix_nc1` +into the Customers dataset. + +##### Example + + COPY Customers + FROM localfs + PATH ("asterix_nc1://data/nontagged/customerData.json") + WITH { + "format": "json" + }; + + +### <a id="CopyTo">Copy To Statement</a> + +##### CopyToStmnt + + +The `COPY TO` statement allows easy export of the result of a query, or an entire dataset, into a file or set of files on +an external source. This can be any source that has an adapter. + +For example, this statement would create a copy of `ColumnDataset` on an S3 bucket, myBucket: + +##### Example + + COPY ColumnDataset + TO s3 + PATH("CopyToResult/") + WITH { + "format" : "json" + "container": "myBucket", + "accessKeyId": "<access-key>", + "secretAccessKey": "<secret-key>", + "region": "us-west-2" + }; + +The statement allows for much more precise exports than this, however. A typical pattern of data accessed via object stores +like S3 is for it to be partitioned into files, with each folder containing some of those files representing a key. The use +of `OVER` and `PARTITION BY` allow exports to match this. For example: + +##### Example + + COPY (SELECT cd.uid uid, + cd.sensor_info.name name, + to_bigint(cd.sensor_info.battery_status) battery_status + FROM ColumnDataset cd + ) toWrite + TO s3 + PATH("CopyToResult", to_string(b)) + OVER ( + PARTITION BY toWrite.battery_status b + ORDER BY toWrite.name + ) + WITH { + "format" : "json", + "compression": "gzip", + "max-objects-per-file": 1000, + "container": "myBucket", + "accessKeyId": "<access-key>", + "secretAccessKey": "<secret-key>", + "region": "us-west-2" + }; + +This query will be exported as partitions into a set of folders, with one folder for each value of `battery_status`. +Each partition itself will also be sorted by the `name` field, and compressed with `gzip` and divided into files of 100 +objects or fewer per file. -- To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18243 To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Change-Id: Ibdacf4e6b156a3b6ef15b1420a4102c122f8bf1c Gerrit-Change-Number: 18243 Gerrit-PatchSet: 6 Gerrit-Owner: Ian Maxon <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Hussain Towaileb <[email protected]> Gerrit-Reviewer: Ian Maxon <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Peeyush Gupta <[email protected]> Gerrit-Reviewer: Wail Alkowaileet <[email protected]> Gerrit-MessageType: merged
