(asterixdb) branch master updated: [ASTERIXDB-3384][DOC] Document COPY

wyk Fri, 03 May 2024 08:33:49 -0700

This is an automated email from the ASF dual-hosted git repository.

wyk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/asterixdb.git



The following commit(s) were added to refs/heads/master by this push:
     new 80339457a5 [ASTERIXDB-3384][DOC] Document COPY
80339457a5 is described below

commit 80339457a55f89c4a9b9217d17289fc6c4db18de
Author: Ian Maxon <[email protected]>
AuthorDate: Fri Apr 19 02:37:53 2024 -0700

    [ASTERIXDB-3384][DOC] Document COPY
    
    Details:
    Add some brief documentation about each COPY
    statement. Also add the simplified BNF for railroad
    diagrams for each.
    
    Change-Id: Ibdacf4e6b156a3b6ef15b1420a4102c122f8bf1c
    Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18243
    Reviewed-by: Ian Maxon <[email protected]>
    Reviewed-by: Wail Alkowaileet <[email protected]>
    Integration-Tests: Jenkins <[email protected]>
    Tested-by: Jenkins <[email protected]>
---
 asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf  |  5 ++
 .../src/main/markdown/sqlpp/7_ddl_dml.md           | 74 ++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf 
b/asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf
index af67e33e00..cbfa92e5a5 100644
--- a/asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf
+++ b/asterixdb/asterix-doc/src/main/grammar/sqlpp.ebnf
@@ -258,4 +258,9 @@ InsertStmnt ::= "INSERT" "INTO" QualifiedName ("AS" 
Variable)? Query ("RETURNING
 
 UpsertStmnt ::= "UPSERT" "INTO" QualifiedName ("AS" Variable)? Query 
("RETURNING" Expr)?
 
+
+CopyStmnt ::= "COPY" "INTO"? QualifiedName ("AS" Variable)? "FROM" Identifier 
"AT" QualifiedName ("PATH" StringLiteral)? (WITH ObjectConstructor)?
+
+CopyToStmnt ::= "COPY" ( QualifiedName | "(" Query ")" ) "TO" AdapterName 
"PATH" ParenthesizedArrayConstructor ("OVER" "(" ("PARTITION" "BY" Expr ("AS" 
Variable)? ("," Expr ("AS" Variable)? )? )? OrderbyClause ")" )? WITH 
ObjectConstructor
+
 DeleteStmnt ::= "DELETE" "FROM" QualifiedName (("AS")? Variable)? ("WHERE" 
Expr)?
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md 
b/asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md
index 4ec5033ae6..71e6af5a31 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/7_ddl_dml.md
@@ -697,3 +697,77 @@ The following examples illustrate single-object deletions.
 
     DELETE FROM customers WHERE custid = "C47";
 
+### <a id="Copy">Copy Statement</a>
+
+##### CopyStmnt
+![](../images/diagrams/CopyStmnt.png)
+
+The `COPY` statement is used to load data in bulk from an external source into 
a dataset. It differs from `LOAD` in that
+it can be performed multiple times, to upsert new data into the dataset that 
may have changed in the source.
+
+For example, this statement would copy the contents of the JSON file 
`customerData.json` residing on the NC named `asterix_nc1`
+into the Customers dataset.
+
+##### Example
+
+    COPY Customers
+    FROM localfs
+    PATH ("asterix_nc1://data/nontagged/customerData.json")
+    WITH {
+    "format": "json"
+    };
+
+
+### <a id="CopyTo">Copy To Statement</a>
+
+##### CopyToStmnt
+![](../images/diagrams/CopyToStmnt.png)
+
+The `COPY TO` statement allows easy export of the result of a query, or an 
entire dataset, into a file or set of files on
+an external source. This can be any source that has an adapter. 
+
+For example, this statement would create a copy of `ColumnDataset` on an S3 
bucket, myBucket:
+
+##### Example
+
+    COPY ColumnDataset
+    TO s3
+    PATH("CopyToResult/")
+    WITH {
+    "format" : "json"
+    "container": "myBucket",
+    "accessKeyId": "<access-key>",
+    "secretAccessKey": "<secret-key>",
+    "region": "us-west-2"
+    };
+
+The statement allows for much more precise exports than this, however. A 
typical pattern of data accessed via object stores
+like S3 is for it to be partitioned into files, with each folder containing 
some of those files representing a key. The use
+of `OVER` and `PARTITION BY` allow exports to match this. For example:
+
+##### Example
+
+    COPY (SELECT cd.uid uid,
+    cd.sensor_info.name name,
+    to_bigint(cd.sensor_info.battery_status) battery_status
+    FROM ColumnDataset cd
+    ) toWrite
+    TO s3
+    PATH("CopyToResult", to_string(b))
+    OVER (
+    PARTITION BY toWrite.battery_status b
+    ORDER BY toWrite.name
+    )
+    WITH {
+    "format" : "json",
+    "compression": "gzip",
+    "max-objects-per-file": 1000,
+    "container": "myBucket",
+    "accessKeyId": "<access-key>",
+    "secretAccessKey": "<secret-key>",
+    "region": "us-west-2"
+    };
+
+This query will be exported as partitions into a set of folders, with one 
folder for each value of `battery_status`. 
+Each partition itself will also be sorted by the `name` field, and compressed 
with `gzip` and divided into files of 100
+objects or fewer per file.

(asterixdb) branch master updated: [ASTERIXDB-3384][DOC] Document COPY

Reply via email to