Agreed (+1) - this functionality will significantly reduce the pain of moving data in and out of object storage.
> On Oct 25, 2023, at 10:04 PM, Mike Carey <dtab...@gmail.com> wrote: > > +1 -- Now maybe users will stop trying to retrieve huge results and > wondering why the UI is choking! :-) This capability is actually long overdue. > > On 10/24/23 9:53 AM, Wail Alkowaileet wrote: >> Currently, AsterixDB does not have a clean way to extract query results or >> dump a dataset to a storage device. The only channel provided currently is >> the Query Service (i.e., running the query and writing it somehow at the >> client side). We need to support a way to write query results (or dump a >> dataset) in parallel to a storage device. >> >> To illustrate, say we want to do the following: >> >>> USE CopyToDataverse; >> COPY ColumnDataset >>> TO localfs >>> PATH("localhost:///media/backup/CopyToResult") >>> WITH { >>> "format" : "json" >>> }; >> In this example, the data in ColumnDataset will be written in each node at >> the provided path localhost:///media/backup/CopyToResult. Simply, each node >> will write its own partitions for the data stored in ColumnDataset locally. >> The written files will be in raw JSON format. >> >> Another example: >> >>> USE CopyToDataverse; >>> COPY (SELECT cd.uid uid, >>> cd.sensor_info.name name, >>> to_bigint(cd.sensor_info.battery_status) >>> battery_status >>> FROM ColumnDataset cd >>> ) toWrite >>> TO s3 >>> PATH("CopyToResult/" || to_string(b)) >>> OVER ( >>> PARTITION BY toWrite.battery_status b >>> ORDER BY toWrite.name >>> ) >>> WITH { >>> "format" : "json", >>> "compression": "gzip", >>> "max-objects-per-file": 100, >>> "container": "myBucket", >>> "accessKeyId": "<access-key>", >>> "secretAccessKey": "<secret-key>", >>> "region": "us-west-2" >>> }; >> The second example shows how to write the result of a query and also >> partition the result so that each partition will be written to a certain >> path. In this example, we partition by the battery_status (say an integer >> value from 0 to 100). The final result will be written to myBucke in Amazon >> S3. >> >> Each partition will have the path CopyToResult/<battery_status>. For >> example CopyToResult/0, CopyToResult/1 ..., CopyToResult/99, >> CopyToResult/100). This partitioning scheme can be useful if a user wants >> to exploit dynamic prefixes (external filters) (see ASTERIXDB-3073 >> <https://issues.apache.org/jira/browse/ASTERIXDB-3073>). >> >> Additionally, the records in each partition will be ordered by the >> sensor_name (toWrite.name). Note that this ordering isn't global but per >> partition. >> >> Also, the written files will be compressed using *gzip* and each file >> should have at most 100 records max (*max-objects-per-file*). >> >> EPIC: ASTERIXDB-3286<https://issues.apache.org/jira/browse/ASTERIXDB-3286>