[jira] [Updated] (CASSANDRA-17831) Add support in CQLSH for COPY FROM / TO in compact Parquet format

Brad Schoening (Jira) Wed, 17 Aug 2022 16:44:07 -0700


     [ 
https://issues.apache.org/jira/browse/CASSANDRA-17831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Brad Schoening updated CASSANDRA-17831:
---------------------------------------
    Description: 
CQLSH supports only CSV as a format for import and export. A binary big data 
format such as Avro and/or Parquet would be more compact and highly portable to 
other platforms.

Parquet files embed their own schema, so no separate schema definition is 
required, and it appears to be the easier format to support.

The existing syntax already accepts key-value pair options, so the format 
could be selected with a new option such as FORMAT = PARQUET:

{{     COPY table_name ... FROM 'file_name'[, 'file2_name', ...] }}

                     {{[WITH option = 'value' [AND ...]]}}
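As a rough illustration of how such an option could ride on the existing WITH clause, here is a minimal Python sketch (cqlsh itself is written in Python). The helpers parse_copy_options and copy_format are hypothetical names, not part of cqlsh, and the FORMAT option is only the one proposed in this ticket; CSV is assumed to remain the default.

```python
import re

# Hypothetical sketch, not cqlsh code: one "name = 'value'" or "name = value" pair.
_OPTION = re.compile(r"\s*(\w+)\s*=\s*(?:'([^']*)'|(\w+))\s*")


def parse_copy_options(with_clause):
    """Parse "opt = 'value' AND opt2 = value2" into {'OPT': 'value', ...}.

    Limitation: a quoted value that itself contains " AND " is not handled.
    """
    options = {}
    for part in re.split(r"\s+AND\s+", with_clause.strip(), flags=re.IGNORECASE):
        match = _OPTION.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse COPY option: {part!r}")
        name, quoted, bare = match.groups()
        options[name.upper()] = quoted if quoted is not None else bare
    return options


def copy_format(options):
    """Return the requested file format; CSV is assumed to stay the default."""
    fmt = options.get("FORMAT", "CSV").upper()
    if fmt not in {"CSV", "PARQUET"}:
        raise ValueError(f"unsupported COPY format: {fmt}")
    return fmt


if __name__ == "__main__":
    opts = parse_copy_options("FORMAT = 'PARQUET' AND HEADER = true")
    print(opts, copy_format(opts))
```

With this shape, a statement without a FORMAT option would keep today's CSV behaviour, so existing COPY commands would not need to change.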

Side-by-side comparisons of CSV and Parquet show disk space savings of 80% or 
more.

[https://towardsdatascience.com/csv-files-for-storage-no-thanks-theres-a-better-option-72c78a414d1d]


> Add support in CQLSH for COPY FROM / TO in compact Parquet format
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-17831
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17831
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Brad Schoening
>            Assignee: Brad Schoening
>            Priority: Normal


