[ https://issues.apache.org/jira/browse/CASSANDRA-17831 ]
Brad Schoening deleted comment on CASSANDRA-17831:
--------------------------------------------
was (Author: bschoeni):
Let's benchmark it. I'll run some tests with moderate to large data sets.
In the
[Radečić|https://medium.com/@radecicdario?source=post_page-----72c78a414d1d--------------------------------]
article, he reported an 80% reduction in disk space and a 33x performance boost with
Parquet. Of course, performance with Cassandra also involves database latency,
so I don't expect the gains to be as dramatic.
I'm on vacation for the next few weeks, but I will run some tests when I return.
> Add support in CQLSH for COPY FROM / TO in compact Parquet format
> -----------------------------------------------------------------
>
> Key: CASSANDRA-17831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17831
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Tool/cqlsh
> Reporter: Brad Schoening
> Priority: Normal
>
> CQLSH COPY supports only CSV as a format for import and export. A binary big
> data format such as Avro and/or Parquet would be more compact and highly
> portable to other platforms.
> Parquet does not require a separately defined schema (the schema is stored in
> the file itself), so it appears to be the easier format to support.
> The existing syntax already accepts key-value options, so a new option such as
> FORMAT = PARQUET could be added:
> {{ COPY table_name ... FROM 'file_name'[, 'file2_name', ...] }}
> {{[WITH option = 'value' [AND ...]]}}
> Side-by-side comparisons of CSV and Parquet show a disk-space saving of 80% or
> more:
> [https://towardsdatascience.com/csv-files-for-storage-no-thanks-theres-a-better-option-72c78a414d1d]
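The proposed FORMAT = PARQUET option would ride on the existing {{WITH option = 'value' [AND ...]}} clause. Below is a minimal sketch, in Python (the language cqlsh is written in), of how such a clause could be parsed and the output format selected. {{parse_with_options}}, {{choose_format}} and {{SUPPORTED_FORMATS}} are hypothetical names, not cqlsh's actual option-handling code, and FORMAT is the option proposed in this issue, not one cqlsh accepts today.

```python
import re
from typing import Dict

# Hypothetical: CSV is the only format cqlsh COPY supports today;
# PARQUET is the format proposed in CASSANDRA-17831.
SUPPORTED_FORMATS = {"CSV", "PARQUET"}

# One "name = value" pair. The value is either single-quoted ('' escapes a
# quote, as in CQL string literals) or a bare token without spaces or quotes.
_OPTION = re.compile(r"\s*(\w+)\s*=\s*('(?:[^']|'')*'|[^\s']+)\s*")
# The AND keyword separating options (case-insensitive, whole word).
_AND = re.compile(r"AND\b\s*", re.IGNORECASE)


def parse_with_options(clause: str) -> Dict[str, str]:
    """Parse "opt = 'value' AND opt2 = value2" into a dict with upper-cased keys."""
    options: Dict[str, str] = {}
    pos = 0
    expect_option = clause.strip() != ""
    while expect_option:
        match = _OPTION.match(clause, pos)
        if match is None:
            raise ValueError(f"malformed option at offset {pos}: {clause[pos:]!r}")
        key, raw = match.groups()
        if raw.startswith("'"):
            raw = raw[1:-1].replace("''", "'")  # strip quotes, unescape ''
        options[key.upper()] = raw
        pos = match.end()
        # Another option must follow only if an AND separator is present.
        sep = _AND.match(clause, pos)
        expect_option = sep is not None
        if sep is not None:
            pos = sep.end()
    if clause[pos:].strip():
        raise ValueError(f"unexpected trailing text: {clause[pos:]!r}")
    return options


def choose_format(options: Dict[str, str]) -> str:
    """Return the requested FORMAT, defaulting to CSV so existing commands are unchanged."""
    fmt = options.get("FORMAT", "CSV").upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported FORMAT {fmt!r}; expected one of {sorted(SUPPORTED_FORMATS)}")
    return fmt
```

For example, {{parse_with_options("FORMAT = PARQUET AND HEADER = true")}} returns {{{'FORMAT': 'PARQUET', 'HEADER': 'true'}}} and {{choose_format}} on that result returns {{'PARQUET'}}. Falling back to CSV when FORMAT is absent is the design choice that would keep existing COPY commands backward compatible.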