[ 
https://issues.apache.org/jira/browse/CASSANDRA-17831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583176#comment-17583176
 ] 

David Capwell commented on CASSANDRA-17831:
-------------------------------------------

bq. My understanding is that parquet allows but does not require a schema, and 
the schema-less example above runs.

Your example has a schema... n_legs of type int, animal of type string...

bq.  It's feasible, but more complex to write a format which requires building 
a schema translation and upon import, validates the same.

Most data processing tools handle this for you and hide the details.  If you use 
pandas or Spark to work with a Parquet file, the tool finds the schema, loads it, 
and validates your operations against it.

bq. Apache Arrow will export python table / dataframe to parquet:

It allows export to common columnar formats, but does not require it.  My point 
with Arrow was not to argue that we should use it instead, but that different 
solutions may be better depending on the use case, so without more details on 
the use case it's hard to say which is best.

> Add support in CQLSH for COPY FROM / TO in compact Parquet format
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-17831
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17831
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tool/cqlsh
>            Reporter: Brad Schoening
>            Assignee: Brad Schoening
>            Priority: Normal
>
> CQL supports only CSV as a format for import and export. A binary big data 
> format such as Avro and/or Parquet would be more compact and highly portable 
> to other platforms.
> Parquet does not require a schema, so it appears the easier format to support.
> The existing syntax supports adding key value pair options, such as FORMAT = 
> PARQUET
> {{     COPY table_name ... FROM 'file_name'[, 'file2_name', ...] }}
>                      {{[WITH option = 'value' [AND ...]]}}
> Side-by-side comparisons of CSV and Parquet show an 80%-plus saving in disk 
> space.
> [https://towardsdatascience.com/csv-files-for-storage-no-thanks-theres-a-better-option-72c78a414d1d]



