[
https://issues.apache.org/jira/browse/CASSANDRA-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387295#comment-14387295
]
Sebastian Estevez commented on CASSANDRA-9048:
----------------------------------------------
Will the new COPY command be backported to, or made compatible with, the 2.0 branch?
> Delimited File Bulk Loader
> --------------------------
>
> Key: CASSANDRA-9048
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9048
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Brian Hess
> Attachments: CASSANDRA-9048.patch
>
>
> There is a strong need for bulk loading data from delimited files into
> Cassandra. Data that starts out in delimited files is not in the SSTable
> format, and therefore cannot be loaded directly with Cassandra's bulk
> loading tool, sstableloader.
> Source data is in delimited form far more often than in the SSTable format,
> so a tool that loads directly from delimited files is very useful.
> In order for this bulk loader to be more generally useful to customers, it
> should handle a number of options at a minimum:
> - support specifying the input file or reading the data from stdin (so other
> command-line programs can pipe into the loader)
> - supply the CQL schema for the input data
> - support all data types other than collections (collections are a stretch
> goal)
> - an option to specify the delimiter
> - an option to specify comma as the decimal delimiter (for international use
> cases)
> - an option to specify how NULL values are specified in the file (e.g., the
> empty string or the string NULL)
> - an option to specify how BOOLEAN values are specified in the file (e.g.,
> TRUE/FALSE or 0/1)
> - an option to specify the Date and Time format
> - an option to skip some number of rows at the beginning of the file
> - an option to only read in some number of rows from the file
> - an option to indicate how many parse errors to tolerate
> - an option to specify a file that will contain all the lines that did not
> parse correctly (up to the maximum number of parse errors)
> - an option to specify the CQL port to connect to (with 9042 as the default).
> Additional options would be useful, but this set of options/features is a
> start.
> A word on COPY. COPY comes via CQLSH, which requires the client to be the
> same version as the server (e.g., 2.0 CQLSH does not work with 2.1
> Cassandra). This tool should be able to connect to any version of Cassandra
> (within reason). For example, it should be able to handle 2.0.x and 2.1.x.
> Moreover, CQLSH's COPY command does not support a number of the options
> above. Lastly, the performance of COPY in 2.0.x is not high enough to be
> considered a bulk ingest tool.
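
As a rough illustration of the parsing options listed in the description (delimiter, decimal comma, NULL and BOOLEAN representations, skipped rows, row limit, and parse-error tolerance), here is a minimal Python sketch. It is not the attached patch and does not talk to Cassandra. The names LoaderOptions, convert, and parse_rows are hypothetical, and only three illustrative column types plus plain text are handled.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoaderOptions:
    """Hypothetical option set mirroring the list in the issue description."""
    delimiter: str = ","
    null_value: str = ""            # how NULL appears in the file
    true_value: str = "TRUE"        # how BOOLEAN true appears in the file
    false_value: str = "FALSE"
    decimal_comma: bool = False     # treat "3,5" as 3.5
    skip_rows: int = 0              # rows to skip at the start of the file
    max_rows: Optional[int] = None  # rows to read after the skip (None = all)
    max_errors: int = 0             # parse errors to tolerate


def convert(text, cql_type, opts):
    """Convert one field to a Python value; raises ValueError on bad input."""
    if text == opts.null_value:
        return None
    if cql_type == "int":
        return int(text)
    if cql_type == "double":
        return float(text.replace(",", ".") if opts.decimal_comma else text)
    if cql_type == "boolean":
        if text == opts.true_value:
            return True
        if text == opts.false_value:
            return False
        raise ValueError("not a boolean: %r" % text)
    return text  # assumption: every other type is passed through as text


def parse_rows(stream, schema, opts, bad_rows):
    """Yield converted rows; append rows that fail to parse to bad_rows."""
    reader = csv.reader(stream, delimiter=opts.delimiter)
    rows_read = 0
    errors = 0
    for index, fields in enumerate(reader):
        if index < opts.skip_rows:
            continue
        # Assumption: the row limit counts rows read, whether or not they parse.
        if opts.max_rows is not None and rows_read >= opts.max_rows:
            break
        rows_read += 1
        try:
            if len(fields) != len(schema):
                raise ValueError("expected %d fields" % len(schema))
            row = tuple(convert(f, t, opts) for f, t in zip(fields, schema))
        except ValueError:
            errors += 1
            bad_rows.append(fields)
            if errors > opts.max_errors:
                raise RuntimeError("too many parse errors")
            continue
        yield row


# Usage: semicolon-delimited input with a header row and one malformed line.
sample = "id;price;active\n1;3,5;1\n2;NULL;0\nx;1,0;1\n3;2,25;1\n"
options = LoaderOptions(delimiter=";", null_value="NULL", true_value="1",
                        false_value="0", decimal_comma=True, skip_rows=1,
                        max_errors=1)
rejected = []
loaded = list(parse_rows(io.StringIO(sample), ["int", "double", "boolean"],
                         options, rejected))
```

In a real loader the rejected rows would be written to the bad-rows file rather than kept in a list, and each parsed row would be bound to a prepared INSERT statement. Passing sys.stdin as the stream covers the piping case.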