[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM

Stefania (JIRA) Thu, 17 Dec 2015 08:43:14 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062302#comment-15062302
 ]


Stefania commented on CASSANDRA-9303:
-------------------------------------



bq. I'd suggest the following \[copy(:ks.table)\] (global and per-table copy 
(to and from) options), \[copy-from(:ks.table)\] (global and per-table 
copy-from options), \[copy-to(:ks.table)\] (global and per-table copy-to 
options) where (:ks.table) is optional. so you can have \[copy\], \[copy-to\], 
\[copy-from\], \[copy-to:ks.table\], \[copy-from:ks.table\].

Done, I cleaned up the options a bit as well and removed the helper methods in 
the main cqlsh files.
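
For illustration only, here is a minimal sketch of how such sections could be resolved. The function name `read_copy_options`, the sample option values, and the precedence order (more specific sections override more general ones) are assumptions, not necessarily what the patch does.

```python
import configparser


def read_copy_options(config_text, direction, ks, table):
    """Merge COPY options from the most general to the most specific section.

    The precedence order below is an assumption made for this sketch.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    sections = [
        "copy",                                    # both directions, all tables
        "copy:%s.%s" % (ks, table),                # both directions, one table
        "copy-%s" % direction,                     # one direction, all tables
        "copy-%s:%s.%s" % (direction, ks, table),  # one direction, one table
    ]
    options = {}
    for name in sections:
        if parser.has_section(name):
            # Later (more specific) sections overwrite earlier values.
            options.update(parser.items(name))
    return options


# Hypothetical configuration text, used only to exercise the function.
sample = """
[copy]
header = true
chunksize = 1000

[copy-from]
chunksize = 5000

[copy-from:ks1.users]
maxbatchsize = 10
"""
opts = read_copy_options(sample, "from", "ks1", "users")
```

With the sample above, `chunksize` comes from \[copy-from\] rather than \[copy\], since the more specific section is read later.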

bq. maybe we could just add an unique suffix to avoid appending to an existing 
file from a previous execution?

If a file from a previous execution exists, it will be renamed with a 
.YYYYMMDD_HHMMSS suffix.
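
A rough sketch of that renaming, assuming the timestamp is appended to the existing file name. The helper name `rotate_existing` and its `now` parameter are hypothetical and only for illustration.

```python
import os
import time


def rotate_existing(path, now=None):
    """Rename `path` to `path.YYYYMMDD_HHMMSS` if it already exists.

    Returns the new name, or None when there was nothing to rename.
    `now` is seconds since the epoch; None means the current time.
    """
    if not os.path.isfile(path):
        return None
    stamp = time.strftime("%Y%m%d_%H%M%S", time.localtime(now))
    rotated = "%s.%s" % (path, stamp)
    os.rename(path, rotated)
    return rotated
```

The caller can then open `path` fresh without appending to output left over from an earlier run.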

bq. We can address if it won't take too much time, otherwise we can address it 
separately. Can we maybe improve it by making batchsize adaptive = 
min(batchsize, ingest_rate - current_record) or something more complicated will 
be needed?

Done, adaptive chunk size and retries needed changing.
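
The formula suggested in the quote above can be sketched as follows. This only illustrates min(batchsize, ingest_rate - current_record); the function and parameter names are assumptions, and the logic in the patch may be more involved.

```python
def next_chunk_size(chunk_size, ingest_rate, sent_this_second):
    """Cap the next chunk so the per-second row budget is not exceeded.

    chunk_size: configured maximum rows per chunk
    ingest_rate: maximum rows allowed per second
    sent_this_second: rows already sent in the current one-second window
    """
    remaining = max(ingest_rate - sent_this_second, 0)
    return min(chunk_size, remaining)
```

For example, with a chunk size of 1000 and an ingest rate of 2500, the third chunk in a given second would be limited to 500 rows.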
 
bq. Move SKIPCOLS to COPY_COMMON_OPTIONS since it can be used in both copy-to 
and copy-from.
    
Actually it should be a COPY FROM only option, see more below.

bq. Regarding the behavior of SKIPCOLS with COPY FROM, right now it only 
supports having fewer columns in the CSV. Should we also support actually 
skipping columns in the CSV even if they are present?
        
I think the semantics I chose, using SKIPCOLS to subtract from the set of 
columns specified in the command line, are not as advantageous as the ability to 
skip columns in the file. Providing both features with the same option would be 
confusing. So I converted SKIPCOLS to a COPY FROM option and changed its 
semantics to just skip columns that exist in the file. If in future the need 
arises to specify "all columns except" in the command line, we can introduce a 
regex-like expression (^col_name) in the columns part of the COPY cmd.
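
To make the new SKIPCOLS semantics concrete, here is a small sketch, not the cqlsh code. It assumes `columns` names every field in the file, in file order, and that the fields named in `skip_cols` are read but dropped before import. The function name and sample data are made up for the example.

```python
import csv
import io


def read_rows(csv_text, columns, skip_cols):
    """Yield one dict per CSV row, leaving out the columns in skip_cols."""
    # Positions of the columns to keep; skipped columns are still present
    # in the file, they are just not passed on.
    keep = [i for i, name in enumerate(columns) if name not in skip_cols]
    for row in csv.reader(io.StringIO(csv_text)):
        yield {columns[i]: row[i] for i in keep}


rows = list(read_rows("1,alice,secret\n2,bob,hidden\n",
                      ["id", "name", "password"],
                      {"password"}))
```

Here the `password` field is present in every line of the file but never reaches the imported rows.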

bq. Another related feature to have in the future would be to pick only 
specific columns from the csv and allowing custom orderings of columns, but we 
can leave that for later if there's a need.

I think reordering columns is not as useful as skipping them so I tend to agree 
to leave this as a future development if the need arises.

bq. After those are addressed you can probably start making 2.2+ patches.

I changed a lot of code today and I've run out of time anyway, so I'll wait for 
one more round of review before up-merging.


> Match cassandra-loader options in COPY FROM
> -------------------------------------------
>
>                 Key: CASSANDRA-9303
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9303
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Critical
>             Fix For: 2.1.x
>
>
> https://github.com/brianmhess/cassandra-loader added a bunch of options to 
> handle real world requirements, we should match those.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
