[ 
https://issues.apache.org/jira/browse/CASSANDRA-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885069#comment-15885069
 ] 

Stefania commented on CASSANDRA-13071:
--------------------------------------

Thanks for the review!

bq. Is this supposed to be working or are we supposed to use type-based 
delimiters

It's actually supposed to be working, but I'm not entirely sure about it. I've 
left a comment 
[here|https://github.com/stef1927/cassandra/commit/170e21fa3d8da8661a4cd500c3507d3919717eff#diff-27e394435c04a60c58ec9d5c34397341R1889].
 Basically, I wasn't sure about enforcing the correct type parenthesis in order 
to avoid breaking data that so far could be imported, albeit data that is 
incorrect CQL. On the flip side, we could, for example, incorrectly convert a 
list to a set and vice-versa, if two columns are swapped by mistake. Missing 
columns should be detected by the check on the total number of columns, so I am 
not too worried about collections being converted to another collection type 
due to missing columns. Perhaps we should just enforce type parenthesis in 4.0?
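
As a rough illustration of what enforcing type-based delimiters might look like (a sketch only: {{DELIMITERS}} and {{check_delimiters}} are hypothetical names, not the code in copyutil.py), each collection kind could be mapped to the delimiters its CQL literal uses and the value rejected when they do not match:

```python
# Hypothetical mapping from CQL collection kind to the delimiters of its literal.
DELIMITERS = {
    'list': ('[', ']'),
    'set': ('{', '}'),
    'map': ('{', '}'),
    'tuple': ('(', ')'),
}


def check_delimiters(kind, text):
    """Return the text inside a collection literal, or raise ValueError if the
    value is not wrapped in the delimiters expected for `kind`."""
    opening, closing = DELIMITERS[kind]
    stripped = text.strip()
    if len(stripped) < 2 or stripped[0] != opening or stripped[-1] != closing:
        raise ValueError("expected %s...%s for a %s value, got %r"
                         % (opening, closing, kind, text))
    return stripped[1:-1]
```

With a check like this, a set value landing in a list column would be rejected rather than converted. Sets and maps share curly braces, so delimiters alone would not tell those two apart.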

bq. The dtests results look good, but it seems they were not triggered using 
the new dtest branch you created so I re-triggered them using your branch.

I didn't realize we could specify a dtest branch for the cqlsh tests, thanks 
for relaunching them. The results are clean for 3.0 and 3.11, but there was a 
problem for trunk: CASSANDRA-10520 broke the cqlshlib tests, and this caused 
the entire job to fail. I've ninja-fixed this, rebased and relaunched.

> cqlsh copy-from should error out when csv contains invalid data for 
> collections
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13071
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13071
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Stefania
>            Assignee: Stefania
>            Priority: Minor
>             Fix For: 3.0.x, 3.11.x
>
>
> If the csv file contains invalid data for collection types, at the moment the 
> data is imported incorrectly; an error would be better behavior.
> For example this table:
> {code}
> CREATE TABLE test.test (key text, value frozen<set<text>>, PRIMARY KEY 
> (key)); 
> {code}
> with this data:
> {code}
> "key1","{'test1', 'test2'}"
> "Key2","not_a_set"
> {code}
> will be imported by {{COPY test.test FROM 'test.csv';}} without errors but 
> will result in the following data:
> {code}
> cqlsh> select * from test.test;
>  key  | value
> ------+--------------------
>  key1 | {'test1', 'test2'}
>  Key2 |        {'ot_a_se'}
> (2 rows)
> {code}
> The second row should have been rejected. The reason is that the [{{split}} 
> function|https://github.com/stef1927/cassandra/blob/trunk/pylib/cqlshlib/copyutil.py#L1898]
>  assumes that the first and last characters of the string passed in are 
> parentheses, without actually checking it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
