[
https://issues.apache.org/jira/browse/CASSANDRA-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefania updated CASSANDRA-14391:
---------------------------------
Issue Type: Improvement (was: Bug)
> COPY FROM ignores headers
> -------------------------
>
> Key: CASSANDRA-14391
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14391
> Project: Cassandra
> Issue Type: Improvement
> Components: CQL
> Environment: cqlsh 5.0.1 and Cassandra 3.11.2 on macOS 10.13.2.
> Reporter: M. Justin
> Priority: Major
>
> COPY FROM appears to ignore the headers value, even when "headers = true" is
> specified. This means that if the columns are reordered, the import process
> will save values in the wrong columns. Additionally, if there are missing
> columns, an error occurs, even if those columns are not primary key columns.
> This behavior contradicts the behavior [specified in the
> docs|https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshCopy.html]
> (emphasis mine).
> {quote}COPY FROM imports data from a CSV file into an existing table. Each
> line in the source file is imported as a row. All rows in the dataset must
> contain the same number of fields and have values in the PRIMARY KEY fields.
> The process verifies the PRIMARY KEY and updates existing records. If
> {{HEADER = false}} and no column names are specified, the fields are imported
> in deterministic order. *When column names are specified, fields are imported
> in that order. Missing and empty fields are set to null. The source cannot
> have more fields than the target table, however it can have fewer fields.*
> {quote}
> h2. Example
> {noformat:title=temp.csv}
> col2,col1,col3
> column value 1,key2,3
> column value 2,key4,3
> column value 3,key3,3
> column value 4,key1,3
> {noformat}
> {code:sql}
> create keyspace copy_to_from_test WITH replication = { 'class' :
> 'SimpleStrategy', 'replication_factor' : 1 };
> use copy_to_from_test;
> create table test_table (col1 text primary key, col2 text, col3 bigint);
> copy test_table from 'temp.csv' with header = true;
> {code}
> The above code will incorrectly swap the "col2" and "col1" values, since it
> expects the first column to be "col1". If I had instead swapped the order of
> "col3", I would have received an error on input, as it would have attempted
> to store text in a numerical column.
> h2. Expected Behavior
> I would expect specifying "with header = true" on a COPY FROM statement to
> use the headers as column names for insertion, rather than merely skipping
> the header row. Missing non-primary key columns should be set to null.
> h2. Other
> I ran across this issue when copying between two of my environments. One of
> the environments had changed the columns in the primary key, but the other
> had not yet. This caused the order of the columns to vary between the
> environments.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]