[jira] [Created] (CASSANDRA-12079) CQLSH to retrieve column names from data file header

Jose Martinez Poblete (JIRA) Thu, 23 Jun 2016 08:46:52 -0700

Jose Martinez Poblete created CASSANDRA-12079:
-------------------------------------------------


             Summary: CQLSH to retrieve column names from data file header
                 Key: CASSANDRA-12079
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12079
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
         Environment: Cassandra 2.1.14.1346
            Reporter: Jose Martinez Poblete


Suppose a have a table with 3 columns
Then the data is copied to a delimited file with *HEADER*

{noformat}
cqlsh> create KEYSPACE my_keyspace WITH replication = {'class': 
'SimpleStrategy', 'replication_factor': 1 } AND durable_writes = 'true';
cqlsh> use my_keyspace ;
cqlsh:my_keyspace> CREATE TABLE my_table ( col1 int PRIMARY KEY, col2 text, 
col3 text );
cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 1, '1st row') ;
cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 2, '2nd row') ;
cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 3, '3rd row') ;
cqlsh:my_keyspace> COPY my_keyspace.my_table ( col1, col2 ) TO 'my_table.dat' 
WITH DELIMITER = '|' AND HEADER = true ;
Reading options from the command line: {'header': 'true', 'delimiter': '|'}
Using 3 child processes

Starting copy of my_keyspace.my_table with columns ['col1', 'col2'].
Processed: 3 rows; Rate:      10 rows/s; Avg. rate:       4 rows/s
3 rows exported to 1 files in 0.861 seconds.
{noformat}

This will create a file with these contents

{noformat}
col1|col2
3|3rd row
2|2nd row
1|1st row
{noformat}

Then we create another table with same DDL 

{noformat}
cqlsh:my_keyspace> CREATE TABLE my_table2 ( col1 int PRIMARY KEY, col2 text, 
col3 text );
{noformat}

A restore from the recently created delimited file *with header* data file WILL 
FAIL because no columns were specified so it is expecting all columns to be in 
the delimited file - but we have a header row and the header option was 
specified

{noformat}
cqlsh:my_keyspace> COPY my_table2 FROM 'my_table.dat' WITH DELIMITER = '|' AND 
HEADER = true ;
Reading options from the command line: {'header': 'true', 'delimiter': '|'}
Using 3 child processes

Starting copy of my_keyspace.my_table2 with columns ['col1', 'col2', 'col3'].
Failed to import 3 rows: ParseError - Invalid row length 2 should be 3,  given 
up without retries
Failed to process 3 rows; failed rows written to 
import_my_keyspace_my_table2.err
Processed: 3 rows; Rate:       5 rows/s; Avg. rate:       7 rows/s
3 rows imported from 1 files in 0.442 seconds (0 skipped).
{noformat}

Provided that *HEADER = true*,  It would be very handy if CQLSH looks into the 
*header row* and retrieves the column names so they do not have to be entered 
manually on the copy command - especially where there is a significant number 
of columns

{noformat}
cqlsh:my_keyspace> COPY my_table2 (col1, col2) FROM 'my_table.dat' WITH 
DELIMITER = '|' AND HEADER = true ;
Reading options from the command line: {'header': 'true', 'delimiter': '|'}
Using 3 child processes

Starting copy of my_keyspace.my_table2 with columns ['col1', 'col2'].
Processed: 3 rows; Rate:       3 rows/s; Avg. rate:       4 rows/s
3 rows imported from 1 files in 0.708 seconds (0 skipped).
cqlsh:my_keyspace> select * from my_table2;

 col1 | col2    | col3
------+---------+------
    1 | 1st row | null
    2 | 2nd row | null
    3 | 3rd row | null

(3 rows)
cqlsh:my_keyspace> 
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (CASSANDRA-12079) CQLSH to retrieve column names from data file header

Reply via email to