Oksana Danylyshyn created CASSANDRA-8166:
--------------------------------------------

             Summary: Not all data is loaded to Pig using CqlNativeStorage
                 Key: CASSANDRA-8166
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8166
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
            Reporter: Oksana Danylyshyn
         Attachments: sorted.zip

Not all of the data in a Cassandra table is loaded into Pig when using the
CqlNativeStorage function.

Steps to reproduce:

CQL3 CREATE TABLE statement:

CREATE TABLE time_bucket_step (
  key varchar,
  object_id varchar,
  value varchar,
  PRIMARY KEY (key, object_id)
);

Loading the data and saving it to Cassandra (the "sorted" file is in the attachment):

time_bucket_step = load 'sorted' using PigStorage('\t','-schema');

records = foreach time_bucket_step
  generate
    TOTUPLE(TOTUPLE('key', key),TOTUPLE('object_id', object_id)),
    TOTUPLE(value);

store records into
  'cql://socialdata/time_bucket_step?output_query=UPDATE+socialdata.time_bucket_step+set+value+%3D+%3F'
  using org.apache.cassandra.hadoop.pig.CqlNativeStorage();

Results:

Input(s):
Successfully read 139026 records (11115817 bytes) from: "hdfs://.../sorted"
Output(s):
Successfully stored 139026 records in: 
"cql://socialdata/time_bucket_step?output_query=UPDATE+socialdata.time_bucket_step+set+value+%3D+%3F"


Loading data from Cassandra (note that not all of the data is read):

time_bucket_step_cass = load 'cql://socialdata/time_bucket_step'
  using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
store time_bucket_step_cass into 'time_bucket_step_cass'
  using PigStorage('\t','-schema');

Results:

Input(s):
Successfully read 80727 records (20068 bytes) from: 
"cql://socialdata/time_bucket_step"
Output(s):
Successfully stored 80727 records (2098178 bytes) in: 
"hdfs://..../time_bucket_step_cass"

Actual: only 80727 of 139026 records were loaded (58299 missing)
Expected: all data should be loaded
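
One check that may help interpret these counts: a CQL UPDATE is an upsert, so
input rows that share the same (key, object_id) pair overwrite each other, and
the "stored" count reflects writes rather than distinct rows. The sketch below
counts the distinct primary-key pairs in the "sorted" input. It assumes the
field names key and object_id come from the input's schema file, as in the
script above; the relation names pk_pairs, distinct_pairs, all_pairs and
pair_count are illustrative only. It does not establish the cause either way.

```pig
-- Load the same input file that was used for the store step.
time_bucket_step = load 'sorted' using PigStorage('\t','-schema');

-- Keep only the primary-key columns of the target table.
pk_pairs = foreach time_bucket_step generate key, object_id;

-- Collapse repeated (key, object_id) pairs.
distinct_pairs = distinct pk_pairs;

-- Count the distinct pairs; COUNT_STAR also counts tuples with null fields.
all_pairs = group distinct_pairs all;
pair_count = foreach all_pairs generate COUNT_STAR(distinct_pairs);

dump pair_count;
```

If this prints 139026, every input row has a unique primary key and the
missing rows cannot be explained by overwrites.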



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
