[ https://issues.apache.org/jira/browse/CASSANDRA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270942#comment-14270942 ]
Artem Aliev edited comment on CASSANDRA-8577 at 1/9/15 12:08 PM:
-----------------------------------------------------------------
To reproduce the bug with the unit tests:
1. Replace ./build/lib/jars/cassandra-driver-core-2.0.5.jar with cassandra-driver-core-2.1.3.jar.
2. Run the Pig unit tests:
   ant pig-test -Dtest.name=CqlTableDataTypeTest
{code}
….
[junit] org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after list value
[junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:104)
[junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:27)
[junit] at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796)
[junit] at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195)
[junit] at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106)
[junit] at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
[junit] at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
[junit] at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
[junit] at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
[junit] at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
[junit] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
[junit] at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
….
{code}
Cassandra 2.1 ships with Java driver 2.0, which uses native protocol V2.
Java driver 2.1 is now available, and it uses native protocol V3.
Collection serialisation changed in V3: the element count and the element lengths
are encoded as 4-byte ints instead of 2-byte shorts. The current implementation of
the Pig reader hardcodes version 1 for deserialisation, as a result of an incomplete
fix for CASSANDRA-7287.
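To make the V2/V3 difference concrete, here is a minimal, self-contained sketch (not Cassandra's actual serializer code; the class and method names are hypothetical) that writes a set<varchar> value with the V3 layout and reads it back with the V1/V2 layout. Reading V3 bytes with 2-byte size fields picks up the two high-order zero bytes of the 4-byte count, so the set decodes as empty and the rest of the buffer is left over. That would match the empty values reported for 2.1.0 and, once trailing bytes are rejected, the MarshalException reported for 2.1.1 and 2.1.2.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the collection wire layouts; not Cassandra's serializer.
public class CollectionWireFormat {

    // Native protocol v1/v2: collection size and element lengths are 2-byte unsigned shorts.
    // Native protocol v3+:   both are 4-byte signed ints.
    static int sizeFieldBytes(int protocolVersion) {
        return protocolVersion >= 3 ? 4 : 2;
    }

    static void writeSize(ByteBuffer out, int size, int version) {
        if (sizeFieldBytes(version) == 4) {
            out.putInt(size);
        } else {
            out.putShort((short) size);
        }
    }

    static int readSize(ByteBuffer in, int version) {
        return sizeFieldBytes(version) == 4 ? in.getInt() : (in.getShort() & 0xFFFF);
    }

    /** Serializes a set<varchar> value using the given protocol version's layout. */
    static ByteBuffer encode(List<String> values, int version) {
        int width = sizeFieldBytes(version);
        List<byte[]> encoded = new ArrayList<>();
        int total = width; // room for the element count
        for (String v : values) {
            byte[] b = v.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            total += width + b.length; // length prefix + payload
        }
        ByteBuffer out = ByteBuffer.allocate(total);
        writeSize(out, values.size(), version);
        for (byte[] b : encoded) {
            writeSize(out, b.length, version);
            out.put(b);
        }
        out.flip();
        return out;
    }

    /** Deserializes using the given protocol version's layout and rejects trailing bytes. */
    static List<String> decode(ByteBuffer input, int version) {
        ByteBuffer in = input.duplicate(); // leave the caller's position untouched
        int n = readSize(in, version);
        List<String> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            byte[] b = new byte[readSize(in, version)];
            in.get(b);
            result.add(new String(b, StandardCharsets.UTF_8));
        }
        if (in.hasRemaining()) {
            throw new IllegalArgumentException("Unexpected extraneous bytes after set value");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("Running", "onestep4red", "running");
        ByteBuffer v3 = encode(tags, 3);
        System.out.println(decode(v3, 3)); // [Running, onestep4red, running]
        try {
            decode(v3, 1); // the 2-byte count read is 0x0000, so n = 0 and bytes are left over
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unexpected extraneous bytes after set value
        }
    }
}
```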
Version 1 should be used only by the deprecated cql-over-thrift API.
CqlNativeStorage uses the Java driver's native protocol, so the patch passes the
serialisation protocol version negotiated by the Java driver to the deserialiser
whenever CqlNativeStorage is used. I also added an optional
‘cassandra.input.native.protocol.version’ parameter to force the protocol version,
just in case.
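The version-selection rule described above can be sketched as follows. This is a hypothetical helper, not the code in the attached patch: only the property name ‘cassandra.input.native.protocol.version’ comes from the comment, `java.util.Properties` stands in for whatever job configuration the real storage class reads, and applying the override only on the native path is an assumption.

```java
import java.util.Properties;

// Hypothetical sketch of choosing the protocol version handed to the collection deserializer.
public class ProtocolVersionChoice {

    // Property name from the comment; reading it from java.util.Properties is an assumption.
    static final String FORCE_VERSION = "cassandra.input.native.protocol.version";

    // Layout that only the deprecated cql-over-thrift path should keep using.
    static final int CQL_OVER_THRIFT_VERSION = 1;

    /**
     * @param nativeStorage     true when reading through CqlNativeStorage (Java driver)
     * @param negotiatedVersion protocol version the Java driver negotiated with the server
     */
    static int deserializationVersion(Properties conf, boolean nativeStorage, int negotiatedVersion) {
        if (!nativeStorage) {
            return CQL_OVER_THRIFT_VERSION;
        }
        String forced = conf.getProperty(FORCE_VERSION);
        if (forced != null && !forced.trim().isEmpty()) {
            return Integer.parseInt(forced.trim()); // explicit override, "just in case"
        }
        return negotiatedVersion;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(deserializationVersion(conf, true, 3));  // 3: negotiated by the driver
        System.out.println(deserializationVersion(conf, false, 3)); // 1: cql-over-thrift
        conf.setProperty(FORCE_VERSION, "2");
        System.out.println(deserializationVersion(conf, true, 3));  // 2: forced by the property
    }
}
```

Pinning the thrift path to version 1 and letting the override act only on the native path is one reasonable reading of the comment; the attached cassandra-2.1-8577.txt remains the authoritative reference.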
> Values of set types not loading correctly into Pig
> --------------------------------------------------
>
> Key: CASSANDRA-8577
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8577
> Project: Cassandra
> Issue Type: Bug
> Reporter: Oksana Danylyshyn
> Assignee: Brandon Williams
> Fix For: 2.1.3
>
> Attachments: cassandra-2.1-8577.txt
>
>
> Values of set types are not loading correctly from Cassandra (CQL3 table,
> native protocol v3) into Pig using CqlNativeStorage.
> With Cassandra 2.1.0, only empty values are loaded; with newer versions
> (2.1.1 and 2.1.2), the following error is received:
> org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value
> at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94)
> Steps to reproduce:
> {code}cqlsh:socialdata> CREATE TABLE test (
> key varchar PRIMARY KEY,
> tags set<varchar>
> );
> cqlsh:socialdata> insert into test (key, tags) values ('key', {'Running', 'onestep4red', 'running'});
> cqlsh:socialdata> select * from test;
> key | tags
> -----+---------------------------------------
> key | {'Running', 'onestep4red', 'running'}
> (1 rows){code}
> With version 2.1.0:
> {code}grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> grunt> dump data;
> (key,()){code}
> With version 2.1.2:
> {code}grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> grunt> dump data;
> org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value
> at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94)
> at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:27)
> at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796)
> at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195)
> at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212){code}
> Expected result:
> {code}(key,(Running,onestep4red,running)){code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)