[ https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284647#comment-14284647 ]

Philip Thompson commented on CASSANDRA-8358:
--------------------------------------------

It seems that Pig is unhappy with all of the collection types: the raw java.util 
collections handed back from the LoadFunc are not standard Pig types, so 
serialization fails. I see exceptions like this:
{code}
18:31:00.105 [Thread-4] WARN  o.a.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: Unexpected data type java.util.ArrayList found in stream. Note only standard Pig type is supported when you output from UDF/LoadFunc
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:596) ~[pig-0.12.1.jar:na]
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462) ~[pig-0.12.1.jar:na]
        at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135) ~[pig-0.12.1.jar:na]
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650) ~[pig-0.12.1.jar:na]
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:470) ~[pig-0.12.1.jar:na]
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462) ~[pig-0.12.1.jar:na]
        at org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73) ~[pig-0.12.1.jar:na]
        at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:88) ~[pig-0.12.1.jar:na]
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139) ~[pig-0.12.1.jar:na]
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98) ~[pig-0.12.1.jar:na]
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639) ~[hadoop-core-1.0.3.jar:na]
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) ~[hadoop-core-1.0.3.jar:na]
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48) ~[pig-0.12.1.jar:na]
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:263) ~[pig-0.12.1.jar:na]
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) ~[pig-0.12.1.jar:na]
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) ~[hadoop-core-1.0.3.jar:na]
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) ~[hadoop-core-1.0.3.jar:na]
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) ~[hadoop-core-1.0.3.jar:na]
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) ~[hadoop-core-1.0.3.jar:na]
{code}
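
For context, Pig's serializer only accepts its standard types (DataBag, Tuple, 
Map and scalars), so the storage handler has to convert the java.util 
collections coming out of Cassandra before emitting tuples. A minimal sketch of 
what that conversion could look like for a list column - the helper class and 
method name here are hypothetical, not the eventual patch:
{code}
import java.util.List;

import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class PigCollections
{
    // Wrap each element of the Java list in a one-field Tuple and collect
    // them into a DataBag, the standard Pig representation of a list.
    public static DataBag toBag(List<?> list) throws ExecException
    {
        TupleFactory tupleFactory = TupleFactory.getInstance();
        DataBag bag = BagFactory.getInstance().newDefaultBag();
        for (Object element : list)
        {
            Tuple tuple = tupleFactory.newTuple(1);
            tuple.set(0, element);
            bag.add(tuple);
        }
        return bag;
    }
}
{code}
Sets would presumably get the same bag treatment; Pig does have a native map 
type, but its keys are chararrays, so map columns may need key conversion too.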


> Bundled tools shouldn't be using Thrift API
> -------------------------------------------
>
>                 Key: CASSANDRA-8358
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>            Assignee: Philip Thompson
>             Fix For: 3.0
>
>
> In 2.1, we switched cqlsh to the python-driver.
> In 3.0, we got rid of cassandra-cli.
> Yet there is still code that's using the legacy Thrift API. We want to convert 
> it all to use the java-driver instead.
> 1. BulkLoader uses Thrift to query the schema tables. It should be using the 
> java-driver Metadata API directly instead.
> 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
> 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
> 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
> 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
> Some of the things listed above use Thrift to get the list of partition key 
> columns or clustering columns. Those should be converted to use the 
> java-driver's Metadata API (a sketch follows this description).
> Somewhat related, we also have badly ported Thrift-era code in 
> o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches 
> columns from the schema tables instead of properly using the driver's Metadata 
> API.
> We need all of it fixed. One exception, for now, is 
> o.a.c.hadoop.AbstractColumnFamilyInputFormat - it uses Thrift for its 
> describe_splits_ex() call, which currently cannot be replaced by any 
> java-driver call (?).
> Once this is done, we can stop starting the Thrift RPC server by default in 
> cassandra.yaml.
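
Regarding the Metadata API conversions called for above: this is roughly what 
the java-driver lookup of partition key and clustering columns looks like (a 
sketch against java-driver 2.x; the contact point, keyspace and table names are 
placeholders):
{code}
import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ColumnMetadata;
import com.datastax.driver.core.TableMetadata;

public class KeyColumnsSketch
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build())
        {
            // The driver maintains the schema locally; no Thrift calls and no
            // manual queries against the schema tables are needed.
            TableMetadata table = cluster.getMetadata().getKeyspace("ks").getTable("tbl");
            List<ColumnMetadata> partitionKey = table.getPartitionKey();
            List<ColumnMetadata> clusteringColumns = table.getClusteringColumns();
            System.out.println(partitionKey + " / " + clusteringColumns);
        }
    }
}
{code}
Something along these lines should cover the key lookups in 
AbstractCassandraStorage and fetchKeys() without reading the schema tables by 
hand.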



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
