---------- Forwarded message ----------
From: "Mark Lewandowski" <mark.e.lewandow...@gmail.com>
Date: Jun 8, 2013 8:03 AM
Subject: Cassandra (1.2.5) + Pig (0.11.1) Errors with large column families
To: <user@cassandra.apache.org>
> I'm currently trying to get Cassandra (1.2.5) and Pig (0.11.1) to play nice
> together. I'm running a basic script:
>
>     rows = LOAD 'cassandra://keyspace/colfam' USING CassandraStorage();
>     dump rows;
>
> This fails for my column family, which has ~100,000 rows. However, if I
> modify the script to this:
>
>     rows = LOAD 'cassandra://betable_games/bets' USING CassandraStorage();
>     rows = limit rows 7000;
>     dump rows;
>
> then it seems to work. 7000 is about as high as I've been able to get it
> before it fails. The error I keep getting is:
>
>     2013-06-07 14:58:49,119 [Thread-4] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
>     java.lang.RuntimeException: org.apache.thrift.TException: Message length exceeded: 4480
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.getProgress(PigRecordReader.java:169)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:514)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:539)
>         at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
>     Caused by: org.apache.thrift.TException: Message length exceeded: 4480
>         at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
>         at org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
>         at org.apache.cassandra.thrift.Column.read(Column.java:535)
>         at org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
>         at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
>         at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12905)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>         at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
>         at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
>         ... 13 more
>
> I've seen a similar problem reported on this mailing list with
> Cassandra 1.2.3; however, the fixes from that thread (increasing
> thrift_framed_transport_size_in_mb and thrift_max_message_length_in_mb in
> cassandra.yaml) did not appear to have any effect. Has anyone else seen
> this issue, and how can I fix it?
>
> Thanks,
>
> -Mark
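
> P.S. For reference, the cassandra.yaml changes I tried looked roughly like
> the fragment below. The values shown are just what I experimented with, not
> recommendations; as I understand it, the max message length should be set
> at least as large as the framed transport size, and the node has to be
> restarted for either change to take effect:
>
>     # cassandra.yaml (Cassandra 1.2.x)
>     # Frame size for Thrift's framed transport, in MB.
>     thrift_framed_transport_size_in_mb: 64
>     # Maximum Thrift message length, in MB; should be >= the frame size above.
>     thrift_max_message_length_in_mb: 64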