[ https://issues.apache.org/jira/browse/GORA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491024#comment-13491024 ]
Lewis John McGibbney commented on GORA-170:
-------------------------------------------

When I attempt to generate a fetch list with Nutch 2.x I get the following:

{code}
2012-11-05 22:51:03,951 DEBUG connection.HThriftClient - keyspace reseting from null to webpage
2012-11-05 22:51:04,066 DEBUG connection.HThriftClient - Transport open status true for client CassandraClient<localhost:9160-8>
2012-11-05 22:51:04,066 DEBUG connection.ConcurrentHClientPool - Status of releaseClient CassandraClient<localhost:9160-8> to queue: true
2012-11-05 22:51:04,087 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2012-11-05 22:51:04,089 WARN  mapred.LocalJobRunner - job_local_0001
java.nio.BufferUnderflowException
    at java.nio.Buffer.nextGetIndex(Buffer.java:480)
    at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:336)
    at me.prettyprint.cassandra.serializers.IntegerSerializer.fromByteBuffer(IntegerSerializer.java:35)
    at me.prettyprint.cassandra.serializers.FloatSerializer.fromByteBuffer(FloatSerializer.java:25)
    at me.prettyprint.cassandra.serializers.FloatSerializer.fromByteBuffer(FloatSerializer.java:10)
    at org.apache.gora.cassandra.query.CassandraColumn.fromByteBuffer(CassandraColumn.java:74)
    at org.apache.gora.cassandra.query.CassandraSubColumn.getValue(CassandraSubColumn.java:86)
    at org.apache.gora.cassandra.query.CassandraResult.updatePersistent(CassandraResult.java:90)
    at org.apache.gora.cassandra.query.CassandraResult.nextInner(CassandraResult.java:56)
    at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:112)
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:111)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
2012-11-05 22:51:04,253 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=generate: 1352155857-1625665918, jobid=job_local_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:191)
    at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:213)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:241)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:249)
{code}

This is without the patch attached.

> Getting a BufferUnderflowException in class CassandraColumn, method
> fromByteBuffer()
> ------------------------------------------------------------------------------------
>
>                 Key: GORA-170
>                 URL: https://issues.apache.org/jira/browse/GORA-170
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: storage-cassandra
>    Affects Versions: 0.2.1
>         Environment: Not sure environment matters for this one but Ubuntu
>            Reporter: Chris Gerken
>            Priority: Blocker
>             Fix For: 0.3
>
> When using CassandraStore and GoraMapper to retrieve data previously stored
> in Cassandra, a BufferUnderflowException is thrown in method
> fromByteBuffer() in class CassandraColumn. This results in a complete
> failure of the Hadoop job trying to use the Cassandra data.
> The problem seems to be caused by an invalid assumption in the
> (de)serializer logic. Serializers assume that the bytes in a ByteBuffer to
> be deserialized start at offset 0 (zero) in the ByteBuffer's internal
> buffer. In fact, there are times when a ByteBuffer passed back from the
> Hector/Thrift API will have its data start at a non-zero offset in its
> buffer. When serializers are given these non-zero-offset ByteBuffers an
> exception, usually BufferUnderflowException, is thrown.
> The suggested fix is to use the TBaseHelper class from Cassandra/Thrift:
>
> {code}
> import org.apache.thrift.TBaseHelper;
>
> protected Object fromByteBuffer(Schema schema, ByteBuffer byteBuffer) {
>   Object value = null;
>   Serializer serializer = GoraSerializerTypeInferer.getSerializer(schema);
>   if (serializer == null) {
>     LOG.info("Schema is not supported: " + schema.toString());
>   } else {
>     ByteBuffer corrected = TBaseHelper.rightSize(byteBuffer);
>     value = serializer.fromByteBuffer(corrected);
>   }
>   return value;
> }
> {code}
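For illustration, the offset assumption and the effect of the proposed correction can be sketched in plain Java. The {{rightSize}} method below is a simplified stand-in written for this sketch (copy the remaining bytes into a zero-offset buffer), not the actual Thrift {{TBaseHelper.rightSize()}} implementation, and {{naiveGetInt}} is a hypothetical deserializer that makes the flawed offset-0 assumption described in the report:

{code}
import java.nio.ByteBuffer;
import java.nio.BufferUnderflowException;

public class RightSizeSketch {

    // Simplified stand-in for TBaseHelper.rightSize(): copy the remaining
    // bytes into a fresh buffer whose data starts at offset 0.
    static ByteBuffer rightSize(ByteBuffer b) {
        byte[] copy = new byte[b.remaining()];
        b.duplicate().get(copy);
        return ByteBuffer.wrap(copy);
    }

    // Hypothetical naive deserializer with the flawed assumption: it trusts
    // that the value starts at offset 0 of the backing array.
    static int naiveGetInt(ByteBuffer b) {
        return ByteBuffer.wrap(b.array(), 0, 4).getInt();
    }

    public static void main(String[] args) {
        // Simulate a buffer handed back with its data at a non-zero offset:
        // the int 42 lives at bytes 4..7 of the backing array.
        ByteBuffer raw = ByteBuffer.allocate(8);
        raw.position(4);
        raw.putInt(42);
        raw.position(4);
        ByteBuffer view = raw.slice();   // position 0, but arrayOffset() == 4

        System.out.println(naiveGetInt(view));         // reads the wrong bytes: 0
        System.out.println(rightSize(view).getInt());  // 42

        // A buffer with fewer than 4 readable bytes reproduces the
        // BufferUnderflowException seen in the stack trace above.
        try {
            ByteBuffer.allocate(2).getInt();
        } catch (BufferUnderflowException e) {
            System.out.println("BufferUnderflowException");
        }
    }
}
{code}

After normalization, downstream serializers can read from position 0 with the expected number of bytes remaining, which is why the suggested fix resolves the failure.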