[ https://issues.apache.org/jira/browse/GORA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491024#comment-13491024 ]
Lewis John McGibbney commented on GORA-170:
-------------------------------------------
When I attempt to generate a fetch list with Nutch 2.x, I get the following:
{code}
2012-11-05 22:51:03,951 DEBUG connection.HThriftClient - keyspace reseting from null to webpage
2012-11-05 22:51:04,066 DEBUG connection.HThriftClient - Transport open status true for client CassandraClient<localhost:9160-8>
2012-11-05 22:51:04,066 DEBUG connection.ConcurrentHClientPool - Status of releaseClient CassandraClient<localhost:9160-8> to queue: true
2012-11-05 22:51:04,087 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2012-11-05 22:51:04,089 WARN mapred.LocalJobRunner - job_local_0001
java.nio.BufferUnderflowException
    at java.nio.Buffer.nextGetIndex(Buffer.java:480)
    at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:336)
    at me.prettyprint.cassandra.serializers.IntegerSerializer.fromByteBuffer(IntegerSerializer.java:35)
    at me.prettyprint.cassandra.serializers.FloatSerializer.fromByteBuffer(FloatSerializer.java:25)
    at me.prettyprint.cassandra.serializers.FloatSerializer.fromByteBuffer(FloatSerializer.java:10)
    at org.apache.gora.cassandra.query.CassandraColumn.fromByteBuffer(CassandraColumn.java:74)
    at org.apache.gora.cassandra.query.CassandraSubColumn.getValue(CassandraSubColumn.java:86)
    at org.apache.gora.cassandra.query.CassandraResult.updatePersistent(CassandraResult.java:90)
    at org.apache.gora.cassandra.query.CassandraResult.nextInner(CassandraResult.java:56)
    at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:112)
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:111)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
2012-11-05 22:51:04,253 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=generate: 1352155857-1625665918, jobid=job_local_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:191)
    at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:213)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:241)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:249)
{code}
This is without the attached patch applied.
> Getting a BufferUnderflowException in class CassandraColumn, method
> fromByteBuffer()
> ------------------------------------------------------------------------------------
>
> Key: GORA-170
> URL: https://issues.apache.org/jira/browse/GORA-170
> Project: Apache Gora
> Issue Type: Bug
> Components: storage-cassandra
> Affects Versions: 0.2.1
> Environment: Not sure environment matters for this one but Ubuntu
> Reporter: Chris Gerken
> Priority: Blocker
> Fix For: 0.3
>
>
> When using CassandraStore and GoraMapper to retrieve data previously stored
> in Cassandra, a BufferUnderflowException is thrown in method
> fromByteBuffer() in class CassandraColumn. This results in a complete
> failure of the Hadoop job trying to use the Cassandra data.
> The problem seems to be caused by an invalid assumption in the
> (de)serializer logic. Serializers assume that the bytes in a ByteBuffer to be
> deserialized start at offset 0 (zero) in the ByteBuffer's internal buffer.
> In fact, there are times when a ByteBuffer passed back from the
> Hector/Thrift API will have its data start at a non-zero offset in its
> buffer. When serializers are given these non-zero-offset ByteBuffers, an
> exception, usually BufferUnderflowException, is thrown.
> The suggested fix is to use the TBaseHelper class from Cassandra/Thrift:
> import org.apache.thrift.TBaseHelper;
>
> protected Object fromByteBuffer(Schema schema, ByteBuffer byteBuffer) {
>   Object value = null;
>   Serializer serializer = GoraSerializerTypeInferer.getSerializer(schema);
>   if (serializer == null) {
>     LOG.info("Schema is not supported: " + schema.toString());
>   } else {
>     ByteBuffer corrected = TBaseHelper.rightSize(byteBuffer);
>     value = serializer.fromByteBuffer(corrected);
>   }
>   return value;
> }
>
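For anyone wanting to see the offset problem described above in isolation, here is a minimal sketch using only java.nio (no Hector, Thrift or Gora on the classpath). The class OffsetBufferDemo and the helpers naiveReadInt and rightSized are hypothetical names made up for this illustration; rightSized only approximates what TBaseHelper.rightSize is used for in the suggested fix, and the exact failure mode inside the real serializers may differ.

```java
import java.nio.ByteBuffer;

public class OffsetBufferDemo {

    // Hypothetical stand-in for a deserializer that wrongly assumes the
    // payload starts at index 0 of the backing array.
    static int naiveReadInt(ByteBuffer buf) {
        return ByteBuffer.wrap(buf.array()).getInt();
    }

    // Copies only the remaining bytes into a fresh, exactly sized buffer,
    // so the data starts at offset 0. The caller's position is untouched
    // because the read goes through a duplicate.
    static ByteBuffer rightSized(ByteBuffer buf) {
        byte[] copy = new byte[buf.remaining()];
        buf.duplicate().get(copy);
        return ByteBuffer.wrap(copy);
    }

    public static void main(String[] args) {
        // A 12-byte frame: 8 bytes of header followed by a 4-byte int payload.
        byte[] frame = ByteBuffer.allocate(12).putLong(-1L).putInt(42).array();

        // A view onto the payload only: position 8, limit 12, shared array.
        ByteBuffer payload = ByteBuffer.wrap(frame, 8, 4);

        System.out.println(naiveReadInt(payload));        // prints -1 (header bytes)
        System.out.println(rightSized(payload).getInt()); // prints 42 (the payload)
    }
}
```

The copy costs one small allocation per value, which seems an acceptable trade for not depending on how the transport layer laid out its internal buffer.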