Catalin Alexandru Zamfir created AVRO-1093:
----------------------------------------------
Summary: DataFileWriter, appendEncoded causes AvroRuntimeException when read back
Key: AVRO-1093
URL: https://issues.apache.org/jira/browse/AVRO-1093
Project: Avro
Issue Type: Bug
Affects Versions: 1.6.3
Reporter: Catalin Alexandru Zamfir
We're doing this:
{code}
// Create an in-memory buffer for this shard on first use
if (!objRecordsBuffer.containsKey (objShardPath)) {
    objRecordsBuffer.put (objShardPath, new ByteBufferOutputStream ());
}

// Binary-encode the record into the shard's buffer
Encoder objEncoder = EncoderFactory.get ()
    .binaryEncoder (objRecordsBuffer.get (objShardPath), null);
objGenericDatumWriter.write (objRecordConstructor.build (), objEncoder);
objEncoder.flush ();

// Append every ByteBuffer held for objKey to the data file
for (ByteBuffer objRecord : objRecordsBuffer.get (objKey).getBufferList ()) {
    objRecordWriter.appendEncoded (objRecord);
}

// Flush and close the DataFileWriter
objRecordWriter.flush ();
objRecordWriter.close ();
{code}
It writes the data to HDFS. Reading it back throws the following exception:
{code}
Caused by: org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
at net.RnD.FileUtils.TimestampedReader.hasNext(TimestampedReader.java:113)
at net.RnD.Hadoop.App.read1BAvros(App.java:131)
at net.RnD.Hadoop.App.executeCode(App.java:534)
at net.RnD.Hadoop.App.main(App.java:453)
... 5 more
Caused by: java.io.IOException: Block read partially, the data may be corrupt
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:194)
... 9 more
{code}
objRecordWriter is a DataFileWriter obtained from DataFileWriter.create or
DataFileWriter.appendTo (SeekableInput). This is related to AVRO-1090.
Instead of keeping big "hashmaps" in memory, we've decided to serialize the data
into in-memory "byte buffers", because it's faster. Although "appendEncoded"
does seem to write something to HDFS, reading the data back produces the
exception shown above.
Help would be appreciated. I've looked at appendEncoded in DataFileWriter but
could not figure out whether it's our job to add a sync marker or whether
appendEncoded does that for us.
Must the "ByteBuffer" we pass contain exactly one record?
Examples and documentation for this method are welcome.
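One possibility we have not confirmed: the ByteBuffers returned by getBufferList () may be fixed-size chunks of the stream rather than one buffer per record. If appendEncoded counts each call as one record, the record count written for a block would then disagree with the bytes in it. The sketch below is plain Java and does not use Avro at all. ChunkedBuffer is a hypothetical stand-in, not Avro's ByteBufferOutputStream, and the 8192-byte chunk size is an assumption. It only illustrates that the number of chunks need not equal the number of records written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkBoundaryDemo {

    /** Hypothetical stand-in for a chunked in-memory stream (NOT Avro's class). */
    static final class ChunkedBuffer {
        private final int chunkSize;
        private final List<byte[]> fullChunks = new ArrayList<>();
        private byte[] current;
        private int used;

        ChunkedBuffer(int chunkSize) {
            this.chunkSize = chunkSize;
            this.current = new byte[chunkSize];
        }

        /** Appends data, rolling over to a new chunk whenever the current one is full. */
        void write(byte[] data) {
            for (byte b : data) {
                if (used == chunkSize) {
                    fullChunks.add(current);
                    current = new byte[chunkSize];
                    used = 0;
                }
                current[used++] = b;
            }
        }

        /** Returns all full chunks plus the trailing partial chunk, if any. */
        List<byte[]> chunks() {
            List<byte[]> out = new ArrayList<>(fullChunks);
            if (used > 0) {
                out.add(Arrays.copyOf(current, used));
            }
            return out;
        }
    }

    /** Writes recordCount records of recordSize bytes and reports how many chunks result. */
    public static int chunkCountFor(int recordCount, int recordSize, int chunkSize) {
        ChunkedBuffer buffer = new ChunkedBuffer(chunkSize);
        for (int i = 0; i < recordCount; i++) {
            buffer.write(new byte[recordSize]); // placeholder bytes for one encoded record
        }
        return buffer.chunks().size();
    }

    public static void main(String[] args) {
        // 1000 records of 10 bytes each, written into assumed 8192-byte chunks
        int chunks = chunkCountFor(1000, 10, 8192);
        System.out.println("records=1000 chunks=" + chunks);
    }
}
```

If this is what happens in our case, iterating over getBufferList () would call appendEncoded once per chunk instead of once per record, which might explain the "Block read partially" error. Confirmation of whether appendEncoded expects exactly one record per call would settle it.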