[
https://issues.apache.org/jira/browse/AVRO-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doug Cutting resolved AVRO-1093.
--------------------------------
Resolution: Invalid
> DataFileWriter, appendEncoded causes AvroRuntimeException when read back
> ------------------------------------------------------------------------
>
> Key: AVRO-1093
> URL: https://issues.apache.org/jira/browse/AVRO-1093
> Project: Avro
> Issue Type: Bug
> Affects Versions: 1.6.3, 1.7.0
> Reporter: Catalin Alexandru Zamfir
>
> We're doing this:
> {code}
> // Buffer an output stream per shard path
> if (!objRecordsBuffer.containsKey (objShardPath)) {
>     objRecordsBuffer.put (objShardPath, new ByteBufferOutputStream ());
> }
> // Encode the record into the shard's buffer
> Encoder objEncoder = EncoderFactory.get ()
>     .binaryEncoder (objRecordsBuffer.get (objShardPath), null);
> objGenericDatumWriter.write (objRecordConstructor.build (), objEncoder);
> objEncoder.flush ();
> // Append each buffered chunk to the data file
> for (ByteBuffer objRecord : objRecordsBuffer.get (objKey).getBufferList ()) {
>     objRecordWriter.appendEncoded (objRecord);
> }
> // Flush and close
> objRecordWriter.flush ();
> objRecordWriter.close ();
> {code}
> It writes the data to HDFS. Reading it back produces the following exception:
> {code}
> Caused by: org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
>         at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
>         at net.RnD.FileUtils.TimestampedReader.hasNext(TimestampedReader.java:113)
>         at net.RnD.Hadoop.App.read1BAvros(App.java:131)
>         at net.RnD.Hadoop.App.executeCode(App.java:534)
>         at net.RnD.Hadoop.App.main(App.java:453)
>         ... 5 more
> Caused by: java.io.IOException: Block read partially, the data may be corrupt
>         at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:194)
>         ... 9 more
> {code}
> The objRecordWriter is a DataFileWriter instance obtained via
> DataFileWriter.create or DataFileWriter.appendTo (SeekableInput). This is
> related to the AVRO-1090 ticket. Instead of keeping big hashmaps in memory,
> we've decided to serialize the data into in-memory byte buffers, because it's
> faster. Although "appendEncoded" appears to write something to HDFS, reading
> the data back exposes this error.
> Help would be appreciated. I've looked at appendEncoded in DataFileWriter but
> could not figure out whether it's our job to add a sync marker, or whether
> appendEncoded does that for us.
> Must the "ByteBuffer" we pass contain exactly one encoded record?
> Examples and documentation for this method would be welcome.
> The files are being created, as the listing shows:
> {code}
> -rw-r--r--   3 root supergroup  124901360 2012-05-17 10:09 /Streams/Timestamped/Threads/2012/05/17/10/09/Shard.avro
> -rw-r--r--   3 root supergroup  124845625 2012-05-17 10:10 /Streams/Timestamped/Threads/2012/05/17/10/10/Shard.avro
> -rw-r--r--   3 root supergroup   62378307 2012-05-17 10:11 /Streams/Timestamped/Threads/2012/05/17/10/11/Shard.avro
> {code}
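For reference, DataFileWriter.appendEncoded expects each ByteBuffer to contain exactly one complete binary-encoded datum; the sync markers and block framing are handled by the DataFileWriter itself. The snippet above instead iterates ByteBufferOutputStream.getBufferList(), whose fixed-size chunks can split one record across buffers, which would explain the "Block read partially" error. A minimal sketch of one-datum-per-buffer usage (assuming Avro's Java API; the fileWriter, datumWriter, and record names here are illustrative, not from the original report):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AppendEncodedSketch {
    // Encode a single record into its own buffer, then append that buffer.
    static void appendOne(DataFileWriter<GenericRecord> fileWriter,
                          GenericDatumWriter<GenericRecord> datumWriter,
                          GenericRecord record) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        datumWriter.write(record, encoder); // exactly one datum
        encoder.flush();
        // The buffer holds one complete encoded datum; DataFileWriter
        // takes care of block framing and the sync marker.
        fileWriter.appendEncoded(ByteBuffer.wrap(out.toByteArray()));
    }
}
```

Note that appendEncoded performs no validation against the file's schema, so a buffer that does not hold exactly one conforming datum silently produces an unreadable file, which matches the symptom reported here.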
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira