[jira] [Created] (AVRO-1093) DataFileWriter, appendEncoded causes AvroRuntimeException when read back

Catalin Alexandru Zamfir (JIRA) Wed, 16 May 2012 07:37:28 -0700

Catalin Alexandru Zamfir created AVRO-1093:
----------------------------------------------


             Summary: DataFileWriter, appendEncoded causes AvroRuntimeException when read back
                 Key: AVRO-1093
                 URL: https://issues.apache.org/jira/browse/AVRO-1093
             Project: Avro
          Issue Type: Bug
    Affects Versions: 1.6.3
            Reporter: Catalin Alexandru Zamfir


We're doing this:
{code}
// Make sure the shard has an in-memory buffer
if (!objRecordsBuffer.containsKey (objShardPath)) {
        objRecordsBuffer.put (objShardPath, new ByteBufferOutputStream ());
}

// Encode the record into the shard's buffer
Encoder objEncoder = EncoderFactory.get ()
        .binaryEncoder (objRecordsBuffer.get (objShardPath), null);
objGenericDatumWriter.write (objRecordConstructor.build (), objEncoder);
objEncoder.flush ();

// Separate fragment: append every buffer held for objKey
for (ByteBuffer objRecord : objRecordsBuffer.get (objKey).getBufferList ()) {
        objRecordWriter.appendEncoded (objRecord);
}

// Flush and close the writer
objRecordWriter.flush ();
objRecordWriter.close ();
{code}

It writes the data to HDFS. Reading it back throws the following exception:
{code}
Caused by: org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
        at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
        at net.RnD.FileUtils.TimestampedReader.hasNext(TimestampedReader.java:113)
        at net.RnD.Hadoop.App.read1BAvros(App.java:131)
        at net.RnD.Hadoop.App.executeCode(App.java:534)
        at net.RnD.Hadoop.App.main(App.java:453)
        ... 5 more
Caused by: java.io.IOException: Block read partially, the data may be corrupt
        at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:194)
        ... 9 more
{code}

The objRecordWriter is a DataFileWriter obtained from either DataFileWriter.create 
or DataFileWriter.appendTo (SeekableInput). This is related to AVRO-1090.

Instead of keeping big hash maps in memory, we decided to serialize the data 
into in-memory byte buffers, because it's faster. Although "appendEncoded" 
does seem to write something to HDFS, reading the data back exposes this 
error.

Help would be appreciated. I've looked at appendEncoded in DataFileWriter but 
could not figure out whether it's our job to add a sync marker or whether 
appendEncoded does that for us.

Must the "ByteBuffer" we pass in be exactly one record long?
Examples and documentation for this method are welcome.
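
To make the last question concrete: one possible cause (an assumption on our side, not confirmed) is that getBufferList () returns fixed-size chunks rather than one buffer per record, while appendEncoded appears to count each call as a single record. Below is a minimal sketch of keeping one buffer per record instead, using only the JDK. The names PerRecordBuffers and encodeForDemo are hypothetical and not part of Avro; the real Avro encoding and the appendEncoded call are only indicated in comments.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Avro: keeps one ByteBuffer per encoded record.
class PerRecordBuffers {
    private final Map<String, List<ByteBuffer>> buffers = new HashMap<>();

    // Store the bytes of exactly one encoded record under the given shard path.
    void add(String shardPath, byte[] encodedRecord) {
        buffers.computeIfAbsent(shardPath, k -> new ArrayList<>())
               .add(ByteBuffer.wrap(encodedRecord.clone()));
    }

    // Remove and return the shard's records; each ByteBuffer is one whole record.
    List<ByteBuffer> drain(String shardPath) {
        List<ByteBuffer> records = buffers.remove(shardPath);
        return records == null ? new ArrayList<>() : records;
    }

    // Stand-in for "encode one record". The real code would write the datum
    // through a binary encoder into a fresh ByteArrayOutputStream, flush the
    // encoder, and then take out.toByteArray() (assumption, see lead-in).
    static byte[] encodeForDemo(String record) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        PerRecordBuffers perShard = new PerRecordBuffers();
        perShard.add("shard-a", encodeForDemo("first"));
        perShard.add("shard-a", encodeForDemo("second"));
        for (ByteBuffer record : perShard.drain("shard-a")) {
            // The real loop would call objRecordWriter.appendEncoded (record) here.
            System.out.println(record.remaining() + " bytes");
        }
    }
}
```

With this layout the number of appendEncoded calls equals the number of records, whatever the size of each record. Whether that is what the writer requires is exactly what we would like confirmed.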

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
