[
https://issues.apache.org/jira/browse/AVRO-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503556#comment-15503556
]
Myles Baker edited comment on AVRO-1419 at 9/19/16 4:50 PM:
------------------------------------------------------------
I've seen this in Spark jobs that use YARN as the resource manager. The job
aborts due to a stage failure caused by org.apache.avro.AvroRuntimeException:
java.io.IOException: Invalid sync!
This issue is caused by malformed Avro files (i.e. user error). [This mailing
list archive |
http://mail-archives.apache.org/mod_mbox/avro-user/201105.mbox/%3cca03b5f3.5891%[email protected]%3E]
describes an example where Avro files were naively concatenated.
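As a rough way to check a file for this kind of naive concatenation, the sketch below counts positions that look like the start of an Avro container header: the four magic bytes 'O', 'b', 'j', 1, followed shortly by the "avro.schema" metadata key. It does not use the Avro library. The class name AvroHeaderScan and the 64-byte search window are assumptions made for illustration, and the check is only a heuristic, since record data could contain the same bytes by chance and other metadata could push the schema key outside the window.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Avro: a heuristic scan for embedded headers.
final class AvroHeaderScan {
    // Avro object container files start with the bytes 'O', 'b', 'j', 1.
    static final byte[] MAGIC = {'O', 'b', 'j', 1};
    // The header metadata map stores the writer schema under this key.
    static final byte[] SCHEMA_KEY = "avro.schema".getBytes(StandardCharsets.US_ASCII);
    // Assumed window: how far past the magic bytes to look for the schema key.
    static final int WINDOW = 64;

    // Returns the first index in [from, to) where pattern fits entirely, or -1.
    static int indexOf(byte[] data, byte[] pattern, int from, int to) {
        int last = Math.min(to, data.length) - pattern.length;
        outer:
        for (int i = Math.max(from, 0); i <= last; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // Counts positions that look like the start of a container header.
    static int countHeaders(byte[] data) {
        int count = 0;
        int pos = 0;
        while ((pos = indexOf(data, MAGIC, pos, data.length)) >= 0) {
            int start = pos + MAGIC.length;
            if (indexOf(data, SCHEMA_KEY, start, start + WINDOW) >= 0) {
                count++;
            }
            pos = start;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file into memory; fine for a sketch, not for huge files.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("header-like positions: " + countHeaders(data));
    }
}
```

A count greater than one suggests that a second header sits in the middle of the file, which is what plain byte-level concatenation of two container files would produce.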
was (Author: mylesbaker):
I've seen this for Spark jobs using yarn as the resource manager. The job
aborts due to stage failure caused by org.apache.avro.AvroRuntimeException:
java.io.IOException: Invalid sync!
I'll comment/update this after doing some debugging. I don't know that my issue
is caused by the same condition as this bug.
> java.io.IOException: Invalid sync! throw after random number of sync() calls.
> -----------------------------------------------------------------------------
>
> Key: AVRO-1419
> URL: https://issues.apache.org/jira/browse/AVRO-1419
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.5
> Reporter: Deepak Kumar V
>
> I have a 340 MB Avro data file that contains records sorted and identified by
> unique id (duplicate records exist). At the beginning of every unique record
> a synchronization point is created with DataFileWriter.sync(). (I cannot, or
> do not want to, save the sync points, and I do not want to use
> SortedKeyValueFile as the output format for the M/R job.)
> There are at least 25k synchronization points in the 340 MB file.
> Ex:
> Marker1_RecordA1_RecordA2_RecordA3_Marker2_RecordB1_RecordB2
> Because the records are sorted and marked, a binary search is performed for
> efficient retrieval. Most of the time the search succeeds, but at times the
> code throws the following exception:
> ------
> org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync! at
> org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
> ------
> I note the position that was passed to fileReader.sync(mid), catch the
> AvroRuntimeException, close and reopen the file, and call sync(mid) again;
> the second time I do not see the exception.
> Why would Avro throw the exception the first time and not the second?
> Version 1.7.5 of the library throws this error. I am raising this as a major
> defect; adjust the priority at your convenience.
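The search the reporter describes (pick a byte offset, move forward to the next synchronization point, compare the record found there) can be sketched without the Avro library as below. This is only an illustration of the technique, not the reporter's code and not Avro's implementation. The class MarkerBinarySearch and the layout it assumes (a 16-byte marker followed directly by a 4-byte big-endian key, with keys in ascending order) are hypothetical, and the sketch assumes the marker bytes never occur inside keys or payloads.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a marker-delimited file held in memory.
final class MarkerBinarySearch {

    // Returns the first offset >= from at which marker occurs, or -1.
    static int nextMarker(byte[] data, byte[] marker, int from) {
        outer:
        for (int i = Math.max(from, 0); i + marker.length <= data.length; i++) {
            for (int j = 0; j < marker.length; j++) {
                if (data[i + j] != marker[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // Reads the assumed 4-byte big-endian key stored right after a marker.
    static int keyAt(byte[] data, byte[] marker, int markerPos) {
        return ByteBuffer.wrap(data, markerPos + marker.length, 4).getInt();
    }

    // Returns the offset of the marker whose key equals target, or -1.
    static int find(byte[] data, byte[] marker, int target) {
        int lo = 0;
        int hi = data.length;
        int best = -1; // earliest marker seen so far with key >= target
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            int p = nextMarker(data, marker, mid);
            if (p < 0 || p >= hi) {
                hi = mid;          // no marker starts in [mid, hi)
            } else if (keyAt(data, marker, p) < target) {
                lo = p + 1;        // target, if present, starts after p
            } else {
                best = p;          // candidate; keep looking to the left
                hi = mid;
            }
        }
        return (best >= 0 && keyAt(data, marker, best) == target) ? best : -1;
    }
}
```

Each probe lands at an arbitrary byte offset and only becomes meaningful after scanning forward to the next marker, which is why the result depends on markers being detected reliably at every probe position.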