Deepak Kumar V created AVRO-1419:
------------------------------------

             Summary: java.io.IOException: Invalid sync! throw after random 
number of sync() calls.
                 Key: AVRO-1419
                 URL: https://issues.apache.org/jira/browse/AVRO-1419
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.7.5
            Reporter: Deepak Kumar V


I have a 340 MB Avro data file whose records are sorted and identified by a 
unique id (duplicate records exist). At the beginning of each group of records 
that share an id, a synchronization point is created with 
DataFileWriter.sync(). (I cannot, or would rather not, save the sync points, 
and I do not want to use SortedKeyValueFile as the output format for the M/R 
job.)

There are at least 25k synchronization points in the 340 MB file.

Ex:
Marker1_RecordA1_RecordA2_RecordA3_Marker2_RecordB1_RecordB2


Because the records are sorted and marked, a binary search is performed for 
efficient retrieval. Most of the time the search succeeds, but occasionally the 
code throws the following exception:
------
org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
------
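For reference, the search described above can be modelled roughly as follows. 
This is a minimal in-memory sketch, not the code that produced the exception 
and not Avro code: SyncMarkedSearch, Group, syncTo and findGroupOffset are 
hypothetical names, syncTo only stands in for a sync(position) call, and the 
groups are assumed to be sorted by both offset and id.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory model of a sync-marked file. Illustrative only; no Avro classes are used. */
final class SyncMarkedSearch {

    /** Hypothetical model type: the group of records that follows one sync marker. */
    static final class Group {
        final long offset; // byte position at which the group's marker starts
        final long id;     // id shared by every record in the group

        Group(long offset, long id) {
            this.offset = offset;
            this.id = id;
        }
    }

    /**
     * Stand-in for a sync(position) call: index of the first group whose marker
     * starts at or after position, or groups.size() when there is none.
     */
    static int syncTo(List<Group> groups, long position) {
        for (int i = 0; i < groups.size(); i++) {
            if (groups.get(i).offset >= position) {
                return i;
            }
        }
        return groups.size();
    }

    /** Returns the marker offset of the group with targetId, or -1 if it is absent. */
    static long findGroupOffset(List<Group> groups, long fileLength, long targetId) {
        long lo = 0;
        long hi = fileLength; // invariant: the target marker, if any, starts in [lo, hi)
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            int idx = syncTo(groups, mid);
            if (idx == groups.size() || groups.get(idx).offset >= hi) {
                hi = mid; // no marker starts in [mid, hi), so search the left part
                continue;
            }
            Group g = groups.get(idx);
            if (g.id == targetId) {
                return g.offset;
            }
            if (g.id < targetId) {
                lo = g.offset + 1; // target lies after this marker
            } else {
                hi = mid;          // target lies before mid
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Group> groups = new ArrayList<>();
        groups.add(new Group(0, 10));
        groups.add(new Group(100, 20));
        groups.add(new Group(250, 30));
        groups.add(new Group(400, 40));
        System.out.println(findGroupOffset(groups, 500, 30)); // prints 250
        System.out.println(findGroupOffset(groups, 500, 25)); // prints -1
    }
}
```

Against a real file, the syncTo step would presumably be a call to 
DataFileReader.sync(mid) followed by reading the next record to obtain its id.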
If I note the position that was passed to fileReader.sync(mid), catch the 
AvroRuntimeException, close and reopen the file, and call sync(mid) again with 
the same position, I do not see the exception.
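The close-and-reopen workaround can be sketched generically as below. This is 
an illustration under assumptions, not a fix and not part of Avro: ReopenRetry, 
Opener, Probe and probeWithReopen are hypothetical names, and a StringReader 
stands in for the data file reader so the sketch runs without Avro on the 
classpath.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.StringReader;

/** Generic "close, reopen, retry" helper. Illustrative only; no Avro classes are used. */
final class ReopenRetry {

    /** Hypothetical callback that opens a fresh reader. */
    interface Opener<R> {
        R open() throws IOException;
    }

    /** Hypothetical callback that runs one probe (e.g. sync + read) on an open reader. */
    interface Probe<R, T> {
        T run(R reader) throws IOException;
    }

    /**
     * Runs the probe on a freshly opened reader. If the probe throws a
     * RuntimeException, the reader is closed and the probe is repeated on a
     * new reader, up to maxAttempts times. The last failure is rethrown.
     */
    static <R extends Closeable, T> T probeWithReopen(
            Opener<R> opener, Probe<R, T> probe, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // try-with-resources closes the reader before the catch block runs
            try (R reader = opener.open()) {
                return probe.run(reader);
            } catch (RuntimeException e) {
                // Broad on purpose for the sketch; real code would catch
                // only the exception type it expects.
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        int[] opens = {0};
        int value = ReopenRetry.<StringReader, Integer>probeWithReopen(
            () -> {
                opens[0]++;
                return new StringReader("x");
            },
            r -> {
                if (opens[0] == 1) {
                    throw new IllegalStateException("simulated: Invalid sync!");
                }
                return r.read();
            },
            2);
        System.out.println("read " + value + " after " + opens[0] + " opens");
    }
}
```

With Avro, the opener would presumably construct a new DataFileReader and the 
probe would call sync(mid) and then hasNext()/next(). This only hides the 
symptom; it does not explain why the first attempt fails.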

Why does Avro throw the exception on the first attempt but not on the retry? 
Version 1.7.5 of the library throws this error. I am raising this as a major 
defect; please adjust the priority at your convenience. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
