Adam Szita created PIG-5373:

             Summary: InterRecordReader might skip records if certain sync 
markers are used
                 Key: PIG-5373
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.17.0
            Reporter: Adam Szita
            Assignee: Adam Szita

Due to bug in InterRecordReader#skipUntilMarkerOrSplitEndOrEOF(), it can happen 
that sync markers are not identified while reading the interim binary file used 
to hold data between jobs.

In such files sync markers are placed upon writing, which later help during 
reading the data. These are random generated and it seems like that in some 
rare combinations of markers and data preceding it, they cannot be not found. 
This can result in reading through all the bytes (looking for the marker) and 
reaching split end or EOF, and extracting no records at all.

This symptom is also observable from JobHistory stats, where if a job is 
affected by this issue, will have tasks that have HDFS_BYTES_READ or 
FILE_BYTES_READ about equal to the number bytes of the split, but at the same 
time having MAP_INPUT_RECORDS=0

This message was sent by Atlassian JIRA

Reply via email to