Hi,

I have a RecordReader implementation that reads records asynchronously and 
caches them in memory (in a BlockingQueue).
When TrackingRecordReader asks for the next record, my RecordReader takes it 
from the queue and hands it to the MapTask.
TrackingRecordReader increments the BYTES_READ counter by:
bytesInCurr - bytesInPrev
where bytesInCurr is the bytes-read value from the FS statistics after the 
call to next, and bytesInPrev is the same value before the call to next.
Because the records have usually been read before next is called, 
bytesInCurr - bytesInPrev mostly evaluates to zero, or to some unrelated value 
if the asynchronous thread happens to be reading in the background.
Earlier, the BYTES_READ counter was computed from the getPos() method, which my 
RecordReader implemented properly.
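
To illustrate the effect, here is a minimal, self-contained sketch. It does not 
use the Hadoop classes: fsBytesRead is a hypothetical stand-in for the FS 
statistics, the background thread plays the role of my asynchronous reader, 
and the loop mimics the before/after delta described above. The names and the 
sample records are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class AsyncReadCounterDemo {

    // Hypothetical stand-in for the bytes-read value of the FS statistics.
    static final AtomicLong fsBytesRead = new AtomicLong();

    /** Returns {bytes counted by the delta logic, bytes actually read}. */
    static long[] run() throws InterruptedException {
        fsBytesRead.set(0);
        List<String> records = Arrays.asList("record-001", "record-002", "record-003");
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Background reader: performs the "file system" reads and caches the records.
        Thread prefetcher = new Thread(() -> {
            for (String r : records) {
                fsBytesRead.addAndGet(r.getBytes(StandardCharsets.UTF_8).length);
                queue.add(r);
            }
        });
        prefetcher.start();
        prefetcher.join(); // all records are cached before the first call to next

        long bytesReadCounter = 0; // what the tracking wrapper would report
        while (!queue.isEmpty()) {
            long bytesInPrev = fsBytesRead.get();
            queue.take(); // "next": the record is served from memory
            long bytesInCurr = fsBytesRead.get();
            bytesReadCounter += bytesInCurr - bytesInPrev;
        }
        return new long[] {bytesReadCounter, fsBytesRead.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] result = run();
        // prints "counted=0 actual=30"
        System.out.println("counted=" + result[0] + " actual=" + result[1]);
    }
}
```

In this sketch the delta logic counts 0 bytes although 30 bytes were read, 
because every read happened outside the window between the two samples. If the 
background thread were still running during the loop, the counted value would 
depend on thread timing instead.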

I would like opinions on whether the current way of calculating BYTES_READ in 
TrackingRecordReader is correct, since it forces the user to read records 
synchronously.

Please let me know if there is any workaround for getting the correct 
statistics from the MR job.

Cheers,
Subroto Sanyal
