I talked with the guy who worked on this, and he said our production issue was probably not directly caused by getLength() returning 0. Anyway, we are interested in fixing that; estimating the length from the files is a good idea.

Lukas

InputSplit.getLength() and RecordReader.getProgress() are important for the
MR framework to be able to show progress, etc. It would be good to return
raw data sizes in getLength(), computed from the total size of the region's
store files, and to calculate progress from the amount of raw data the
scanner has seen.
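
A rough sketch of that idea in plain Java, assuming the store file sizes for
the region are already available as a list of byte counts (how they are
obtained is left out). The class and method names are hypothetical and are
not HBase or Hadoop API:

```java
import java.util.List;

// Hypothetical helper, not part of HBase or Hadoop.
final class SplitSizeSketch {

    // Length of a split, estimated as the sum of the region's store file sizes (bytes).
    static long estimateLength(List<Long> storeFileSizes) {
        long total = 0L;
        for (long size : storeFileSizes) {
            total += size;
        }
        return total;
    }

    // Progress as the fraction of raw bytes seen so far, clamped to [0, 1].
    // Returns 0 when the total length is unknown (zero or negative).
    static float progress(long bytesSeen, long totalLength) {
        if (totalLength <= 0L) {
            return 0.0f;
        }
        float fraction = (float) bytesSeen / totalLength;
        return Math.max(0.0f, Math.min(1.0f, fraction));
    }

    public static void main(String[] args) {
        long length = estimateLength(List.of(100L, 250L, 50L));
        System.out.println("estimated length: " + length);
        System.out.println("progress: " + progress(100L, length));
    }
}
```

The clamp matters because the store file sizes are only an estimate, so the
scanner may see more or fewer bytes than the estimated total.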

Enis

