InputSplit.getLength() and RecordReader.getProgress() are important for the MR framework to be able to show progress, etc. It would be good to return the raw data size in getLength(), computed from the total size of the region's store files, and to calculate progress from the amount of raw data the scanner has seen.
Enis

On Fri, Jan 24, 2014 at 10:57 AM, Nick Dimiduk wrote:
> Ideally this would return the number of rows in the split's rowkey range.
> How do we get that without scanning/counting or sampling? This kind of
> metadata isn't available.
>
> I think this metadata could be helpful in a number of uses. Do you have any
> ideas where or how we might track that kind of thing?
>
>
> On Fri, Jan 24, 2014 at 8:17 AM, Jean-Marc Spaggiari wrote:
>
> > Interesting question here:
> > https://issues.apache.org/jira/browse/HBASE-10413
> >
> > public long getLength() {
> >   // Not clear how to obtain this... seems to be used only for sorting splits
> >   return 0;
> > }
> >
> > I did not look at it, but any clue why it's like that?
> >
> > Thanks,
> >
> > JM
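A rough sketch of the proposal (length from store file sizes, progress from raw bytes seen), written in plain Java without the real HBase or Hadoop classes. RegionSizeSplit, RawBytesProgress and SplitOrdering are hypothetical names, and the store file sizes and per-row byte counts are assumed to be supplied by the caller. Store file sizes are on-disk bytes (possibly compressed or encoded) while the scanner sees decoded cells, so the ratio is only an estimate; the sketch clamps it to 1. Since the length is used for sorting splits, the sketch also orders splits largest first, which is how Hadoop's job submitter orders them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical split: its length is the sum of the region's store file sizes. */
class RegionSizeSplit {
  private final String regionName;
  private final long[] storeFileSizes; // bytes per store file (assumed known to the caller)

  RegionSizeSplit(String regionName, long[] storeFileSizes) {
    this.regionName = regionName;
    this.storeFileSizes = storeFileSizes.clone();
  }

  String getRegionName() { return regionName; }

  /** Raw data size: total bytes across the region's store files. */
  long getLength() {
    long total = 0;
    for (long size : storeFileSizes) {
      total += size;
    }
    return total;
  }
}

/** Hypothetical tracker: progress = raw bytes returned by the scanner / split length. */
class RawBytesProgress {
  private final long totalBytes;
  private long bytesSeen;

  RawBytesProgress(long totalBytes) { this.totalBytes = totalBytes; }

  /** Called once per row with the serialized size of its cells (assumed available). */
  void recordBytes(long rowBytes) { bytesSeen += rowBytes; }

  /** Fraction in [0, 1]; clamped because on-disk and decoded sizes differ. */
  float getProgress() {
    if (totalBytes <= 0) {
      return 0.0f; // no size information: report no progress
    }
    return (float) Math.min(1.0, (double) bytesSeen / totalBytes);
  }
}

/** Hypothetical helper: orders splits largest first by getLength(). */
class SplitOrdering {
  static List<RegionSizeSplit> largestFirst(List<RegionSizeSplit> splits) {
    List<RegionSizeSplit> sorted = new ArrayList<>(splits);
    sorted.sort(Comparator.comparingLong(RegionSizeSplit::getLength).reversed());
    return sorted;
  }
}

class SplitProgressDemo {
  public static void main(String[] args) {
    RegionSizeSplit a = new RegionSizeSplit("region-a", new long[] {400, 100});
    RegionSizeSplit b = new RegionSizeSplit("region-b", new long[] {2000});
    // Prints region-b (2000 bytes) before region-a (500 bytes).
    for (RegionSizeSplit s : SplitOrdering.largestFirst(Arrays.asList(a, b))) {
      System.out.println(s.getRegionName() + " length=" + s.getLength());
    }
    RawBytesProgress progress = new RawBytesProgress(a.getLength());
    progress.recordBytes(125);
    System.out.println("progress=" + progress.getProgress()); // 125 / 500 = 0.25
  }
}
```

With real HBase, the store file sizes would come from region metadata and the byte counts from the cells the scanner returns; how those are obtained is left open here.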
