Github user ctubbsii commented on the pull request:
https://github.com/apache/accumulo/pull/25#issuecomment-91037784
In JIRA, I [mentioned][1] "sometimes it's better to query a larger range
and let an iterator filter out non-matching results".
I think the createRanges method @keith-turner describes could work if the
function is executed in the RecordReader. That would also simplify this issue
significantly, because you wouldn't need to create a new InputSplit type; you
would simply add an option to the AccumuloInputFormat. There's still some risk
of memory exhaustion when a tablet has a large number of ranges (especially if
the ranges are an exhaustive set of row-records to retrieve).
However, I still think that for many use cases, it's probably better to simply
use an iterator with some filter criteria. It could be a SkippingIterator that
seeks to ranges pre-configured on that iterator, or it could be a Filter that
applies the criteria to each entry.
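
To make the two options concrete, here is a rough, self-contained sketch. It
does not use the Accumulo API: a TreeMap stands in for a tablet's sorted data,
and the names CoveringScanSketch, RowRange, filterScan and skippingScan are all
hypothetical. It only illustrates the trade-off between scanning one covering
range with a filter and seeking to each pre-configured range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Illustration only: a TreeMap stands in for a tablet's sorted key/value data.
// None of these names are Accumulo classes or methods.
class CoveringScanSketch {

    // Hypothetical inclusive row range [start, end].
    static final class RowRange {
        final String start;
        final String end;

        RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    // Option 1: scan one larger covering range and drop non-matching rows,
    // roughly what a Filter-style iterator would do on the server side.
    static List<String> filterScan(NavigableMap<String, String> tablet,
                                   RowRange covering,
                                   Predicate<String> accept) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e
                : tablet.subMap(covering.start, true, covering.end, true).entrySet()) {
            if (accept.test(e.getKey())) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    // Option 2: jump to the start of each pre-configured range in turn,
    // roughly what a skipping-style iterator would do. Assumes the ranges
    // are sorted and do not overlap.
    static List<String> skippingScan(NavigableMap<String, String> tablet,
                                     List<RowRange> sortedRanges) {
        List<String> out = new ArrayList<>();
        for (RowRange r : sortedRanges) {
            // "Seek": first entry at or after the range start.
            Map.Entry<String, String> e = tablet.ceilingEntry(r.start);
            while (e != null && e.getKey().compareTo(r.end) <= 0) {
                out.add(e.getValue());
                e = tablet.higherEntry(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> tablet = new TreeMap<>();
        for (String row : new String[] {"a", "b", "c", "d", "e", "f"}) {
            tablet.put(row, "v" + row);
        }
        // Both calls select rows "b" and "e".
        System.out.println(filterScan(tablet, new RowRange("b", "e"),
                row -> row.equals("b") || row.equals("e")));
        System.out.println(skippingScan(tablet,
                List.of(new RowRange("b", "b"), new RowRange("e", "e"))));
    }
}
```

In this toy version the filter reads every entry in the covering range, while
the skipping scan reads only the wanted entries but has to carry the full list
of ranges with it, which is where the memory concern above comes from.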
[1]: https://issues.apache.org/jira/browse/ACCUMULO-3602?focusedCommentId=14327767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14327767