GitHub user keith-turner commented on the pull request:

    https://github.com/apache/accumulo/pull/25#issuecomment-91033820
  
    I was discussing the big picture behind this PR with @ctubbsii. It seems 
like this change could encourage users to pass many ranges as configuration for 
the MapReduce job, which could exhaust memory on the job tracker.
    
    We discussed passing a function that generates a set of ranges instead of 
passing lots of ranges. The implementation would still use a batch scanner (or 
a scanner with a special iterator, but it's harder to pass code to the 
tserver). Each input split could call a function like the following, which 
deterministically creates a set of ranges. Those ranges could then be used for 
the batch scanner.
    
    ```java
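    import java.util.List;
    import org.apache.accumulo.core.data.Range;

    // Myst is a placeholder for a configuration class that is not yet defined.
    interface RangeGenerator {
      /**
       * @param tabletRange the data range for the tablet over which the input
       *        split is executing
       * @param config a mysterious class that allows the user to pass
       *        parameters to the function
       * @return the ranges to use for the batch scanner
       */
      List<Range> createRanges(Range tabletRange, Myst config);
    }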
    ```
    When configuring the AccumuloInputFormat to use the batch scanner, the 
name of a class that implements this interface would be provided. The ranges 
set on the job would be large ranges covering the portions of the table to 
process. An input split would be created for each tablet that falls within 
those large ranges, and for each input split the function would be called to 
possibly create many more ranges.
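
To make the idea concrete, here is a rough, self-contained sketch of what one 
such generator might look like. It is only an illustration under several 
assumptions: `KeyRange` is a hypothetical stand-in for Accumulo's `Range` (a 
half-open interval over long keys), a `Map<String, String>` stands in for the 
unspecified config class, and `StridedRangeGenerator` with its `stride` and 
`width` parameters is made up for the example. None of these names exist in 
Accumulo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RangeGeneratorSketch {

  /** Hypothetical stand-in for Accumulo's Range: a half-open interval [start, end). */
  record KeyRange(long start, long end) {}

  /** Same shape as the proposed interface; a Map stands in for the undefined config class. */
  interface RangeGenerator {
    List<KeyRange> createRanges(KeyRange tabletRange, Map<String, String> config);
  }

  /** Emits a window of "width" keys at every multiple of "stride", clipped to the tablet. */
  static class StridedRangeGenerator implements RangeGenerator {
    @Override
    public List<KeyRange> createRanges(KeyRange tabletRange, Map<String, String> config) {
      long stride = Long.parseLong(config.get("stride"));
      long width = Long.parseLong(config.get("width"));
      List<KeyRange> ranges = new ArrayList<>();
      // Align windows to global multiples of stride so that the output depends
      // only on the key space, not on where the tablet boundaries happen to fall.
      long first = Math.floorDiv(tabletRange.start(), stride) * stride;
      for (long s = first; s < tabletRange.end(); s += stride) {
        long lo = Math.max(s, tabletRange.start());
        long hi = Math.min(s + width, tabletRange.end());
        if (lo < hi) {
          ranges.add(new KeyRange(lo, hi));
        }
      }
      return ranges;
    }
  }

  public static void main(String[] args) {
    // Only these two small parameters travel in the job configuration.
    Map<String, String> config = Map.of("stride", "100", "width", "10");
    RangeGenerator generator = new StridedRangeGenerator();
    // Each input split would make this call with its own tablet's range.
    System.out.println(generator.createRanges(new KeyRange(250, 520), config));
  }
}
```

The point of the sketch is that the job configuration carries only a class 
name and a couple of parameters, while the potentially large list of ranges is 
materialized per input split, so it never has to pass through the job tracker.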


