[GitHub] ctubbsii commented on issue #753: Improve new MapReduce API

GitBox Thu, 08 Nov 2018 17:58:02 -0800

ctubbsii commented on issue #753: Improve new MapReduce API
URL: https://github.com/apache/accumulo/issues/753#issuecomment-437223256
 
 
   How about `.configure(). ... .store(job)` instead of serialize? How would we 
incorporate multiple tables into this? I'm kinda thinking each method after the 
first `.table` could have another `.table` hanging off of it, implemented so 
you can configure one table, then the next, then the next, etc. Methods chained 
after one `.table` call and before the next `.table` affect only the table that 
precedes them. So,
   
   ```java
   AccumuloInputFormat.configure().clientInfo(clientInfo)
     .table("table1").scanAuths(Authorizations.EMPTY).addIterator(is1)
     .table("table2").ranges(ranges).scanAuths(auths).addIterator(is2).addIterator(is3)
     .table("table3")
     .store(job);
   ```
   
   In the above, table1 is fully scanned with no auths and one iterator, while 
table2 is only scanned over specific ranges and with custom auths and two 
iterators, and the third table is fully scanned with all the default behavior.
   
   Internally, we just edit the data structure for the most recent table, and 
add it to a list of all the tables when we encounter the next `.table` or the 
terminating `.store`. To keep the table configs separate, we internally assign 
each table a value from a 1-up (incrementing) counter.
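
   A rough sketch of that bookkeeping, to make the idea concrete. Everything 
here is hypothetical and not the actual Accumulo API: `MultiTableSketch`, 
`TableConfig`, and `Builder` are made-up names, plain strings stand in for 
`Authorizations`, `Range`, and `IteratorSetting`, and `store()` simply returns 
the collected list instead of writing anything to a job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; names and types are assumptions, not Accumulo's API.
public class MultiTableSketch {

  // Settings collected for a single table.
  static class TableConfig {
    final int id;        // value of the 1-up counter, keeps this table's settings separate
    final String name;
    String auths = "";   // stand-in for Authorizations; empty means no auths
    final List<String> ranges = new ArrayList<>();    // empty means scan the whole table
    final List<String> iterators = new ArrayList<>(); // stand-in for IteratorSetting

    TableConfig(int id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  static class Builder {
    private final List<TableConfig> finished = new ArrayList<>();
    private TableConfig current; // the table named by the most recent table(...) call
    private int nextId = 0;

    // Starting a new table first files away the one being edited.
    Builder table(String name) {
      flush();
      current = new TableConfig(nextId++, name);
      return this;
    }

    // Each setter edits only the most recent table.
    Builder scanAuths(String auths) {
      requireTable().auths = auths;
      return this;
    }

    Builder ranges(List<String> ranges) {
      requireTable().ranges.addAll(ranges);
      return this;
    }

    Builder addIterator(String iterator) {
      requireTable().iterators.add(iterator);
      return this;
    }

    // Terminating call: files away the last table and returns all of them.
    List<TableConfig> store() {
      flush();
      return finished;
    }

    private void flush() {
      if (current != null) {
        finished.add(current);
        current = null;
      }
    }

    private TableConfig requireTable() {
      if (current == null) {
        throw new IllegalStateException("call table(...) before per-table settings");
      }
      return current;
    }
  }

  static Builder configure() {
    return new Builder();
  }

  public static void main(String[] args) {
    List<TableConfig> tables = configure()
        .table("table1").scanAuths("").addIterator("is1")
        .table("table2").ranges(Arrays.asList("r1", "r2")).scanAuths("public")
            .addIterator("is2").addIterator("is3")
        .table("table3")
        .store();
    for (TableConfig t : tables) {
      System.out.println(t.id + " " + t.name + " ranges=" + t.ranges
          + " iterators=" + t.iterators);
    }
  }
}
```

   In a real implementation the counter value would presumably be used to 
namespace each table's entries in the job configuration; that part is left out 
of the sketch.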


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
