As Josh said, using a set of Ranges would be more efficient than filtering in the mapper. There is a known bug in older releases that shows up when you supply a LOT of ranges, so depending on how many query terms you have you may run into it, but barring that it should work for you. Instead of one range covering the entire table, supply a bunch of single-row ranges, one per query term. The mappers should only ever get data that falls within the set of ranges supplied.
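
To make the single-row-range idea concrete, here is a rough sketch in plain Java. It is only a toy model: the sorted map stands in for the table, and `SingleRowRanges`, `RowRange`, `rangesFor` and `scan` are hypothetical names made up for illustration, not Accumulo classes. In the real job you would presumably build one `Range` per query term and hand the whole collection to `AccumuloInputFormat.setRanges`; check the javadoc for your release for the exact signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model only: a sorted map stands in for the table, and RowRange is a
// hypothetical stand-in for a range covering one row. No Accumulo classes are used.
final class SingleRowRanges {

    /** Hypothetical stand-in for a range that covers exactly one row. */
    static final class RowRange {
        final String row;

        RowRange(String row) {
            this.row = row;
        }
    }

    /** Builds one single-row range per query term, instead of one range over the whole table. */
    static List<RowRange> rangesFor(Collection<String> queryTerms) {
        List<RowRange> ranges = new ArrayList<>();
        for (String term : queryTerms) {
            ranges.add(new RowRange(term));
        }
        return ranges;
    }

    /** Returns only the entries whose row is covered by one of the supplied ranges. */
    static SortedMap<String, String> scan(SortedMap<String, String> table, List<RowRange> ranges) {
        SortedMap<String, String> seen = new TreeMap<>();
        for (RowRange range : ranges) {
            // Direct lookup per range; rows outside the ranges are never visited.
            String value = table.get(range.row);
            if (value != null) {
                seen.put(range.row, value);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        SortedMap<String, String> table = new TreeMap<>();
        table.put("apple", "v1");
        table.put("banana", "v2");
        table.put("cherry", "v3");

        // "durian" is a query term with no matching row, so it contributes nothing.
        List<RowRange> ranges = rangesFor(Arrays.asList("apple", "cherry", "durian"));
        System.out.println(scan(table, ranges)); // prints {apple=v1, cherry=v3}
    }
}
```

The point of the sketch is that the consumer never sees "banana" at all, so the HashSet check in the mapper becomes unnecessary once the ranges match the query terms.
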
On Wed, Jan 2, 2013 at 6:30 PM, Seastrom, Jessica K <[email protected]> wrote:
> Using AccumuloInputFormat.setRanges(conf, someRange), should I expect that
> the Key,Values as input to the Map method will be restricted to those keys
> in the set contained in someRange?
>
> My current implementation filters K,V pairs using the DistributedCache to
> hold the query terms
> (if(myDistributedCacheQueryTermsHashSet.contains(key.getRow())…) but I
> wonder if AccumuloInputFormat.setRanges is an alternate implementation. It
> didn't seem to filter as above, but perhaps I'm just not implementing it
> correctly.
>
> Thank you,
> Jessica
