Josh,

Thanks. I have a working solution by enhancing AccumuloRangeGenerator. Have started the process internally to contribute back. I will definitely appreciate and need your help in getting the code into the community.

Regards,
Mike Fagan

On 1/9/17, 1:06 PM, "Josh Elser" <[email protected]> wrote:

If you have that info, yeah I think you could. The lifecycle of those queries is a bit strange (and, IIRC, different depending on the execution engine Hive uses). Experimentation is definitely the way forward :).

Let me know if you need any help -- I'm happy to at least try to help. If you come up with something generic enough, it'd be great to contribute it back to Hive (which I can also help with).

Fagan, Michael wrote:
> Josh,
>
> Thanks, it looks like if I can override the getRanges() from the AccumuloPredicateHandler I might be able to build correct ranges based on matching index rows.
> Does this sound feasible?
>
> Regards,
> Mike Fagan
>
> On 1/9/17, 12:38 PM, "Josh Elser" <[email protected]> wrote:
>
> Hi Mike,
>
> As far as I understand it, the Hive storage handler APIs (which is how
> the Accumulo integration is implemented) don't expose any ability to
> use index tables to answer a query.
>
> This means that the only thing you can do to make queries faster would
> be to create a number of tables, pivoted on the columns you care about,
> putting the important columns in the rowId. Then, you would have to know
> which table to use at the application layer.
>
> Admittedly, this is pretty lacking. I'd have to go look at the Hive
> community to see if this is something that's been built there.
>
> - Josh
>
> Fagan, Michael wrote:
> > Hi,
> >
> > I am looking to utilize an index table to avoid full table scans and speed up Hive queries against an external Accumulo table.
> >
> > Has anyone done this yet? Can someone point me in the right direction?
> >
> > Regards,
> > Mike Fagan
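The idea discussed in this thread, looking up matching rows in an index table and then scanning only those rows of the data table, can be illustrated with a minimal sketch. This is not the AccumuloRangeGenerator change mentioned above and it uses no Hive or Accumulo API: plain Python sorted lists stand in for the tables, and every name here (DATA_TABLE, build_index, matching_row_ids, ranges_for, scan) is hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical in-memory stand-in for the data table, keyed by row ID.
DATA_TABLE = {
    "row001": {"name": "alice", "city": "denver"},
    "row002": {"name": "bob", "city": "boston"},
    "row003": {"name": "carol", "city": "denver"},
    "row004": {"name": "dave", "city": "austin"},
}


def build_index(data, column):
    """Return a sorted list of (value, row_id) pairs for one column.

    This plays the role of an index table whose keys start with the
    indexed value and point back at the data table's row IDs.
    """
    return sorted((fields[column], row_id) for row_id, fields in data.items())


def matching_row_ids(index, value):
    """Seek to `value` in the sorted index and collect its row IDs."""
    lo = bisect_left(index, (value, ""))
    hi = bisect_right(index, (value, "\uffff"))
    return [row_id for _, row_id in index[lo:hi]]


def ranges_for(row_ids):
    """Build one single-row (start, end) range per row ID, both inclusive."""
    return [(row_id, row_id) for row_id in sorted(row_ids)]


def scan(data, ranges):
    """Read only the rows inside the given ranges and count rows touched."""
    keys = sorted(data)
    results, touched = [], 0
    for start, end in ranges:
        for key in keys[bisect_left(keys, start):bisect_right(keys, end)]:
            touched += 1
            results.append((key, data[key]))
    return results, touched


city_index = build_index(DATA_TABLE, "city")
rows, touched = scan(DATA_TABLE, ranges_for(matching_row_ids(city_index, "denver")))
```

Here the list returned by ranges_for plays the role of the ranges that getRanges() would produce, so the scan touches two rows instead of all four. How this maps onto the real Hive and Accumulo classes is not shown.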
