Re: Implementing Index Table for Accumulo Hive Queries

Josh Elser Fri, 13 Jan 2017 08:00:18 -0800

Awesome! Sounds great. When you have your internal 'ducks in a row', feel free to ping me via email or on JIRA directly :)


Fagan, Michael wrote:

Josh,


Thanks. I have a working solution by enhancing AccumuloRangeGenerator, and I have
started the process internally to contribute it back.
I will definitely appreciate, and need, your help in getting the code into the
community.

Regards,
Mike Fagan
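The index-driven approach discussed in this thread (building ranges from matching index rows, whether in AccumuloRangeGenerator or by overriding getRanges() in AccumuloPredicateHandler) comes down to two steps: look the predicate value up in an index table, then scan only the data-table rows it points to. The following is a minimal, self-contained sketch of that idea under stated assumptions. It is not the actual patch and does not use Hive's or Accumulo's API. The index table is modeled as an in-memory sorted map, and `IndexRangeSketch` and `RowRange` are hypothetical names standing in for an index-table scan and Accumulo's `Range` class.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch (not Hive's or Accumulo's API): an in-memory stand-in for
// an index table that maps an indexed column value to the data-table row IDs
// holding that value, used to produce row ranges instead of a full table scan.
class IndexRangeSketch {

    // Simplified inclusive [start, end] row range; a placeholder, NOT Accumulo's Range.
    record RowRange(String start, String end) {}

    // indexed value -> sorted set of data-table row IDs
    private final NavigableMap<String, NavigableSet<String>> index = new TreeMap<>();

    // Record that data-table row `rowId` has `value` in the indexed column.
    void put(String value, String rowId) {
        index.computeIfAbsent(value, v -> new TreeSet<>()).add(rowId);
    }

    // Equality predicate: column = value.
    List<RowRange> rangesFor(String value) {
        return toRanges(index.getOrDefault(value, new TreeSet<>()));
    }

    // Range predicate: lo <= column <= hi, answered from the sorted index keys.
    List<RowRange> rangesForBetween(String lo, String hi) {
        NavigableSet<String> rows = new TreeSet<>();
        for (NavigableSet<String> ids : index.subMap(lo, true, hi, true).values()) {
            rows.addAll(ids);
        }
        return toRanges(rows);
    }

    // One single-row range per matching row ID, in iteration (sorted) order.
    // An empty list means nothing matches, so the data table need not be scanned.
    private static List<RowRange> toRanges(Collection<String> rowIds) {
        List<RowRange> ranges = new ArrayList<>();
        for (String rowId : rowIds) {
            ranges.add(new RowRange(rowId, rowId));
        }
        return ranges;
    }

    public static void main(String[] args) {
        IndexRangeSketch idx = new IndexRangeSketch();
        idx.put("red", "row3");
        idx.put("red", "row1");
        idx.put("blue", "row2");
        System.out.println(idx.rangesFor("red"));
        System.out.println(idx.rangesForBetween("a", "c"));
    }
}
```

In a real storage-handler change, the returned row IDs would presumably be turned into Accumulo `Range` objects and handed to the scan (for example through a `BatchScanner`'s `setRanges`). Where exactly that hook lives in Hive's Accumulo integration is the part this thread is working out, so treat the wiring as an open question rather than something this sketch settles.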



On 1/9/17, 1:06 PM, "Josh Elser"<[email protected]>  wrote:

     If you have that info, yeah I think you could.

     The lifecycle of those queries is a bit strange (and, IIRC, different
     depending on the execution engine Hive uses).

     Experimentation is definitely the way forward :). Let me know if you
     need any help -- I'm happy to at least try to help. If you come up with
     something generic enough, it'd be great to contribute it back to Hive
     (which I can also help with).

     Fagan, Michael wrote:
     >  Josh,
     >
     >  Thanks, it looks like if I can override getRanges() in
     >  AccumuloPredicateHandler, I might be able to build correct ranges based on
     >  matching index rows.
     >  Does this sound feasible?
     >
     >  Regards,
     >  Mike Fagan
     >
     >  On 1/9/17, 12:38 PM, "Josh Elser"<[email protected]>   wrote:
     >
     >       Hi Mike,
     >
     >       As far as I understand it, the Hive storage handler APIs (which are
     >       how the Accumulo integration is implemented) don't expose any ability
     >       to use index tables to answer a query.
     >
     >       This means that the only thing you can do to make queries faster
     >       would be to create a number of tables, pivoted on the columns you care
     >       about, putting the important columns in the rowId. Then, you would
     >       have to know which table to use at the application layer.
     >
     >       Admittedly, this is pretty lacking. I'd have to go look at the Hive
     >       community to see if this is something that's been built there.
     >
     >       - Josh
     >
     >       Fagan, Michael wrote:
     >       >   Hi,
     >       >
     >       >   I am looking to utilize an index table to avoid full table
     >       >   scans and speed up Hive queries against an external Accumulo table.
     >       >
     >       >   Has anyone done this yet? Can someone point me in the right
     >       >   direction?
     >       >
     >       >   Regards,
     >       >   Mike Fagan
     >       >
     >
     >
     >
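The workaround Josh describes in the 12:38 PM message (one copy of the data per important column, pivoted so that column's value leads the rowId, with the application choosing which table to query) can be sketched as follows. This is a minimal illustration under stated assumptions, not an existing API: the class and method names, the table names, the rowId layout, and the NUL separator are all hypothetical. It relies only on the fact that Accumulo keeps rows sorted lexicographically, so putting the value first makes all rows sharing that value contiguous and scannable as one range.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "pivoted tables" workaround: each important column
// gets its own copy of the data whose rowId starts with that column's value,
// and application code decides which table to query. Table names, the rowId
// layout, and the NUL separator are illustrative assumptions.
class PivotTableRouter {

    // Separator between the pivoted value and the original rowId; assumes
    // column values never contain a NUL character.
    static final char SEP = '\0';

    private final Map<String, String> pivotTables = new HashMap<>(); // column -> pivoted table
    private final String baseTable;

    PivotTableRouter(String baseTable) {
        this.baseTable = baseTable;
    }

    void registerPivot(String column, String table) {
        pivotTables.put(column, table);
    }

    // rowId written to a pivoted table: value first, so equal values sort together.
    static String pivotRowId(String columnValue, String originalRowId) {
        return columnValue + SEP + originalRowId;
    }

    // [start inclusive, end exclusive) bounds covering every pivoted row for `value`.
    static String[] prefixBounds(String value) {
        return new String[] {value + SEP, value + (char) (SEP + 1)};
    }

    // Application-layer choice: the pivoted table if one exists for this column,
    // otherwise the base table (which means a full table scan).
    String tableFor(String column) {
        return pivotTables.getOrDefault(column, baseTable);
    }

    public static void main(String[] args) {
        PivotTableRouter router = new PivotTableRouter("events");
        router.registerPivot("user", "events_by_user");
        System.out.println(router.tableFor("user"));
        System.out.println(router.tableFor("ts"));
        System.out.println(pivotRowId("alice", "r42").replace(SEP, '|'));
    }
}
```

The cost of this design is the one Josh points out: every pivoted copy has to be written and kept in sync, and the routing knowledge lives in the application rather than in Hive.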
