Re: Implementing Index Table for Accumulo Hive Queries

Fagan, Michael Thu, 12 Jan 2017 11:23:29 -0800

Josh,

Thanks. I have a working solution by enhancing AccumuloRangeGenerator. Have 
started the process internally to contribute back. 
I will definitely appreciate and need your help in getting the code into the 
community.


Regards,
Mike Fagan



On 1/9/17, 1:06 PM, "Josh Elser" <[email protected]> wrote:

    If you have that info, yeah I think you could.
    
    The lifecycle of those queries is a bit strange (and, IIRC, different 
    depending on the execution engine Hive uses).
    
    Experimentation is definitely the way forward :). Let me know if you 
    need any help -- I'm happy to at least try to help. If you come up with 
    something generic enough, it'd be great to contribute it back to Hive 
    (which I can also help with).
    
    Fagan, Michael wrote:
    > Josh,
    >
    > Thanks, it looks like if I can override the getRanges() from the
    > AccumuloPredicateHandler I might be able to build correct ranges based on
    > matching index rows.
    > Does this sound feasible?
    >
    > Regards,
    > Mike Fagan
    >
    > On 1/9/17, 12:38 PM, "Josh Elser"<[email protected]>  wrote:
    >
    >      Hi Mike,
    >
    >      As far as I understand it, the Hive storage handler APIs (which is how
    >      the Accumulo integration is implemented) don't expose any ability to
    >      use index tables to answer some query.
    >
    >      This means that the only thing you can do to make queries faster would
    >      be to create a number of tables, pivoted on the columns you care about,
    >      putting the important columns in the rowId. Then, you would have to know
    >      which table to use at the application layer.
    >
    >      Admittedly, this is pretty lacking. I'd have to go look at the Hive
    >      community to see if this is something that's been built there.
    >
    >      - Josh
    >
    >      Fagan, Michael wrote:
    >      >  Hi,
    >      >
    >      >  I am looking to utilize an index table to avoid full table scans
    >      >  and speed up Hive queries against an external Accumulo table.
    >      >
    >      >  Has anyone done this yet? Can someone point me in the right
    >      >  direction?
    >      >
    >      >  Regards,
    >      >  Mike Fagan
    >      >
    >
    >
    >
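
The approach discussed in this thread (look up matching rows in an index table, then scan only those rows in the data table) can be illustrated with a small in-memory sketch. This is not the code contributed by anyone in the thread, and it does not use the Hive or Accumulo APIs: the dict-based "tables" and every function name below (build_index, matching_row_ids, ranges_for, scan) are hypothetical stand-ins chosen for illustration only.

```python
from bisect import bisect_left

# Hypothetical in-memory model: the "data table" is a dict of rowId -> columns,
# and the "index table" is a sorted list of (column value, rowId) pairs.

def build_index(data, column):
    """Build a sorted index of (value, row_id) for one column."""
    return sorted(
        (cols[column], row_id) for row_id, cols in data.items() if column in cols
    )

def matching_row_ids(index, value):
    """Return the row ids whose indexed column equals `value`."""
    # (value,) sorts before every (value, row_id), so this finds the first match.
    i = bisect_left(index, (value,))
    rows = []
    while i < len(index) and index[i][0] == value:
        rows.append(index[i][1])
        i += 1
    return rows

def ranges_for(row_ids):
    """Turn row ids into inclusive (start, end) single-row ranges."""
    return [(r, r) for r in sorted(set(row_ids))]

def scan(data, ranges):
    """Read only the rows covered by the ranges instead of the whole table."""
    keys = sorted(data)
    out = []
    for start, end in ranges:
        i = bisect_left(keys, start)
        while i < len(keys) and keys[i] <= end:
            out.append((keys[i], data[keys[i]]))
            i += 1
    return out

data = {
    "row1": {"name": "alice", "city": "denver"},
    "row2": {"name": "bob", "city": "boston"},
    "row3": {"name": "carol", "city": "denver"},
}
city_index = build_index(data, "city")
ranges = ranges_for(matching_row_ids(city_index, "denver"))
result = scan(data, ranges)
```

In a real storage handler the index lookup would be a scan against a separate Accumulo table, and the resulting ranges would be handed to the data-table scan in place of a single full-table range.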
