If you have that info, yeah I think you could.

The lifecycle of those queries is a bit strange (and, IIRC, different depending on the execution engine Hive uses).

Experimentation is definitely the way forward :). Let me know if you need any help -- I'm happy to at least try to help. If you come up with something generic enough, it'd be great to contribute it back to Hive (which I can also help with).

Fagan, Michael wrote:
Josh,

Thanks, it looks like if I can override getRanges() from the
AccumuloPredicateHandler, I might be able to build correct ranges based on
matching index rows.
Does this sound feasible?
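
To make the idea concrete, here is a rough sketch of what I have in mind. It uses plain Java collections in place of the Accumulo index table, so IndexRangeSketch, INDEX, index() and rangesFor() are made-up names for illustration and not part of the Hive or Accumulo APIs. A real override would have to return whatever range type getRanges() actually expects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch only: plain Java collections stand in for Accumulo tables.
// Nothing here is the real AccumuloPredicateHandler or Range API.
class IndexRangeSketch {

    // Hypothetical index layout: indexed value -> sorted rowIds in the data table.
    static final NavigableMap<String, SortedSet<String>> INDEX = new TreeMap<>();

    // Record that the data-table row `rowId` contains `value` in the indexed column.
    static void index(String value, String rowId) {
        INDEX.computeIfAbsent(value, k -> new TreeSet<>()).add(rowId);
    }

    // Stand-in for what an overridden getRanges() might produce: one
    // single-row {start, end} pair per matching rowId, in sorted order.
    static List<String[]> rangesFor(String value) {
        List<String[]> ranges = new ArrayList<>();
        SortedSet<String> rowIds = INDEX.get(value);
        if (rowIds == null) {
            return ranges; // no index entry, so no matching rows
        }
        for (String rowId : rowIds) {
            ranges.add(new String[] {rowId, rowId});
        }
        return ranges;
    }

    public static void main(String[] args) {
        index("denver", "row_0007");
        index("denver", "row_0002");
        index("boston", "row_0005");
        for (String[] r : rangesFor("denver")) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

One thing I would need to check: an empty list here means "no matching rows", so whatever consumes the ranges must not fall back to a full table scan in that case.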

Regards,
Mike Fagan

On 1/9/17, 12:38 PM, "Josh Elser"<[email protected]>  wrote:

     Hi Mike,

     As far as I understand it, the Hive storage handler APIs (which are how
     the Accumulo integration is implemented) don't expose any ability to
     use index tables to answer a query.

     This means that the only thing you can do to make queries faster would
     be to create a number of tables, pivoted on the columns you care about,
     putting the important columns in the rowId. Then, you would have to know
     which table to use at the application layer.
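
As a minimal sketch of that application-layer choice, something like the following could map the column a query filters on to the table pivoted on it. The class, table, and column names are all hypothetical, and this is plain Java with no Hive or Accumulo calls.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: table and column names are hypothetical.
class PivotTableChooser {
    private final String baseTable;
    // Column used in the WHERE clause -> table whose rowId starts with that column.
    private final Map<String, String> pivots = new HashMap<>();

    PivotTableChooser(String baseTable) {
        this.baseTable = baseTable;
    }

    // Register a pivoted copy of the data keyed on `column`.
    PivotTableChooser register(String column, String table) {
        pivots.put(column, table);
        return this;
    }

    // Falls back to the base table (and hence a full scan) when no pivot exists.
    String tableFor(String filterColumn) {
        return pivots.getOrDefault(filterColumn, baseTable);
    }
}
```

The application would call tableFor() with the column it is filtering on and then issue the Hive query against the returned table name.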

     Admittedly, this is pretty lacking. I'd have to go look at the Hive
     community to see if this is something that's been built there.

     - Josh

     Fagan, Michael wrote:
     >  Hi,
     >
     >  I am looking to utilize an index table to avoid full table scans and
     >  speed up Hive queries against an external Accumulo table.
     >
     >  Has anyone done this yet? Can someone point me in the right direction?
     >
     >  Regards,
     >  Mike Fagan
     >


