[ 
https://issues.apache.org/jira/browse/PHOENIX-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978841#comment-13978841
 ] 

Nick Dimiduk commented on PHOENIX-946:
--------------------------------------

bq. Build something similar to HiveHBaseHandler where we try to leverage a lot 
of InputFormat code written for Pig and be able to run Hive queries as MR jobs.

Yes, this seems like a sensible thing to do. Better still would be to converge 
the implementation where possible.

bq. This would indeed mean that the queries would be executed within the 
context of MR jobs (or a Tez DAG?), so it will mean that many of the advantages 
of Phoenix would probably be lost, or at least not used (such as doing 
pre-aggregations in coprocessors, for example).

The Hive CLI has some trickery whereby, if it knows a query will execute 
quickly, it runs the query locally rather than handing it off to the 
distributed runtime. I don't know exactly what causes that to kick in. Whether 
executed locally or distributed, the same StorageHandler implementation is 
consumed, so I don't see why execution via StorageHandler (either locally or 
distributed) would be unable to take advantage of Phoenix optimizations. 
Further, for queries over large amounts of data, distributed execution can be 
advantageous: the query is split into multiple smaller queries that run in 
parallel and take advantage of more machines' IO.
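
To make the "multiple smaller queries" idea concrete, here is a minimal 
sketch, assuming a numeric row key that can be cut into contiguous ranges. 
The class RangeSplitSketch, its methods, and the EVENTS/ID names are all 
hypothetical and exist only for illustration; this is not the Phoenix or Hive 
API, and a real StorageHandler would more likely derive its splits from 
region boundaries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these names come from Phoenix or Hive.
final class RangeSplitSketch {

    /**
     * Splits the half-open key range [start, end) into at most `parts`
     * contiguous, non-overlapping half-open sub-ranges that cover it exactly.
     * Assumes end - start does not overflow a long.
     */
    static List<long[]> splitRange(long start, long end, int parts) {
        if (parts <= 0 || end <= start) {
            throw new IllegalArgumentException("need parts > 0 and start < end");
        }
        long total = end - start;
        int n = (int) Math.min((long) parts, total); // never emit empty ranges
        long base = total / n;
        long remainder = total % n;
        List<long[]> ranges = new ArrayList<>();
        long lo = start;
        for (int i = 0; i < n; i++) {
            // The first `remainder` ranges each take one extra key.
            long size = base + (i < remainder ? 1 : 0);
            ranges.add(new long[] {lo, lo + size});
            lo += size;
        }
        return ranges;
    }

    /** Builds one range-restricted query string per sub-range. */
    static List<String> splitQueries(String table, String keyColumn,
                                     long start, long end, int parts) {
        List<String> queries = new ArrayList<>();
        for (long[] r : splitRange(start, end, parts)) {
            queries.add("SELECT * FROM " + table
                    + " WHERE " + keyColumn + " >= " + r[0]
                    + " AND " + keyColumn + " < " + r[1]);
        }
        return queries;
    }

    public static void main(String[] args) {
        // Each printed query could be handed to a separate parallel task.
        for (String q : splitQueries("EVENTS", "ID", 0, 100, 4)) {
            System.out.println(q);
        }
    }
}
```

Each sub-query touches a disjoint slice of the key space, so the tasks can 
run independently and their results can simply be concatenated.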

bq. However, the potential win that I see is that it would allow doing things 
like joining data stored on HDFS with data stored in Phoenix/HBase. In other 
words, I see this as giving Hive access to Phoenix data, and not the other way 
around.

Yes, this is the angle from which I'm approaching this topic, for the use-case 
you mention.

> Use Phoenix to service Hive queries over HBase data
> ---------------------------------------------------
>
>                 Key: PHOENIX-946
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-946
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
