[
https://issues.apache.org/jira/browse/PHOENIX-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978841#comment-13978841
]
Nick Dimiduk commented on PHOENIX-946:
--------------------------------------
bq. Build something similar to HiveHBaseHandler where we try to leverage a lot
of Inputformat code written for Pig and be able to run Hive Queries as MR jobs.
Yes, this seems like a sensible thing to do. Better still would be to converge
the implementation where possible.
bq. This would indeed mean that the queries would be executed within the
context of MR jobs (or a Tez DAG?), so it will mean that many of the advantages
of Phoenix would probably be lost, or at least not used (such as doing
pre-aggregations in coprocessors, for example).
The Hive CLI has some trickery whereby, if it knows a query will execute
quickly, it runs the query locally rather than shipping it off to the
distributed runtime. I don't know exactly what causes that to kick in. Whether
executed locally or distributed, the same StorageHandler implementation is
consumed. I don't see why execution via StorageHandler (either locally or
distributed) would be unable to take advantage of Phoenix optimizations.
Further, for queries over large amounts of data, distributed execution can be
advantageous: the query is split into multiple smaller queries that run in
parallel and take advantage of more machines' I/O.
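To make the "split the query into multiple smaller queries" idea concrete,
here is a minimal sketch in plain Python. It is not Phoenix, Hive, or
StorageHandler code. The names split_range, run_subquery, and parallel_sum are
hypothetical, and an in-memory list of (key, value) rows stands in for the
table. Each sub-range computes a partial SUM, and the partials are combined at
the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, parts):
    """Split [start, stop) into at most `parts` contiguous sub-ranges."""
    step = max(1, -(-(stop - start) // parts))  # ceiling division
    return [(lo, min(lo + step, stop)) for lo in range(start, stop, step)]


def run_subquery(rows, key_range):
    """Hypothetical stand-in for one sub-query: a partial SUM over a key range.

    In a real system this would be a scan restricted to the key range;
    here it just filters an in-memory list of (key, value) pairs.
    """
    lo, hi = key_range
    return sum(value for key, value in rows if lo <= key < hi)


def parallel_sum(rows, start, stop, parts=4):
    """Run one sub-query per sub-range in parallel and combine the partials."""
    ranges = split_range(start, stop, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(lambda r: run_subquery(rows, r), ranges))
    return sum(partials)
```

For example, with rows = [(k, 2 * k) for k in range(100)], the call
parallel_sum(rows, 0, 100) returns the same total as a single full scan. The
sub-ranges do not overlap, so each row is counted exactly once.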
bq. However, the potential win that I see is that it would allow doing things
like joining data stored on HDFS with data stored in Phoenix/HBase. In other
words, I see this as giving Hive access to Phoenix data, and not the other way
around.
Yes, this is the angle from which I'm approaching the topic: the use case you
mention.
> Use Phoenix to service Hive queries over HBase data
> ---------------------------------------------------
>
> Key: PHOENIX-946
> URL: https://issues.apache.org/jira/browse/PHOENIX-946
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
>