[ 
https://issues.apache.org/jira/browse/PHOENIX-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875761#comment-15875761
 ] 

Jeongdae Kim commented on PHOENIX-3536:
---------------------------------------

[~jamestaylor] The intention of this patch is to reduce unnecessary work done 
while initializing Phoenix connections in Hive map tasks. The Phoenix storage 
handler (PhoenixInputFormat) builds the input splits through a Phoenix JDBC 
connection when the Hive MR job is submitted, and every map task of that job 
then creates a Phoenix record reader that opens its own Phoenix connection. 
Although all the information needed to execute the query (the query plan) is 
already obtained during job submission, every map task builds the query plan 
again. That takes quite a long time, because establishing the initial Phoenix 
connection loads all Phoenix metadata from the system table in the client 
process (2~3 seconds in my test cases). With this patch, each map task skips 
the connection initialization by reusing the query plan created earlier, 
during job submission, which saves that time in every map task.
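
To illustrate the idea only (this is not the code in the attached patch), here 
is a minimal, self-contained sketch. It assumes plain Java serialization and 
uses hypothetical stand-in classes (Plan, Split, buildPlan, getSplits, 
openReader); none of them are Phoenix or Hive APIs, and the real patch may 
serialize the QueryPlan differently. The plan is built once when the splits 
are created, and each reader deserializes it from its split instead of 
building it again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not Phoenix or Hive classes.
class PlanPassingSketch {

    /** Stand-in for the query plan that is expensive to build. */
    static final class Plan implements Serializable {
        private static final long serialVersionUID = 1L;
        final String sql;
        final List<String> columns;

        Plan(String sql, List<String> columns) {
            this.sql = sql;
            this.columns = columns;
        }
    }

    /** Stand-in for an input split that carries the serialized plan. */
    static final class Split implements Serializable {
        private static final long serialVersionUID = 1L;
        final int index;
        final byte[] planBytes;

        Split(int index, byte[] planBytes) {
            this.index = index;
            this.planBytes = planBytes;
        }
    }

    /** Counts how often the expensive planning step runs. */
    static int plansBuilt = 0;

    /** Simulates the expensive step (connection setup plus planning). */
    static Plan buildPlan(String sql) {
        plansBuilt++;
        List<String> columns = new ArrayList<>();
        columns.add("ID");
        columns.add("NAME");
        return new Plan(sql, columns);
    }

    static byte[] serialize(Plan plan) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(plan);
        }
        return bytes.toByteArray();
    }

    static Plan deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Plan) in.readObject();
        }
    }

    /** Job submission side: build the plan once and attach it to every split. */
    static List<Split> getSplits(String sql, int splitCount) throws IOException {
        byte[] planBytes = serialize(buildPlan(sql));
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < splitCount; i++) {
            splits.add(new Split(i, planBytes));
        }
        return splits;
    }

    /** Map task side: recover the plan from the split; buildPlan is not called. */
    static Plan openReader(Split split) throws IOException, ClassNotFoundException {
        return deserialize(split.planBytes);
    }

    public static void main(String[] args) throws Exception {
        List<Split> splits = getSplits("SELECT ID, NAME FROM T", 3);
        for (Split split : splits) {
            Plan plan = openReader(split);
            System.out.println("split " + split.index + " reuses plan for: " + plan.sql);
        }
        System.out.println("plans built: " + plansBuilt);
    }
}
```

In this sketch the planning step runs once regardless of the number of splits, 
which mirrors the saving described above: the per-task cost becomes a 
deserialization rather than a fresh connection and plan.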

> Remove creating unnecessary phoenix connections in MR Tasks of Hive
> -------------------------------------------------------------------
>
>                 Key: PHOENIX-3536
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3536
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>              Labels: HivePhoenix
>         Attachments: PHOENIX-3536.1.patch
>
>
> PhoenixStorageHandler creates Phoenix connections to build a QueryPlan in 
> both the getSplit phase (preparing the MR job) and the getRecordReader phase 
> (Map) while running an MR job.
> In Phoenix, creating the first connection (QueryServices) for a specific URL 
> takes a long time (checking and loading Phoenix schema information).
> I found that it is possible to avoid creating the query plan again in the Map 
> phase (getRecordReader()) by serializing the QueryPlan created by the input 
> format and passing this plan to the record reader.
> This approach improves scan performance by removing the unnecessary 
> connection attempt in the Map phase.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
