[jira] [Updated] (PHOENIX-5362) Mappers should use the queryPlan from the driver rather than regenerating the plan

Istvan Toth (Jira) Thu, 05 Dec 2024 00:01:21 -0800


     [ 
https://issues.apache.org/jira/browse/PHOENIX-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Istvan Toth updated PHOENIX-5362:
---------------------------------
    Fix Version/s:     (was: 5.2.1)

> Mappers should use the queryPlan from the driver rather than regenerating the 
> plan
> ----------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5362
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5362
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Chinmay Kulkarni
>            Priority: Major
>
> Currently, PhoenixInputFormat#getQueryPlan already generates a queryPlan and 
> we use this plan to get the scans and splits for the MR job. In 
> PhoenixInputFormat#createRecordReader, which is called inside each mapper, we 
> create a queryPlan again and pass it to the PhoenixRecordReader instance.
> There are multiple problems with this approach:
> # The mappers already have information about the scans from the driver code. 
> We potentially just need to wrap these scans in an iterator and create a 
> subsequent ResultSet.
> # The mappers don't need most of the information embedded within a queryPlan, 
> so they shouldn't need to regenerate the plan.
> # Weird corner cases can occur if we replan the query in each mapper, for 
> example if an index is created or metadata changes between the time the MR 
> job is created and the time the mappers actually launch. In this case, the 
> mappers have the scans created for the first queryPlan, but they use 
> iterators created for the second queryPlan, so the issued scans would not 
> match the queryPlan embedded in the mappers' iterators/ResultSet. We could 
> potentially miss some scans or look for more than we actually require, since 
> we check the original scans for this size. The resolved table would also be 
> as per the new queryPlan, so there could be a mismatch here as well 
> (consider the index creation case). Intermediate metadata changes could 
> potentially have other repercussions too.
> Serializing a subset of the information (like the projector, which iterator 
> to use, etc.) of a QueryPlan and passing it from the driver to the mappers 
> without having them regenerate the plans seems like the best way forward.
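
A minimal sketch of the serialization idea described above, assuming the driver can reduce its plan to a few plain fields. PlanSummary and its fields (tableName, iteratorType, projectedColumns) are hypothetical names made up for illustration and are not part of Phoenix; a real change would have to serialize Phoenix's own projector and iterator information and carry the string in the job configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

/** Hypothetical, trimmed-down stand-in for the parts of a plan a mapper needs. */
final class PlanSummary {
    final String tableName;              // table resolved once, by the driver
    final String iteratorType;           // which iterator the mapper should build
    final List<String> projectedColumns; // stand-in for the projector

    PlanSummary(String tableName, String iteratorType, List<String> projectedColumns) {
        this.tableName = tableName;
        this.iteratorType = iteratorType;
        this.projectedColumns = new ArrayList<>(projectedColumns);
    }

    /** Driver side: encode the summary as a string that fits in a job property. */
    String encode() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(tableName);
            out.writeUTF(iteratorType);
            out.writeInt(projectedColumns.size());
            for (String column : projectedColumns) {
                out.writeUTF(column);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Mapper side: rebuild the summary without replanning the query. */
    static PlanSummary decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            String table = in.readUTF();
            String iterator = in.readUTF();
            int count = in.readInt();
            List<String> columns = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                columns.add(in.readUTF());
            }
            return new PlanSummary(table, iterator, columns);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something along these lines, each mapper would see exactly the table and iterator choice that the driver used when it computed the scans, even if an index was created in between.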



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
