[ https://issues.apache.org/jira/browse/PHOENIX-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars Hofhansl updated PHOENIX-5362:
-----------------------------------
Fix Version/s: 5.1.1 (was: 5.1.0)
> Mappers should use the queryPlan from the driver rather than regenerating the
> plan
> ----------------------------------------------------------------------------------
>
> Key: PHOENIX-5362
> URL: https://issues.apache.org/jira/browse/PHOENIX-5362
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Chinmay Kulkarni
> Priority: Major
> Fix For: 4.15.1, 5.1.1
>
>
> Currently, PhoenixInputFormat#getQueryPlan already generates a queryPlan in
> the driver, and we use this plan to get the scans and splits for the MR job.
> In PhoenixInputFormat#createRecordReader, which is called inside each mapper,
> we create a queryPlan again and pass it to the PhoenixRecordReader instance.
> There are multiple problems with this approach:
> # The mappers already have information about the scans from the driver code.
> We potentially just need to wrap these scans in an iterator and create a
> subsequent ResultSet.
> # The mappers don't need most of the information embedded within a queryPlan,
> so they shouldn't need to regenerate the plan.
> # Replanning the query in each mapper can lead to inconsistent corner cases,
> for example if an index is created or metadata changes between the time the
> MR job is created and the time the mappers actually launch. In this case, the
> mappers have the scans created for the first queryPlan, but they use
> iterators created for the second queryPlan, so the issued scans would not
> match the queryPlan embedded in the mappers' iterators/ResultSet. We could
> miss some scans, or look for more scans than we actually require, since the
> size check uses the original scans. The resolved table would follow the new
> queryPlan, so there could be a mismatch there as well (as in the index
> creation case). Intermediate metadata changes could have other repercussions
> too.
> Serializing a subset of the information (like the projector, which iterator
> to use, etc.) of a QueryPlan and passing it from the driver to the mappers
> without having them regenerate the plans seems like the best way forward.
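
The proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Phoenix code: PlanSnapshot, its fields, and the idea of carrying the encoded value as a string in the job configuration are all hypothetical, and a real change would have to decide which parts of a QueryPlan (projector, iterator choice, and so on) the mappers actually need. The sketch uses only JDK classes: the driver encodes a small snapshot once, and each mapper decodes it and checks the scans it was handed instead of replanning.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Hypothetical holder for the subset of plan state a mapper needs. Not a Phoenix class. */
final class PlanSnapshot {
    final String tableName;              // table resolved by the driver
    final List<String> projectedColumns; // stand-in for the projector
    final String iteratorType;           // stand-in for "which iterator to use"
    final int expectedScanCount;         // number of scans the driver generated

    PlanSnapshot(String tableName, List<String> projectedColumns,
                 String iteratorType, int expectedScanCount) {
        this.tableName = Objects.requireNonNull(tableName);
        this.projectedColumns =
            Collections.unmodifiableList(new ArrayList<>(projectedColumns));
        this.iteratorType = Objects.requireNonNull(iteratorType);
        this.expectedScanCount = expectedScanCount;
    }

    /** Driver side: encode the snapshot as a string that could be stored in the job config. */
    String encode() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(tableName);
            out.writeInt(projectedColumns.size());
            for (String column : projectedColumns) {
                out.writeUTF(column);
            }
            out.writeUTF(iteratorType);
            out.writeInt(expectedScanCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Mapper side: rebuild the snapshot from the encoded string, without replanning. */
    static PlanSnapshot decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            String table = in.readUTF();
            int columnCount = in.readInt();
            List<String> columns = new ArrayList<>(columnCount);
            for (int i = 0; i < columnCount; i++) {
                columns.add(in.readUTF());
            }
            String iterator = in.readUTF();
            int scanCount = in.readInt();
            return new PlanSnapshot(table, columns, iterator, scanCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Mapper side: fail fast if the scans received do not match what the driver planned. */
    void checkScanCount(int actualScanCount) {
        if (actualScanCount != expectedScanCount) {
            throw new IllegalStateException("Expected " + expectedScanCount
                + " scans from the driver plan but got " + actualScanCount);
        }
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof PlanSnapshot)) {
            return false;
        }
        PlanSnapshot that = (PlanSnapshot) other;
        return tableName.equals(that.tableName)
            && projectedColumns.equals(that.projectedColumns)
            && iteratorType.equals(that.iteratorType)
            && expectedScanCount == that.expectedScanCount;
    }

    @Override
    public int hashCode() {
        return Objects.hash(tableName, projectedColumns, iteratorType, expectedScanCount);
    }

    public static void main(String[] args) {
        // Driver: plan once and encode. Mapper: decode and validate the scans it was given.
        PlanSnapshot planned =
            new PlanSnapshot("MY_TABLE", Arrays.asList("ID", "NAME"), "PARALLEL", 4);
        PlanSnapshot inMapper = PlanSnapshot.decode(planned.encode());
        inMapper.checkScanCount(4);
        System.out.println("Round trip preserved snapshot: " + planned.equals(inMapper));
    }
}
```

Because the mapper only reads what the driver wrote, an index creation or metadata change after job submission cannot alter the table, projection, or iterator choice the mapper works with.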