[ https://issues.apache.org/jira/browse/PHOENIX-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Istvan Toth updated PHOENIX-5362:
---------------------------------
    Fix Version/s:     (was: 5.2.1)

> Mappers should use the queryPlan from the driver rather than regenerating the plan
> ----------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5362
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5362
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Chinmay Kulkarni
>            Priority: Major
>
> Currently, PhoenixInputFormat#getQueryPlan already generates a queryPlan, and
> we use this plan to get the scans and splits for the MR job. In
> PhoenixInputFormat#createRecordReader, which is called inside each mapper, we
> create a queryPlan again and pass it to the PhoenixRecordReader instance.
> There are multiple problems with this approach:
> # The mappers already have information about the scans from the driver code.
> We potentially just need to wrap these scans in an iterator and create a
> subsequent ResultSet.
> # The mappers don't need most of the information embedded within a queryPlan,
> so they shouldn't need to regenerate the plan.
> # There are weird corner cases that can occur if we replan the query in each
> mapper, for example if there is an index creation or metadata change between
> when the MR job was created and when the mappers actually launch. In this
> case, the mappers have the scans created for the first queryPlan, but they
> will use iterators created for the second queryPlan. In such cases,
> the issued scans would not match the queryPlan embedded in the mappers'
> iterators/ResultSet. We could potentially miss some scans, or look for more
> than we actually require, since we check the original scans for this size. The
> resolved table would be as per the new queryPlan, and there could be a
> mismatch here as well (considering the index creation case). There are
> potentially other repercussions in case of intermediary metadata changes as
> well.
>
> Serializing a subset of the information (like the projector, which iterator
> to use, etc.) of a QueryPlan and passing it from the driver to the mappers
> without having them regenerate the plans seems like the best way forward.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
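
The hand-off proposed in the description could look roughly like the sketch below. It is only an illustration under stated assumptions, not Phoenix code: MapperPlanInfo, its fields, and the configuration key are hypothetical names, a plain Map stands in for the MR job configuration, and Java serialization plus Base64 is just one possible encoding. The driver plans once and publishes a small, serializable subset of the plan; each mapper decodes that subset instead of replanning.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, illustration-only subset of the plan information a mapper needs. */
final class MapperPlanInfo implements Serializable {
    private static final long serialVersionUID = 1L;

    final String tableName;              // table (or index) resolved by the driver
    final List<String> projectedColumns; // stand-in for the projector
    final String iteratorType;           // stand-in for "which iterator to use"

    MapperPlanInfo(String tableName, List<String> projectedColumns, String iteratorType) {
        this.tableName = tableName;
        this.projectedColumns = new ArrayList<>(projectedColumns);
        this.iteratorType = iteratorType;
    }

    /** Driver side: encode as a string that fits in a key/value job configuration. */
    String encode() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Mapper side: decode what the driver stored instead of replanning the query. */
    static MapperPlanInfo decode(String encoded) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (MapperPlanInfo) in.readObject();
        }
    }
}

public class PlanHandoffSketch {
    // Hypothetical configuration key; not an actual Phoenix property.
    static final String PLAN_INFO_KEY = "example.mapreduce.plan.info";

    public static void main(String[] args) throws Exception {
        // A plain map stands in for the MR job configuration.
        Map<String, String> jobConf = new HashMap<>();

        // Driver: plan once, then publish only what the mappers need.
        MapperPlanInfo driverInfo =
                new MapperPlanInfo("MY_TABLE", Arrays.asList("ID", "NAME"), "ROUND_ROBIN");
        jobConf.put(PLAN_INFO_KEY, driverInfo.encode());

        // Mapper: rebuild the driver's view from the configuration, so a metadata
        // change made after job submission cannot alter what the mapper uses.
        MapperPlanInfo mapperInfo = MapperPlanInfo.decode(jobConf.get(PLAN_INFO_KEY));
        System.out.println(mapperInfo.tableName + " " + mapperInfo.projectedColumns
                + " " + mapperInfo.iteratorType);
    }
}
```

Because the mapper reads exactly what the driver wrote, the scans in the splits and the decoded plan subset come from the same planning pass, which is the consistency property the third problem in the description is concerned with.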