[
https://issues.apache.org/jira/browse/HIVE-21327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marta Kuczora updated HIVE-21327:
---------------------------------
Status: Patch Available (was: Open)
> Predicate is not pushed to Parquet if
> hive.parquet.timestamp.skip.conversion=true
> ---------------------------------------------------------------------------------
>
> Key: HIVE-21327
> URL: https://issues.apache.org/jira/browse/HIVE-21327
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Marta Kuczora
> Assignee: Marta Kuczora
> Priority: Major
> Attachments: HIVE-21327.1.patch
>
>
> The Parquet FilterPredicate is created and set to the configuration in the
> ParquetRecordReaderBase.setFilter method. This method is used from the
> ParquetRecordReaderWrapper constructor through the
> ParquetRecordReaderBase.getSplit method and expects a JobConf as parameter
> where it sets the created filter predicate. In the ParquetRecordReaderWrapper
> constructor, multiple JobConf objects are used:
> {noformat}
> jobConf = oldJobConf;
> final ParquetInputSplit split = getSplit(oldSplit, jobConf);
> TaskAttemptID taskAttemptID =
>     TaskAttemptID.forName(jobConf.get(IOConstants.MAPRED_TASK_ID));
> if (taskAttemptID == null) {
>   taskAttemptID = new TaskAttemptID();
> }
> // create a TaskInputOutputContext
> Configuration conf = jobConf;
> if (skipTimestampConversion ^ HiveConf.getBoolVar(
>     conf, HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION)) {
>   conf = new JobConf(oldJobConf);
>   HiveConf.setBoolVar(conf,
>       HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION,
>       skipTimestampConversion);
> }
> final TaskAttemptContext taskContext =
>     ContextUtil.newTaskAttemptContext(conf, taskAttemptID);
> {noformat}
> Three references are in play here: jobConf, oldJobConf and conf. getSplit is
> called with jobConf, so the filter predicate is set on that object. Reading
> this snippet alone, jobConf and oldJobConf appear to be the same reference
> inside the if statement, so the newly created conf should also contain the
> filter predicate. However, inside the getSplit method the value of jobConf is
> changed by the projectionPusher.pushProjectionsAndFilters method, so by the
> time the if statement runs, jobConf and oldJobConf are actually different
> references. The filter predicate is set on jobConf, but when the if condition
> is true, conf is created from oldJobConf and therefore does not contain the
> filter predicate.
> Just for reference, this behavior was introduced in
> [HIVE-9873|https://issues.apache.org/jira/browse/HIVE-9873].
> Since the only goal of the if statement is to update the
> HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION property in the configuration, the
> copy should be made from jobConf, where the filter predicate is correctly set.
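
The aliasing problem described above can be reproduced without Hive. The
following is a minimal sketch, not the actual Hive code or the attached patch.
Conf is a hypothetical stand-in for JobConf, the property keys and the filter
string are made up, and pushDownAndSetFilter models getSplit under the
assumption that the projection pushdown step hands back a fresh copy of the
configuration and that the filter is set on that copy.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfAliasingSketch {
    // Hypothetical property keys, used only for this illustration.
    static final String FILTER_KEY = "filter.predicate";
    static final String SKIP_KEY = "timestamp.skip.conversion";

    /** Hypothetical stand-in for JobConf: a string map with a copy constructor. */
    static final class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        Conf(Conf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }

        boolean getBoolean(String key) {
            // A missing key is treated as false.
            return Boolean.parseBoolean(props.get(key));
        }
    }

    /** Models getSplit: works on a copy of the conf and sets the filter on that copy. */
    static Conf pushDownAndSetFilter(Conf conf) {
        Conf cloned = new Conf(conf);
        cloned.set(FILTER_KEY, "id > 10");
        return cloned;
    }

    /**
     * Models the constructor logic. With copyFromJobConf == false the new conf
     * is copied from oldJobConf (the reported behavior); with true it is copied
     * from jobConf (the proposed direction of the fix).
     */
    static Conf buildTaskConf(Conf oldJobConf, boolean skip, boolean copyFromJobConf) {
        Conf jobConf = pushDownAndSetFilter(oldJobConf); // no longer the same reference
        Conf conf = jobConf;
        if (skip ^ conf.getBoolean(SKIP_KEY)) {
            conf = new Conf(copyFromJobConf ? jobConf : oldJobConf);
            conf.set(SKIP_KEY, Boolean.toString(skip));
        }
        return conf;
    }

    public static void main(String[] args) {
        Conf original = new Conf();
        original.set(SKIP_KEY, "false");
        // Copy from oldJobConf: the filter predicate is lost.
        System.out.println("buggy: " + buildTaskConf(original, true, false).get(FILTER_KEY));
        // Copy from jobConf: the filter predicate survives.
        System.out.println("fixed: " + buildTaskConf(original, true, true).get(FILTER_KEY));
    }
}
```

In this model the filter is only lost when the if condition is true, which
matches the report: the predicate is dropped only when the requested
skipTimestampConversion value differs from the one already in the
configuration.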