[
https://issues.apache.org/jira/browse/IMPALA-11345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573325#comment-17573325
]
ASF subversion and git services commented on IMPALA-11345:
----------------------------------------------------------
Commit 0b9bead70084f2ed1a55ca38ceb7b3ebe30eebce in impala's branch
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0b9bead70 ]
IMPALA-11345: Parquet Bloom filtering failure if column is added to the
schema
If a new column was added to an existing table with existing data and
Parquet Bloom filtering was turned ON, queries having an equality
conjunct on the new column failed.
This was because the old Parquet data files did not have the new column
in their schema, so the scanner could not find a column for the conjunct.
This was treated as an error and the query failed.
After this patch this situation is no longer treated as an error and the
conjunct is simply disregarded for Bloom filtering in the files that
lack the new column.
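The behaviour described above can be illustrated with a small Python model. This is only a sketch of the idea, not the actual C++ code in HdfsParquetScanner: the names EqConjunct and build_col_idx_to_eq_conjuncts are hypothetical, and a file schema is assumed to be a flat list of column names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EqConjunct:
    """Hypothetical stand-in for an equality predicate such as name = 'Lily'."""
    column: str
    value: object


def build_col_idx_to_eq_conjuncts(
    file_columns: List[str], conjuncts: List[EqConjunct]
) -> Dict[int, List[EqConjunct]]:
    """Map each file column index to the equality conjuncts that reference it.

    Conjuncts on columns absent from this file's schema are skipped, so they
    take no part in Bloom filtering for this file.
    """
    index_of = {name: idx for idx, name in enumerate(file_columns)}
    result: Dict[int, List[EqConjunct]] = {}
    for conjunct in conjuncts:
        col_idx = index_of.get(conjunct.column)
        if col_idx is None:
            # Assumption: this models the case that used to be an error.
            continue
        result.setdefault(col_idx, []).append(conjunct)
    return result


old_file = ["id"]          # written before the column was added
new_file = ["id", "name"]  # written after the column was added
conjuncts = [EqConjunct("name", "Lily")]

old_map = build_col_idx_to_eq_conjuncts(old_file, conjuncts)
new_map = build_col_idx_to_eq_conjuncts(new_file, conjuncts)
print(old_map)  # {}
print(new_map)
```

In this model the old file yields an empty map, so no Bloom filter is consulted for it, while the new file still maps column index 1 to the conjunct on name.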
Testing:
- added the test
TestParquetBloomFilter::test_parquet_bloom_filtering_schema_change in
tests/query_test/test_parquet_bloom_filter.py that checks that a
query as described above does not fail.
Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Reviewed-on: http://gerrit.cloudera.org:8080/18779
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Query failed when creating equal conjunction map for Parquet bloom filter
> -------------------------------------------------------------------------
>
> Key: IMPALA-11345
> URL: https://issues.apache.org/jira/browse/IMPALA-11345
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, Distributed Exec
> Affects Versions: Impala 4.1.0
> Environment: CentOS-7, Impala-4.1
> Reporter: Yuchen Fan
> Assignee: Daniel Becker
> Priority: Critical
>
> When querying a Hive table to which columns were added without using
> 'cascade', Impala fails with an error like "Unable to find SchemaNode for path
> 'db.table.column' in the schema of file
> 'hdfs://xxx/path/to/parquet_file_before_add_column'." I checked the Parquet
> file named in the error log and found that its schema is not compatible with
> the table metadata. The call stack is attached below. Paths and table names
> are masked:
> {code:java}
> I0609 18:04:25.970052 115413 status.cc:129] c94d0ab3fdf8f943:3203006100000002] Unable to find SchemaNode for path 'xxx_db.xxx_table.xxx_column' in the schema of file 'hdfs://xxx_nn/xxx_table_path/000000_0'.
> @ 0xea543b impala::Status::Status()
> @ 0x1e3225c impala::HdfsParquetScanner::CreateColIdx2EqConjunctMap()
> @ 0x1e363ea impala::HdfsParquetScanner::Open()
> @ 0x19b40d0 impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
> @ 0x1b5cbae impala::HdfsScanNode::ProcessSplit()
> @ 0x1b5e12a impala::HdfsScanNode::ScannerThread()
> @ 0x1b5e9c6 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @ 0x18eafa9 impala::Thread::SuperviseThread()
> @ 0x18ee11a boost::detail::thread_data<>::run()
> @ 0x2385510 thread_proxy
> @ 0x7fb5b0745162 start_thread
> @ 0x7fb5ad21df6c __clone
> {code}
> The error may be related to
> [IMPALA-10640|https://issues.apache.org/jira/browse/IMPALA-10640]. The Bloom
> filter requires that the right-hand values of the equality conjuncts match the
> current file schema. The filter is unavailable if the column does not exist in
> every Parquet file scanned. I think we can disable the Parquet Bloom filter for
> this single query or scan node when such a situation is detected.
> How to reproduce (using impala-shell):
> # create table parquet_test (id INT) stored as parquet;
> # insert into parquet_test values (1),(2),(3);
> # alter table parquet_test add columns (name STRING);
> # insert into parquet_test values (4, "James");
> # select * from parquet_test where name in ("Lily");
> # Error occurred.