[
https://issues.apache.org/jira/browse/IMPALA-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184011#comment-17184011
]
Zoltán Borók-Nagy commented on IMPALA-9952:
-------------------------------------------
[~guojingfeng] Could you tell me the schema of your table? Is the data sorted
by any of the columns? What kind of queries hit this bug?
Also, were you able to reproduce this bug on an obscured data set that can be
shared?
> Parquet with lz4 ColumnIndex filter error
> -----------------------------------------
>
> Key: IMPALA-9952
> URL: https://issues.apache.org/jira/browse/IMPALA-9952
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.4.0
> Reporter: guojingfeng
> Priority: Major
>
> When reading a Parquet file compressed with the lz4 codec, the following
> errors occurred:
> {code:java}
> I0714 16:11:48.307806 1075820 runtime-state.cc:207]
> 8c43203adb2d4fc8:0478df9b0000018b] Error from query
> 8c43203adb2d4fc8:0478df9b00000000: Invalid offset index in Parquet file
> hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
> I0714 16:11:48.834901 1075838 status.cc:126]
> 8c43203adb2d4fc8:0478df9b000002c0] Invalid offset index in Parquet file
> hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
> @ 0xbf4ef9
> @ 0x1748c41
> @ 0x174e170
> @ 0x1750e58
> @ 0x17519f0
> @ 0x1748559
> @ 0x1510b41
> @ 0x1512c8f
> @ 0x137488a
> @ 0x1375759
> @ 0x1b48a19
> @ 0x7f34509f5e24
> @ 0x7f344d5ed35c
> I0714 16:11:48.835763 1075838 runtime-state.cc:207]
> 8c43203adb2d4fc8:0478df9b000002c0] Error from query
> 8c43203adb2d4fc8:0478df9b00000000: Invalid offset index in Parquet file
> hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
> I0714 16:11:48.893784 1075820 status.cc:126]
> 8c43203adb2d4fc8:0478df9b0000018b] Top level rows aren't in sync during page
> filtering in file
> hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
> @ 0xbf4ef9
> @ 0x1749104
> @ 0x17494cc
> @ 0x1751aee
> @ 0x1748559
> @ 0x1510b41
> @ 0x1512c8f
> @ 0x137488a
> @ 0x1375759
> @ 0x1b48a19
> @ 0x7f34509f5e24
> @ 0x7f344d5ed35c
> {code}
> Corresponding source code:
> {code:java}
> Status HdfsParquetScanner::CheckPageFiltering() {
>   if (candidate_ranges_.empty() || scalar_readers_.empty()) return Status::OK();
>   int64_t current_row = scalar_readers_[0]->LastProcessedRow();
>   for (int i = 1; i < scalar_readers_.size(); ++i) {
>     if (current_row != scalar_readers_[i]->LastProcessedRow()) {
>       DCHECK(false);
>       return Status(Substitute(
>           "Top level rows aren't in sync during page filtering in file $0.",
>           filename()));
>     }
>   }
>   return Status::OK();
> }
> {code}