[
https://issues.apache.org/jira/browse/IMPALA-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
abeltian updated IMPALA-9952:
-----------------------------
Description:
When reading a Parquet file in Impala 3.4, the following error occurred:
{code:java}
I0714 16:11:48.307806 1075820 runtime-state.cc:207]
8c43203adb2d4fc8:0478df9b0000018b] Error from query
8c43203adb2d4fc8:0478df9b00000000: Invalid offset index in Parquet file
hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
I0714 16:11:48.834901 1075838 status.cc:126] 8c43203adb2d4fc8:0478df9b000002c0]
Invalid offset index in Parquet file
hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
@ 0xbf4ef9
@ 0x1748c41
@ 0x174e170
@ 0x1750e58
@ 0x17519f0
@ 0x1748559
@ 0x1510b41
@ 0x1512c8f
@ 0x137488a
@ 0x1375759
@ 0x1b48a19
@ 0x7f34509f5e24
@ 0x7f344d5ed35c
I0714 16:11:48.835763 1075838 runtime-state.cc:207]
8c43203adb2d4fc8:0478df9b000002c0] Error from query
8c43203adb2d4fc8:0478df9b00000000: Invalid offset index in Parquet file
hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
I0714 16:11:48.893784 1075820 status.cc:126] 8c43203adb2d4fc8:0478df9b0000018b]
Top level rows aren't in sync during page filtering in file
hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
@ 0xbf4ef9
@ 0x1749104
@ 0x17494cc
@ 0x1751aee
@ 0x1748559
@ 0x1510b41
@ 0x1512c8f
@ 0x137488a
@ 0x1375759
@ 0x1b48a19
@ 0x7f34509f5e24
@ 0x7f344d5ed35c
{code}
Corresponding source code:
{code:java}
Status HdfsParquetScanner::CheckPageFiltering() {
  if (candidate_ranges_.empty() || scalar_readers_.empty()) return Status::OK();

  int64_t current_row = scalar_readers_[0]->LastProcessedRow();
  for (int i = 1; i < scalar_readers_.size(); ++i) {
    if (current_row != scalar_readers_[i]->LastProcessedRow()) {
      DCHECK(false);
      return Status(Substitute(
          "Top level rows aren't in sync during page filtering in file $0.",
          filename()));
    }
  }
  return Status::OK();
}
{code}
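The check quoted above compares the last processed top-level row of every scalar column reader against that of the first reader. To make that invariant easier to reason about outside of Impala, here is a minimal standalone sketch. `MockReader` and `FirstOutOfSyncReader` are hypothetical names invented for illustration and are not part of the Impala code base; only the comparison logic mirrors the quoted function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a scalar column reader. Only the position of
// the last processed top-level row matters for this check.
struct MockReader {
  int64_t last_processed_row;
  int64_t LastProcessedRow() const { return last_processed_row; }
};

// Returns the index of the first reader whose last processed row differs
// from that of reader 0, or -1 if all readers agree (or there are none).
int FirstOutOfSyncReader(const std::vector<MockReader>& readers) {
  if (readers.empty()) return -1;
  const int64_t expected = readers[0].LastProcessedRow();
  for (std::size_t i = 1; i < readers.size(); ++i) {
    if (readers[i].LastProcessedRow() != expected) return static_cast<int>(i);
  }
  return -1;
}
```

Unlike the quoted function, this sketch reports which reader diverged, which may help narrow down the column whose pages were skipped differently from the others.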
was:
When reading a Parquet file with the LZ4 compression codec, the following
error occurred:
{code:java}
I0714 16:11:48.307806 1075820 runtime-state.cc:207]
8c43203adb2d4fc8:0478df9b0000018b] Error from query
8c43203adb2d4fc8:0478df9b00000000: Invalid offset index in Parquet file
hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
I0714 16:11:48.834901 1075838 status.cc:126] 8c43203adb2d4fc8:0478df9b000002c0]
Invalid offset index in Parquet file
hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
@ 0xbf4ef9
@ 0x1748c41
@ 0x174e170
@ 0x1750e58
@ 0x17519f0
@ 0x1748559
@ 0x1510b41
@ 0x1512c8f
@ 0x137488a
@ 0x1375759
@ 0x1b48a19
@ 0x7f34509f5e24
@ 0x7f344d5ed35c
I0714 16:11:48.835763 1075838 runtime-state.cc:207]
8c43203adb2d4fc8:0478df9b000002c0] Error from query
8c43203adb2d4fc8:0478df9b00000000: Invalid offset index in Parquet file
hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
I0714 16:11:48.893784 1075820 status.cc:126] 8c43203adb2d4fc8:0478df9b0000018b]
Top level rows aren't in sync during page filtering in file
hdfs://path/4844de7af4545a39-e8ebc7da0000005f_2015704758_data.0.parq.
@ 0xbf4ef9
@ 0x1749104
@ 0x17494cc
@ 0x1751aee
@ 0x1748559
@ 0x1510b41
@ 0x1512c8f
@ 0x137488a
@ 0x1375759
@ 0x1b48a19
@ 0x7f34509f5e24
@ 0x7f344d5ed35c
{code}
Corresponding source code:
{code:java}
Status HdfsParquetScanner::CheckPageFiltering() {
  if (candidate_ranges_.empty() || scalar_readers_.empty()) return Status::OK();

  int64_t current_row = scalar_readers_[0]->LastProcessedRow();
  for (int i = 1; i < scalar_readers_.size(); ++i) {
    if (current_row != scalar_readers_[i]->LastProcessedRow()) {
      DCHECK(false);
      return Status(Substitute(
          "Top level rows aren't in sync during page filtering in file $0.",
          filename()));
    }
  }
  return Status::OK();
}
{code}
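The first error in the log, "Invalid offset index in Parquet file", concerns the per-column offset index, which in the Parquet format is a list of page locations (`offset`, `compressed_page_size`, `first_row_index`). The sketch below shows the kind of sanity check a reader could run on such a list. It is an assumption-laden illustration, not Impala's actual validation: the function name `IsPlausibleOffsetIndex` is hypothetical, and the rules (first page starts at row 0, row indexes strictly increase, pages do not overlap) are assumptions about a well-formed index in which every page holds at least one row.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the three fields of PageLocation in the Parquet offset index.
struct PageLocation {
  int64_t offset;                // File offset of the page.
  int32_t compressed_page_size;  // Size of the page including its header.
  int64_t first_row_index;       // Row group-relative index of the first row.
};

// Hypothetical plausibility check for one column's offset index.
// num_rows is the number of rows in the enclosing row group.
bool IsPlausibleOffsetIndex(const std::vector<PageLocation>& pages,
                            int64_t num_rows) {
  if (pages.empty()) return false;
  // Assumed rule: the first page starts at the first row of the row group.
  if (pages[0].first_row_index != 0) return false;
  for (std::size_t i = 0; i < pages.size(); ++i) {
    const PageLocation& p = pages[i];
    if (p.offset < 0 || p.compressed_page_size <= 0) return false;
    if (p.first_row_index < 0 || p.first_row_index >= num_rows) return false;
    if (i > 0) {
      const PageLocation& prev = pages[i - 1];
      // Assumed rule: each page starts at a later row than the previous one.
      if (p.first_row_index <= prev.first_row_index) return false;
      // Assumed rule: pages are laid out in order and do not overlap.
      if (p.offset < prev.offset + prev.compressed_page_size) return false;
    }
  }
  return true;
}
```

Running a check like this against the page locations dumped from the affected file could show whether the index itself is malformed or whether the reader is rejecting a valid index.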
> Invalid offset index in Parquet file
> -------------------------------------
>
> Key: IMPALA-9952
> URL: https://issues.apache.org/jira/browse/IMPALA-9952
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.4.0
> Reporter: guojingfeng
> Priority: Major