[
https://issues.apache.org/jira/browse/IMPALA-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285385#comment-17285385
]
Zoltán Borók-Nagy edited comment on IMPALA-10501 at 2/16/21, 6:00 PM:
----------------------------------------------------------------------
Looping load_nested and running the query did not reproduce the failure.
However, I was able to reproduce it on a hacked version of Impala by adding
the following code to HdfsParquetScanner::EvaluatePageIndex():
{noformat}
// Discard any previously computed skip ranges and inject synthetic ones.
skip_ranges.clear();
const int RANGES = 5;
int64_t num_vals = row_group.num_rows;
int64_t range_length = num_vals / RANGES;
for (int i = 0; i < RANGES; ++i) {
  // Skip the second half of each of the RANGES equal-sized chunks.
  int64_t range_start = i * range_length + range_length / 2;
  int64_t range_end = range_start + range_length / 2;
  skip_ranges.push_back({range_start, range_end});
}
{noformat}
The code above injects five "skip ranges" that are not aligned with the
Parquet page boundaries. This reproduced the error, and I was able to create
a fix:
[https://gerrit.cloudera.org/#/c/17071/]
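To make the arithmetic of the injected ranges concrete, here is a minimal
standalone sketch. It is not Impala code: RowRange, MakeSkipRanges,
IsPageAligned and the page layout used with them are hypothetical names and
values introduced only for illustration, and treating both range ends as
inclusive row indexes is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the scanner's row range type. Both ends are
// treated as inclusive row indexes here (an assumption).
struct RowRange {
  int64_t first;
  int64_t last;
};

// Mirrors the arithmetic of the hack: split the row group into `num_ranges`
// equal chunks and skip the second half of each chunk.
std::vector<RowRange> MakeSkipRanges(int64_t num_rows, int num_ranges) {
  assert(num_ranges > 0);
  std::vector<RowRange> skip_ranges;
  const int64_t range_length = num_rows / num_ranges;
  for (int i = 0; i < num_ranges; ++i) {
    const int64_t range_start = i * range_length + range_length / 2;
    const int64_t range_end = range_start + range_length / 2;
    skip_ranges.push_back({range_start, range_end});
  }
  return skip_ranges;
}

// Returns true if `range` begins on the first row of some page and ends on
// the last row of some page. `page_first_rows` holds the first row index of
// each page in ascending order (a made-up layout, not read from a file).
bool IsPageAligned(const RowRange& range,
                   const std::vector<int64_t>& page_first_rows,
                   int64_t num_rows) {
  bool starts_on_page = false;
  bool ends_on_page = false;
  for (std::size_t i = 0; i < page_first_rows.size(); ++i) {
    const int64_t page_first = page_first_rows[i];
    const int64_t page_last = (i + 1 < page_first_rows.size())
        ? page_first_rows[i + 1] - 1
        : num_rows - 1;
    if (range.first == page_first) starts_on_page = true;
    if (range.last == page_last) ends_on_page = true;
  }
  return starts_on_page && ends_on_page;
}
```

For example, with a row group of 1000 rows and pages assumed to start at rows
0, 250, 500 and 750, MakeSkipRanges(1000, 5) yields [100, 200], [300, 400],
[500, 600], [700, 800] and [900, 1000], and IsPageAligned returns false for
every one of them.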
> Hit DCHECK in parquet-column-readers.cc: def_levels_.CacheRemaining() <=
> num_buffered_values_
> ----------------------------------------------------------------------------------------------
>
> Key: IMPALA-10501
> URL: https://issues.apache.org/jira/browse/IMPALA-10501
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 4.0
> Reporter: Tim Armstrong
> Assignee: Zoltán Borók-Nagy
> Priority: Blocker
> Labels: broken-build, crash, flaky, parquet
> Attachments: consoleText.3.gz, impalad_coord_exec-0.tar.gz
>
>
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3814/
> {noformat}
> F0211 03:55:26.383247 14487 parquet-column-readers.cc:517]
> be46bb72819942fd:85934edd00000001] Check failed: def_levels_.CacheRemaining()
> <= num_buffered_values_ (921 vs. 916)
> *** Check failure stack trace: ***
> @ 0x53646ec google::LogMessage::Fail()
> @ 0x5365fdc google::LogMessage::SendToLog()
> @ 0x536404a google::LogMessage::Flush()
> @ 0x5367c48 google::LogMessageFatal::~LogMessageFatal()
> @ 0x2ff886f
> impala::ScalarColumnReader<>::MaterializeValueBatch<>()
> @ 0x2f8ae44
> impala::ScalarColumnReader<>::MaterializeValueBatch<>()
> @ 0x2f761bf impala::ScalarColumnReader<>::ReadValueBatch<>()
> @ 0x2f2889a impala::ScalarColumnReader<>::ReadValueBatch()
> @ 0x2ebd8c0 impala::HdfsParquetScanner::AssembleRows()
> @ 0x2eb882e impala::HdfsParquetScanner::GetNextInternal()
> @ 0x2eb67bd impala::HdfsParquetScanner::ProcessSplit()
> @ 0x2aaf3f2 impala::HdfsScanNode::ProcessSplit()
> @ 0x2aae773 impala::HdfsScanNode::ScannerThread()
> @ 0x2aadadb
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @ 0x2aafe94
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @ 0x220e331 boost::function0<>::operator()()
> @ 0x2842e7f impala::Thread::SuperviseThread()
> @ 0x284ae1c boost::_bi::list5<>::operator()<>()
> @ 0x284ad40 boost::_bi::bind_t<>::operator()()
> @ 0x284ad01 boost::detail::thread_data<>::run()
> @ 0x406b291 thread_proxy
> @ 0x7f2465cba6b9 start_thread
> @ 0x7f24627e64dc clone
> rImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
> at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:866)
> {noformat}
> It was likely a fuzz test:
> {noformat}
> 19:55:23
> query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
> 50 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:23 [gw5] PASSED
> query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
> 50 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:23
> query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
> 80 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:25 [gw2] PASSED
> query_test/test_queries.py::TestPartitionKeyScans::test_partition_key_scans[protocol:
> beeswax | exec_option: {'mt_dop': 0, 'exec_single_node_rows_threshold': 0} |
> table_format: parquet/none]
> 19:55:25
> query_test/test_queries.py::TestPartitionKeyScans::test_partition_key_scans[protocol:
> beeswax | exec_option: {'mt_dop': 1, 'exec_single_node_rows_threshold': 0} |
> table_format: avro/snap/block]
> 19:55:26 [gw5] PASSED
> query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
> 80 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:26
> query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
> 130 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:26 [gw6] PASSED
> query_test/test_scanners.py::TestIceberg::test_iceberg_profile[protocol:
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold':
> 0} | table_format: parquet/none]
> 19:55:26
> query_test/test_scanners.py::TestIceberg::test_iceberg_profile[protocol:
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True,
> 'abort_on_error': 1, 'debug_action':
> '-1:OPEN:[email protected]',
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
> 19:55:27 [gw3] FAILED
> query_test/test_decimal_fuzz.py::TestDecimalFuzz::test_decimal_ops[exec_option:
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000,
> 'disable_codegen': False, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0}]
> 19:55:28
> query_test/test_decimal_fuzz.py::TestDecimalFuzz::test_width_bucket[exec_option:
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000,
> 'disable_codegen': False, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0}]
> 19:55:28 [gw3] FAILED
> query_test/test_decimal_fuzz.py::TestDecimalFuzz::test_width_bucket[exec_option:
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000,
> 'disable_codegen': False, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0}]
> 19:55:28
> query_test/test_decimal_queries.py::TestDecimalQueries::test_queries[protocol:
> beeswax | exec_option: {'disable_codegen_rows_threshold': 0,
> 'disable_codegen': 'false', 'decimal_v2': 'false', 'batch_size': 0} |
> table_format: text/none]
> 19:55:28 [gw6] ERROR
> query_test/test_scanners.py::TestIceberg::test_iceberg_profile[protocol:
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True,
> 'abort_on_error': 1, 'debug_action':
> '-1:OPEN:[email protected]',
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
> 19:55:28 [gw8] FAILED
> query_test/test_join_queries.py::TestJoinQueries::test_empty_build_joins[protocol:
> beeswax | table_format: parquet/none | exec_option: {'batch_size': 0,
> 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen':
> False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} |
> enable_outer_join_to_inner_transformation: false | batch_size: 0 | mt_dop: 0]
> 19:55:28 [gw13] FAILED
> query_test/test_parquet_stats.py::TestParquetStats::test_page_index[mt_dop: 0
> | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:28 [gw12] FAILED
> query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filters[mt_dop:
> 0 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:28
> query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filters[mt_dop:
> 4 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:28
> query_test/test_join_queries.py::TestJoinQueries::test_empty_build_joins[protocol:
> beeswax | table_format: parquet/none | exec_option: {'batch_size': 0,
> 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen':
> False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} |
> enable_outer_join_to_inner_transformation: true | batch_size: 0 | mt_dop: 4]
> 19:55:28 [gw1] FAILED
> query_test/test_scanners.py::TestScannersAllTableFormatsWithLimit::test_limit[mt_dop:
> 0 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> {noformat}