[
https://issues.apache.org/jira/browse/IMPALA-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288936#comment-17288936
]
ASF subversion and git services commented on IMPALA-10501:
----------------------------------------------------------
Commit 89a5fc789cb7cdbab08e331c4c95d9cd4eb30b8d in impala's branch
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=89a5fc7 ]
IMPALA-10501: Hit DCHECK in parquet-column-readers.cc:
def_levels_.CacheRemaining() <= num_buffered_values_
We had a DCHECK in ScalarColumnReader::MaterializeValueBatch() that
checked that 'num_buffered_values_' is greater than or equal to the
number of cached values in the Parquet definition level decoder.
In SkipTopLevelRows() we used decoder.ReadLevel(), which could load
the decoder's cache with more values than the actual value count.
This is because literal runs are bit-packed in groups of 8, so there
might be padding zeros at the end of a run.
Instead, we now fill the decoder's cache with CacheNextBatch(num_vals),
which never loads more values than the actual value count.
Testing
* Before this patch, TestParquetStats::test_page_index was flaky
  because of this issue.
* I tested the fix on a hacked Impala that randomly generated
  skip ranges.
Change-Id: Ic071473e7b315300fd5e163225d3e39735f09c4f
Reviewed-on: http://gerrit.cloudera.org:8080/17071
Reviewed-by: Zoltan Borok-Nagy <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Hit DCHECK in parquet-column-readers.cc: def_levels_.CacheRemaining() <=
> num_buffered_values_
> ----------------------------------------------------------------------------------------------
>
> Key: IMPALA-10501
> URL: https://issues.apache.org/jira/browse/IMPALA-10501
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 4.0
> Reporter: Tim Armstrong
> Assignee: Zoltán Borók-Nagy
> Priority: Blocker
> Labels: broken-build, crash, flaky, parquet
> Attachments: consoleText.3.gz, impalad_coord_exec-0.tar.gz
>
>
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3814/
> {noformat}
> F0211 03:55:26.383247 14487 parquet-column-readers.cc:517] be46bb72819942fd:85934edd00000001] Check failed: def_levels_.CacheRemaining() <= num_buffered_values_ (921 vs. 916)
> *** Check failure stack trace: ***
> @ 0x53646ec google::LogMessage::Fail()
> @ 0x5365fdc google::LogMessage::SendToLog()
> @ 0x536404a google::LogMessage::Flush()
> @ 0x5367c48 google::LogMessageFatal::~LogMessageFatal()
> @ 0x2ff886f impala::ScalarColumnReader<>::MaterializeValueBatch<>()
> @ 0x2f8ae44 impala::ScalarColumnReader<>::MaterializeValueBatch<>()
> @ 0x2f761bf impala::ScalarColumnReader<>::ReadValueBatch<>()
> @ 0x2f2889a impala::ScalarColumnReader<>::ReadValueBatch()
> @ 0x2ebd8c0 impala::HdfsParquetScanner::AssembleRows()
> @ 0x2eb882e impala::HdfsParquetScanner::GetNextInternal()
> @ 0x2eb67bd impala::HdfsParquetScanner::ProcessSplit()
> @ 0x2aaf3f2 impala::HdfsScanNode::ProcessSplit()
> @ 0x2aae773 impala::HdfsScanNode::ScannerThread()
> @ 0x2aadadb _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @ 0x2aafe94 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @ 0x220e331 boost::function0<>::operator()()
> @ 0x2842e7f impala::Thread::SuperviseThread()
> @ 0x284ae1c boost::_bi::list5<>::operator()<>()
> @ 0x284ad40 boost::_bi::bind_t<>::operator()()
> @ 0x284ad01 boost::detail::thread_data<>::run()
> @ 0x406b291 thread_proxy
> @ 0x7f2465cba6b9 start_thread
> @ 0x7f24627e64dc clone
> rImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
> at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:866)
> {noformat}
> It was likely a fuzz test:
> {noformat}
> 19:55:23
> query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
> 50 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:23 [gw5] PASSED
> query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
> 50 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:23
> query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
> 80 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:25 [gw2] PASSED
> query_test/test_queries.py::TestPartitionKeyScans::test_partition_key_scans[protocol:
> beeswax | exec_option: {'mt_dop': 0, 'exec_single_node_rows_threshold': 0} |
> table_format: parquet/none]
> 19:55:25
> query_test/test_queries.py::TestPartitionKeyScans::test_partition_key_scans[protocol:
> beeswax | exec_option: {'mt_dop': 1, 'exec_single_node_rows_threshold': 0} |
> table_format: avro/snap/block]
> 19:55:26 [gw5] PASSED
> query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
> 80 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:26
> query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit:
> 130 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:26 [gw6] PASSED
> query_test/test_scanners.py::TestIceberg::test_iceberg_profile[protocol:
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold':
> 0} | table_format: parquet/none]
> 19:55:26
> query_test/test_scanners.py::TestIceberg::test_iceberg_profile[protocol:
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True,
> 'abort_on_error': 1, 'debug_action':
> '-1:OPEN:[email protected]',
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
> 19:55:27 [gw3] FAILED
> query_test/test_decimal_fuzz.py::TestDecimalFuzz::test_decimal_ops[exec_option:
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000,
> 'disable_codegen': False, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0}]
> 19:55:28
> query_test/test_decimal_fuzz.py::TestDecimalFuzz::test_width_bucket[exec_option:
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000,
> 'disable_codegen': False, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0}]
> 19:55:28 [gw3] FAILED
> query_test/test_decimal_fuzz.py::TestDecimalFuzz::test_width_bucket[exec_option:
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000,
> 'disable_codegen': False, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0}]
> 19:55:28
> query_test/test_decimal_queries.py::TestDecimalQueries::test_queries[protocol:
> beeswax | exec_option: {'disable_codegen_rows_threshold': 0,
> 'disable_codegen': 'false', 'decimal_v2': 'false', 'batch_size': 0} |
> table_format: text/none]
> 19:55:28 [gw6] ERROR
> query_test/test_scanners.py::TestIceberg::test_iceberg_profile[protocol:
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True,
> 'abort_on_error': 1, 'debug_action':
> '-1:OPEN:[email protected]',
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
> 19:55:28 [gw8] FAILED
> query_test/test_join_queries.py::TestJoinQueries::test_empty_build_joins[protocol:
> beeswax | table_format: parquet/none | exec_option: {'batch_size': 0,
> 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen':
> False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} |
> enable_outer_join_to_inner_transformation: false | batch_size: 0 | mt_dop: 0]
> 19:55:28 [gw13] FAILED
> query_test/test_parquet_stats.py::TestParquetStats::test_page_index[mt_dop: 0
> | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:28 [gw12] FAILED
> query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filters[mt_dop:
> 0 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:28
> query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filters[mt_dop:
> 4 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> 19:55:28
> query_test/test_join_queries.py::TestJoinQueries::test_empty_build_joins[protocol:
> beeswax | table_format: parquet/none | exec_option: {'batch_size': 0,
> 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen':
> False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} |
> enable_outer_join_to_inner_transformation: true | batch_size: 0 | mt_dop: 4]
> 19:55:28 [gw1] FAILED
> query_test/test_scanners.py::TestScannersAllTableFormatsWithLimit::test_limit[mt_dop:
> 0 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> parquet/none]
> {noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)