[ 
https://issues.apache.org/jira/browse/IMPALA-11039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516225#comment-17516225
 ] 

ASF subversion and git services commented on IMPALA-11039:
----------------------------------------------------------

Commit 0fb14962d7db7be8efcf0559b1781872b3e36e6e in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0fb1496 ]

IMPALA-11039: Fix incorrect page jumping in late materialization of Parquet

The current calculation of LastRowIdxInCurrentPage() is incorrect. It
uses the first row index of the next candidate page instead of the next
valid page. The next candidate page could be far away from the current
page, so the result can be larger than the current page size. Skipping
rows in the current page could then overflow the page boundary. This
patch fixes LastRowIdxInCurrentPage() to use the next valid page.
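
The difference between the two bounds can be illustrated with a minimal
sketch. It is not the code from the patch: the PageLocation struct, the
validity rule (a zero compressed size marks an empty page), and both
function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified page metadata (not Impala's actual types).
struct PageLocation {
  int64_t first_row_index;  // row-group-relative index of the page's first row
  int32_t compressed_size;  // assumption: 0 marks an empty, invalid page
};

// Validity rule assumed for this sketch only.
bool IsValidPage(const PageLocation& page, int64_t num_rows) {
  return page.compressed_size > 0 && page.first_row_index < num_rows;
}

// Old behaviour as described above: bound the current page by the next
// *candidate* page, which may lie many pages ahead.
int64_t LastRowIdxByNextCandidate(const std::vector<PageLocation>& pages,
    const std::vector<int>& candidates, size_t candidate_pos, int64_t num_rows) {
  assert(candidate_pos < candidates.size());
  if (candidate_pos + 1 < candidates.size()) {
    return pages[candidates[candidate_pos + 1]].first_row_index - 1;
  }
  return num_rows - 1;
}

// Fixed behaviour as described above: bound the current page by the next
// *valid* page that follows it, whether or not that page is a candidate.
int64_t LastRowIdxByNextValid(const std::vector<PageLocation>& pages,
    const std::vector<int>& candidates, size_t candidate_pos, int64_t num_rows) {
  assert(candidate_pos < candidates.size());
  size_t next = static_cast<size_t>(candidates[candidate_pos]) + 1;
  for (size_t i = next; i < pages.size(); ++i) {
    if (IsValidPage(pages[i], num_rows)) return pages[i].first_row_index - 1;
  }
  // No later valid page: the current page runs to the end of the row group.
  return num_rows - 1;
}
```

For example, with pages starting at rows 0, 100, 200 (empty), 200 and 300
in a 400-row group and candidate pages {0, 4}, the old bound for page 0
is 299, while the page actually ends at row 99.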

When skip_row_id is set (>0), SkipRowsInternal<false>() currently
expects to jump to a page containing this row and then skip rows within
that page. However, the expected row might not be in any of the
candidate pages. When we jump to the next candidate page, the target row
could already have been skipped. In this case, we don't need to skip
rows in the current page.
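
The decision made after the jump can be sketched as follows. This is a
hypothetical helper, not the patched SkipRowsInternal<false>(): the
function name, its parameters, and the reading of skip_row_id as the
index of the last row to skip (inclusive) are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of rows to skip inside the page we just
// jumped to. Assumption: skip_row_id is the last row to skip, inclusive.
int64_t RowsToSkipInPage(int64_t page_first_row, int64_t page_last_row,
    int64_t skip_row_id) {
  assert(page_first_row <= page_last_row);
  // The target row lies before this page, so the jump already skipped it.
  if (skip_row_id < page_first_row) return 0;
  // Otherwise skip up to the target row, never past the end of the page.
  int64_t last_row_to_skip = std::min(skip_row_id, page_last_row);
  return last_row_to_skip - page_first_row + 1;
}
```

With a page covering rows 300 to 399, a target row of 250 needs no
in-page skipping, while a target row of 310 needs 11 rows skipped.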

Tests:
 - Add a test on alltypes_empty_pages to reveal the bug.
 - Add more batch_size values in test_page_index.
 - Pass tests/query_test/test_parquet_stats.py locally.

Change-Id: I3a783115ba8faf1a276e51087f3a70f79402c21d
Reviewed-on: http://gerrit.cloudera.org:8080/18372
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> DCHECK_GE(num_buffered_values_, num_rows) fails in parquet-column-readers.cc
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-11039
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11039
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Abhishek Rawat
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> From the test logs, the following test case seems to be failing:
> {code:java}
> query_test.test_parquet_stats.TestParquetStats.test_page_index[mt_dop: 1 | 
> protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] (from pytest)
> ....
> -- 2021-11-28 16:19:15,196 INFO     MainThread: Started query 
> 724d7ada4453a955:cbd1617900000000
> SET 
> client_identifier=query_test/test_parquet_stats.py::TestParquetStats::()::test_page_index[mt_dop:1|protocol:beeswax|exec_option:{&apos;batch_size&apos;:0;&apos;num_nodes&apos;:0;&apos;disable_codegen_rows_threshold&apos;:0;&apos;disable_codegen&apos;:False;&apos;abort_on_error&apos;:1;&apos;exec_single_node_rows_threshold&apos;:0}|;
> SET batch_size=32;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET exec_single_node_rows_threshold=0;
> -- 2021-11-28 16:19:15,198 INFO     MainThread: Loading query test file: 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-large.test
> -- executing against localhost:21000
> ...
> ...
> ...
> -- 2021-11-28 16:19:36,435 INFO     MainThread: Started query 
> cc42e37e061881ae:6128a31000000000
> </system-err><error message="test setup failure">conftest.py:376: in cleanup
>     request.instance.execute_query_expect_success(request.instance.client, 
> &quot;use default&quot;)
> common/impala_test_suite.py:831: in wrapper
>     return function(*args, **kwargs)
> common/impala_test_suite.py:839: in execute_query_expect_success
>     result = cls.__execute_query(impalad_client, query, query_options, user)
> common/impala_test_suite.py:956: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:212: in execute
>     return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:189: in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:365: in __execute_query
>     handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:359: in execute_query_async
>     handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:518: in __do_rpc
>     raise ImpalaBeeswaxException(&quot;Not connected&quot;, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    Not connected</error><system-err>SET 
> client_identifier=query_test/test_parquet_stats.py::TestParquetStats::()::test_page_index[mt_dop:1|protocol:beeswax|exec_option:{&apos;batch_size&apos;:0;&apos;num_nodes&apos;:0;&apos;disable_codegen_rows_threshold&apos;:0;&apos;disable_codegen&apos;:False;&apos;abort_on_error&apos;:1;&apos;exec_single_node_rows_threshold&apos;:0}|;
> SET sync_ddl=False;
> -- executing against localhost:21000  {code}
> Impalad logs:
> {code:java}
> F1128 16:19:37.713317 18815 parquet-column-readers.cc:1286] 
> cc42e37e061881ae:6128a31000000002] Check failed: num_buffered_values_ >= 
> num_rows (18848 vs. 106048) {code}
>  
> Related stack trace:
> {code:java}
> #2  0x00000000056d9124 in google::DumpStackTraceAndExit() ()
> #3  0x00000000056ce55d in google::LogMessage::Fail() ()
> #4  0x00000000056cfe0d in google::LogMessage::SendToLog() ()
> #5  0x00000000056cdebb in google::LogMessage::Flush() ()
> #6  0x00000000056d1a79 in google::LogMessageFatal::~LogMessageFatal() ()
> #7  0x00000000031eedcd in 
> impala::BaseScalarColumnReader::SkipTopLevelRows<false> (this=0x1bb14400, 
> num_rows=106048, remaining=0x7f3e8a32a318) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/be/src/exec/parquet/parquet-column-readers.cc:1286
> #8  0x00000000031ed11c in 
> impala::BaseScalarColumnReader::SkipRowsInternal<false> (this=0x1bb14400, 
> num_rows=18848, skip_row_id=959199) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/be/src/exec/parquet/parquet-column-readers.cc:1548
> #9  0x00000000031ea414 in impala::BaseScalarColumnReader::SkipRows 
> (this=0x1bb14400, num_rows=18848, skip_row_id=959199) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/be/src/exec/parquet/parquet-column-readers.h:527
> #10 0x0000000003160f68 in impala::HdfsParquetScanner::SkipRowsForColumns 
> (this=0x181cfc00, column_readers=..., num_rows_to_skip=0x7f3e8a32a5c8, 
> skip_to_row=0x7f3e8a32a5c0) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:2341
> #11 0x000000000316d8cb in impala::HdfsParquetScanner::AssembleRows<true> 
> (this=0x181cfc00, row_batch=0x188b0270, skip_row_group=0x181cfdd0) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:2320
> #12 0x0000000003152bf9 in impala::HdfsParquetScanner::GetNextInternal 
> (this=0x181cfc00, row_batch=0x188b0270) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:537
> #13 0x0000000003150b7e in impala::HdfsParquetScanner::ProcessSplit 
> (this=0x181cfc00) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:427
> #14 0x0000000002d1d099 in impala::HdfsScanNode::ProcessSplit 
> (this=0x139e7000, filter_ctxs=..., expr_results_pool=0x7f3e8a32b400, 
> scan_range=0x1a990000, scanner_thread_reservation=0x7f3e8a32b328) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:500
> #15 0x0000000002d1c41a in impala::HdfsScanNode::ScannerThread 
> (this=0x139e7000, first_thread=true, scanner_thread_reservation=41943040) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:418
> #16 0x0000000002d1b782 in impala::HdfsScanNode::<lambda()>::operator()(void) 
> const (__closure=0x7f3e8a32bb28) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:339
> #17 0x0000000002d1e3a4 in 
> boost::detail::function::void_function_obj_invoker0<impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::<lambda()>,
>  void>::invoke(boost::detail::function::function_buffer &) 
> (function_obj_ptr=...) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.74.0-p1/include/boost/function/function_template.hpp:158
> #18 0x00000000022ab70e in boost::function0<void>::operator() 
> (this=0x7f3e8a32bb20) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.74.0-p1/include/boost/function/function_template.hpp:763
> #19 0x0000000002a77241 in 
> impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=..., 
> functor=..., parent_thread_info=0x7f3e800f87c0, 
> thread_started=0x7f3e800f75c0) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/be/src/util/thread.cc:360
> #20 0x0000000002a7fb91 in 
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> 
> >::operator()<void (*)(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*), 
> boost::_bi::list0>(boost::_bi::type<void>, void 
> (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void 
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long, 
> (impala::PromiseMode)0>*), boost::_bi::list0&, int) (this=0x160d4840, 
> f=@0x160d4838: 0x2a76efe 
> <impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*)>, a=...) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.74.0-p1/include/boost/bind/bind.hpp:531
> #21 0x0000000002a7fab5 in boost::_bi::bind_t<void, void 
> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void 
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long, 
> (impala::PromiseMode)0>*), 
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > 
> >::operator()() (this=0x160d4838) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.74.0-p1/include/boost/bind/bind.hpp:1294
> #22 0x0000000002a7fa76 in boost::detail::thread_data<boost::_bi::bind_t<void, 
> void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void 
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long, 
> (impala::PromiseMode)0>*), 
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > 
> >::run() (this=0x160d4700) at 
> /data/jenkins/workspace/impala-asf-master-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.74.0-p1/include/boost/thread/detail/thread.hpp:120
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
