[jira] [Commented] (IMPALA-10233) Hit DCHECK in DmlExecState::AddPartition when inserting to a partitioned table with zorder

ASF subversion and git services (Jira) Fri, 16 Oct 2020 02:03:25 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215276#comment-17215276
 ]


ASF subversion and git services commented on IMPALA-10233:
----------------------------------------------------------

Commit faa2d398e647c0347f81af08d7e4fa4d4a7acf72 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=faa2d39 ]

IMPALA-10233: zorder sort node should output rows in lexical order of partition 
keys

When inserting to a partitioned hdfs table, the planner will add a sort
node on top of the plan, depending on the clustered/noclustered plan
hint and on the 'sort.columns' table property. If clustering is enabled
in insertStmt or additional columns are specified in the 'sort.columns'
table property, then the ordering columns will start with the clustering
columns, so that partitions can be written sequentially in the table
sink. Any additional non-clustering columns specified by the
'sort.columns' property are appended to the ordering columns, after
any clustering columns.

For the Z-order sort type, we should handle these two groups of ordering
columns separately. The clustering columns should still be sorted
lexically, and only the remaining ordering columns should be sorted in
Z-order. That way we can still insert partitions one by one and avoid
hitting the DCHECK described in the JIRA.
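
The intended ordering can be illustrated with a small standalone sketch. It
is not the planner or sort-node code from this commit: the Row struct, the
two unsigned sort columns, and the helper names (SpreadBits, MortonCode,
ClusteredZOrderLess, SortForInsert) are all hypothetical, and Z-order is
approximated by comparing interleaved-bit (Morton) codes. The partition key
is compared lexically first, and Z-order only decides ties within the same
partition.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical row: one partition (clustering) key plus two columns that
// would be listed in 'sort.columns' and sorted in Z-order.
struct Row {
  std::string part_key;
  uint32_t a;
  uint32_t b;
};

// Spreads the 32 bits of v so that bit i moves to bit position 2*i.
uint64_t SpreadBits(uint32_t v) {
  uint64_t x = v;
  x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
  x = (x | (x << 8)) & 0x00FF00FF00FF00FFULL;
  x = (x | (x << 4)) & 0x0F0F0F0F0F0F0F0FULL;
  x = (x | (x << 2)) & 0x3333333333333333ULL;
  x = (x | (x << 1)) & 0x5555555555555555ULL;
  return x;
}

// Interleaves the bits of a and b; a contributes the higher bit of each pair.
uint64_t MortonCode(uint32_t a, uint32_t b) {
  return (SpreadBits(a) << 1) | SpreadBits(b);
}

// Lexical order on the partition key, Z-order only within one partition.
bool ClusteredZOrderLess(const Row& l, const Row& r) {
  if (l.part_key != r.part_key) return l.part_key < r.part_key;
  return MortonCode(l.a, l.b) < MortonCode(r.a, r.b);
}

// Returns the rows in the order a clustered sink could consume them.
std::vector<Row> SortForInsert(std::vector<Row> rows) {
  std::stable_sort(rows.begin(), rows.end(), ClusteredZOrderLess);
  return rows;
}
```

With this comparator, all rows of one partition are contiguous in the
output, so a sink that writes partitions sequentially never sees a partition
key come back after it has moved on.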

Tests
 - Add tests for inserting to a partitioned table with zorder.

Change-Id: I30cbad711167b8b63c81837e497b36fd41be9b54
Reviewed-on: http://gerrit.cloudera.org:8080/16590
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Hit DCHECK in DmlExecState::AddPartition when inserting to a partitioned 
> table with zorder
> ------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10233
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10233
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Blocker
>              Labels: crash
>
> Hit the DCHECK when inserting to a partitioned parquet table with zorder. I'm 
> on master branch (commit=b8a2b75).
> {code:java}
> F1012 15:04:27.726274  3868 dml-exec-state.cc:432] 
> a6479cc4725101fd:b86db2a100000003] Check failed: 
> per_partition_status_.find(name) == per_partition_status_.end() 
> *** Check failure stack trace: *** 
>     @          0x51ff3cc  google::LogMessage::Fail()
>     @          0x5200cbc  google::LogMessage::SendToLog()
>     @          0x51fed2a  google::LogMessage::Flush()
>     @          0x5202928  google::LogMessageFatal::~LogMessageFatal()
>     @          0x234ba18  impala::DmlExecState::AddPartition()
>     @          0x2817786  impala::HdfsTableSink::GetOutputPartition()
>     @          0x2813151  impala::HdfsTableSink::WriteClusteredRowBatch()
>     @          0x28156c4  impala::HdfsTableSink::Send()
>     @          0x23139dd  impala::FragmentInstanceState::ExecInternal()
>     @          0x230fe10  impala::FragmentInstanceState::Exec()
>     @          0x227bb79  impala::QueryState::ExecFInstance()
>     @          0x2279f7b  
> _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
>     @          0x227e2c2  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
>     @          0x2137699  boost::function0<>::operator()()
>     @          0x2715d7d  impala::Thread::SuperviseThread()
>     @          0x271dd1a  boost::_bi::list5<>::operator()<>()
>     @          0x271dc3e  boost::_bi::bind_t<>::operator()()
>     @          0x271dbff  boost::detail::thread_data<>::run()
>     @          0x3f05f01  thread_proxy
>     @     0x7fb18bebb6b9  start_thread
>     @     0x7fb188a474dc  clone {code}
> It seems the zorder sort node doesn't keep the rows sorted by partition keys. 
> This violates the assumption of HdfsTableSink::WriteClusteredRowBatch() that 
> input must be ordered by the partition key expressions. As a result, a 
> partition key was deleted from the {{partition_keys_to_output_partitions_}} 
> map and then inserted again.
> {code:c++}
>   /// Maps all rows in 'batch' to partitions and appends them to their temporary Hdfs
>   /// files. The input must be ordered by the partition key expressions.
>   Status WriteClusteredRowBatch(RuntimeState* state, RowBatch* batch) WARN_UNUSED_RESULT;
> {code}
> The key was removed here, when processing a new partition key: 
> https://github.com/apache/impala/blob/b8a2b754669eb7f8d164e8112e594ac413e436ef/be/src/exec/hdfs-table-sink.cc#L334
> It was then reinserted here, which hit the DCHECK: 
> https://github.com/apache/impala/blob/b8a2b754669eb7f8d164e8112e594ac413e436ef/be/src/exec/hdfs-table-sink.cc#L590
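
The failure mode described above can be reproduced with a toy model. This is
only a sketch under assumptions, not Impala's code: ToyClusteredSink and
WriteAll are hypothetical names, the sink's map is reduced to "the current
partition key", and the set added_ stands in for the per-partition status
that DmlExecState keeps. Write() returns false at the point where the real
code would fail the DCHECK.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for the clustered write path; not Impala's real classes.
class ToyClusteredSink {
 public:
  // Consumes one row's partition key. Returns false where the real code
  // would hit the DCHECK (the partition was already added earlier).
  bool Write(const std::string& part_key) {
    if (has_current_ && part_key == current_) return true;
    // A new key arrived: the previous partition is closed and forgotten,
    // because clustered input is assumed never to return to it.
    has_current_ = true;
    current_ = part_key;
    // The DML state still remembers every partition ever added, so adding
    // the same name a second time is detected here.
    return added_.insert(part_key).second;
  }

 private:
  bool has_current_ = false;
  std::string current_;
  std::set<std::string> added_;
};

// Feeds all keys to a fresh sink; returns false on the first violation.
bool WriteAll(const std::vector<std::string>& keys) {
  ToyClusteredSink sink;
  for (const std::string& key : keys) {
    if (!sink.Write(key)) return false;
  }
  return true;
}
```

Input ordered by partition key, such as p1, p1, p2, passes. Input in which a
key reappears after another key, such as p1, p2, p1, fails on the third row,
which matches the remove-then-reinsert sequence described in the issue.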



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

