[ https://issues.apache.org/jira/browse/IMPALA-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215276#comment-17215276 ]
ASF subversion and git services commented on IMPALA-10233:
----------------------------------------------------------
Commit faa2d398e647c0347f81af08d7e4fa4d4a7acf72 in impala's branch
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=faa2d39 ]
IMPALA-10233: zorder sort node should output rows in lexical order of partition keys
When inserting into a partitioned HDFS table, the planner adds a sort
node on top of the plan, depending on the clustered/noclustered plan
hint and on the 'sort.columns' table property. If clustering is enabled
in insertStmt or additional columns are specified in the 'sort.columns'
table property, then the ordering columns start with the clustering
columns, so that partitions can be written sequentially in the table
sink. Any additional non-clustering columns specified by the
'sort.columns' property are appended to the ordering columns, after
any clustering columns.
For the Z-order sort type, these two groups of ordering columns must be
handled separately. The clustering columns should still be sorted
lexically, and only the remaining ordering columns should be sorted in
Z-order. That way partitions can still be written one by one, and the
DCHECK described in the JIRA is no longer hit.
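The ordering described above can be sketched as a comparator. This is a minimal illustration under stated assumptions, not Impala's implementation: the `Row` struct and the functions `Interleave`, `PartitionThenZOrderLess`, `SortForInsert`, and `PartitionsAreContiguous` are hypothetical names, and the bit interleaving assumes exactly two unsigned 32-bit non-clustering sort columns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical row layout: one clustering (partition) column and two
// non-clustering sort columns. Not taken from the Impala code base.
struct Row {
  int32_t part_key;
  uint32_t a;
  uint32_t b;
};

// Interleaves the bits of x and y into a 64-bit Morton (Z-order) code.
// Bit i of x lands at bit 2*i, bit i of y lands at bit 2*i + 1.
uint64_t Interleave(uint32_t x, uint32_t y) {
  uint64_t z = 0;
  for (int i = 0; i < 32; ++i) {
    z |= static_cast<uint64_t>((x >> i) & 1u) << (2 * i);
    z |= static_cast<uint64_t>((y >> i) & 1u) << (2 * i + 1);
  }
  return z;
}

// Compares the partition key lexically first. Only rows within the same
// partition are compared by the Z-order code of the remaining columns.
bool PartitionThenZOrderLess(const Row& l, const Row& r) {
  if (l.part_key != r.part_key) return l.part_key < r.part_key;
  return Interleave(l.a, l.b) < Interleave(r.a, r.b);
}

// Sorts rows the way a clustered insert would need them.
void SortForInsert(std::vector<Row>& rows) {
  std::sort(rows.begin(), rows.end(), PartitionThenZOrderLess);
}

// Returns true if every partition key forms a single contiguous run, which
// is what a sink that writes partitions one by one relies on.
bool PartitionsAreContiguous(const std::vector<Row>& rows) {
  std::set<int32_t> finished;
  for (size_t i = 0; i < rows.size(); ++i) {
    if (i > 0 && rows[i].part_key != rows[i - 1].part_key) {
      finished.insert(rows[i - 1].part_key);
    }
    if (finished.count(rows[i].part_key) > 0) return false;
  }
  return true;
}
```

If the partition key were interleaved together with the other columns, rows of different partitions could alternate in the output; comparing it separately keeps each partition contiguous.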
Tests
- Add tests for inserting to a partitioned table with zorder.
Change-Id: I30cbad711167b8b63c81837e497b36fd41be9b54
Reviewed-on: http://gerrit.cloudera.org:8080/16590
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Hit DCHECK in DmlExecState::AddPartition when inserting to a partitioned
> table with zorder
> ------------------------------------------------------------------------------------------
>
> Key: IMPALA-10233
> URL: https://issues.apache.org/jira/browse/IMPALA-10233
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Blocker
> Labels: crash
>
> Hit the DCHECK when inserting to a partitioned parquet table with zorder. I'm
> on master branch (commit=b8a2b75).
> {code:java}
> F1012 15:04:27.726274 3868 dml-exec-state.cc:432] a6479cc4725101fd:b86db2a100000003] Check failed: per_partition_status_.find(name) == per_partition_status_.end()
> *** Check failure stack trace: ***
> @ 0x51ff3cc google::LogMessage::Fail()
> @ 0x5200cbc google::LogMessage::SendToLog()
> @ 0x51fed2a google::LogMessage::Flush()
> @ 0x5202928 google::LogMessageFatal::~LogMessageFatal()
> @ 0x234ba18 impala::DmlExecState::AddPartition()
> @ 0x2817786 impala::HdfsTableSink::GetOutputPartition()
> @ 0x2813151 impala::HdfsTableSink::WriteClusteredRowBatch()
> @ 0x28156c4 impala::HdfsTableSink::Send()
> @ 0x23139dd impala::FragmentInstanceState::ExecInternal()
> @ 0x230fe10 impala::FragmentInstanceState::Exec()
> @ 0x227bb79 impala::QueryState::ExecFInstance()
> @ 0x2279f7b _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
> @ 0x227e2c2 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
> @ 0x2137699 boost::function0<>::operator()()
> @ 0x2715d7d impala::Thread::SuperviseThread()
> @ 0x271dd1a boost::_bi::list5<>::operator()<>()
> @ 0x271dc3e boost::_bi::bind_t<>::operator()()
> @ 0x271dbff boost::detail::thread_data<>::run()
> @ 0x3f05f01 thread_proxy
> @ 0x7fb18bebb6b9 start_thread
> @ 0x7fb188a474dc clone
> {code}
> It seems the zorder sort node doesn't keep the rows sorted by partition keys.
> This violates the assumption of HdfsTableSink::WriteClusteredRowBatch() that
> its input is ordered by the partition key expressions. As a result, a
> partition key was deleted from the
> {{partition_keys_to_output_partitions_}} map and then inserted again.
> {code:c++}
> /// Maps all rows in 'batch' to partitions and appends them to their temporary Hdfs
> /// files. The input must be ordered by the partition key expressions.
> Status WriteClusteredRowBatch(RuntimeState* state, RowBatch* batch)
>     WARN_UNUSED_RESULT;
> {code}
> The key was removed here, while processing a new partition key:
> https://github.com/apache/impala/blob/b8a2b754669eb7f8d164e8112e594ac413e436ef/be/src/exec/hdfs-table-sink.cc#L334
> It was reinserted here, which triggered the DCHECK:
> https://github.com/apache/impala/blob/b8a2b754669eb7f8d164e8112e594ac413e436ef/be/src/exec/hdfs-table-sink.cc#L590
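
The remove-then-reinsert sequence described above can be modelled in a few lines. This is a simplified sketch, not the actual HdfsTableSink or DmlExecState code: the function `FirstDuplicateRegistration` is a hypothetical name, the set `registered` only stands in for `per_partition_status_`, and the set `open` only stands in for `partition_keys_to_output_partitions_`.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Walks partition keys in input order the way a clustered sink would:
// when the key changes, the previous partition is closed and removed from
// the open set, and the new one is registered. Returns the index of the
// first row that tries to register a partition a second time (where the
// real code would fail its DCHECK), or -1 if the input is clustered.
int FirstDuplicateRegistration(const std::vector<std::string>& keys) {
  std::set<std::string> registered;  // stands in for per_partition_status_
  std::set<std::string> open;        // stands in for the open-partition map
  const std::string* current = nullptr;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (current != nullptr && *current == keys[i]) continue;
    // Key changed: the previous partition is considered finished.
    if (current != nullptr) open.erase(*current);
    current = &keys[i];
    if (open.count(keys[i]) == 0) {
      // A second registration of the same name is the failure case.
      if (!registered.insert(keys[i]).second) return static_cast<int>(i);
      open.insert(keys[i]);
    }
  }
  return -1;
}
```

With input that is not grouped by partition key, for example `p=1, p=1, p=2, p=1`, the last row finds `p=1` already closed and tries to register it again.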