Csaba Ringhofer created IMPALA-14733:
----------------------------------------
Summary: Hit DCHECK with non-deterministic non-cluster partitioned
insert
Key: IMPALA-14733
URL: https://issues.apache.org/jira/browse/IMPALA-14733
Project: IMPALA
Issue Type: Bug
Components: Backend, Frontend
Reporter: Csaba Ringhofer
{code}
create /*+noclustered,noshuffle */ table manyparts partitioned by (p) as
select l_orderkey, cast(rand() * 1000 as int) p from tpch_parquet.lineitem;
{code}
{code}
F20260210 13:30:22.961767 2770839 dml-exec-state.cc:513]
ae4e877e7795145f:dfc2886100000000] Check failed:
per_partition_status_.find(name) == per_partition_status_.end() Partition
status of p=355 already exists
@ 0x17d7f7e impala::DmlExecState::AddPartition()
@ 0x1e5e142 impala::HdfsTableSink::GetOutputPartition()
@ 0x1e59631 impala::HdfsTableSink::Send()
{code}
The issue does not occur if the noclustered hint is removed.
The clustered case does a sort, which materializes non-deterministic
expressions, making them deterministic from that point. My assumption is that
with noclustered there is no sort, so the non-determinism is still there when
writing the table, leading to unexpected behavior.
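The assumption above can be illustrated with a toy model. This is only a sketch in Python, not Impala's code: ToySink, send and insert are hypothetical names, and the idea that the sink evaluates the partition expression twice per row (once for the lookup key, once for the partition name) is an assumption made for illustration, not something the stack trace confirms.

```python
import random


class ToySink:
    """Toy stand-in for a partitioned table sink (hypothetical, not HdfsTableSink)."""

    def __init__(self):
        self.writers = {}           # lookup key -> partition name
        self.partition_status = {}  # partition name -> status

    def send(self, row, partition_expr):
        # Assumption: the expression is evaluated once for the lookup key ...
        key = partition_expr(row)
        if key in self.writers:
            return
        # ... and evaluated again when the new partition is named.
        name = f"p={partition_expr(row)}"
        if name in self.partition_status:
            # Loosely mirrors the failed check quoted in the report.
            raise RuntimeError(f"Partition status of {name} already exists")
        self.partition_status[name] = "open"
        self.writers[key] = name


def insert(rows, partition_expr):
    """Feed all rows through a fresh ToySink and return it."""
    sink = ToySink()
    for row in rows:
        sink.send(row, partition_expr)
    return sink


rng = random.Random(0)
rows = [{"l_orderkey": i} for i in range(2000)]

# Materialized: the random value is computed once per row and stored,
# so every later evaluation sees the same value.
materialized = [dict(r, p=rng.randrange(1000)) for r in rows]
stable_sink = insert(materialized, lambda r: r["p"])

# Not materialized: each evaluation draws a new value, so the key and the
# name can disagree. Uncommenting this is very likely to raise RuntimeError.
# insert(rows, lambda r: rng.randrange(1000))
```

In this model the materialized run never fails, because the key and the name always come from the same stored value. Whether Impala's sink behaves the same way would need to be checked against the actual code.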
The fix would be to detect non-deterministic expressions during planning and
either disallow the query or force materialization.
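The proposed planner check could look roughly like the sketch below. It is written in Python for brevity and does not reflect Impala's frontend classes: Expr, is_deterministic, plan_partition_exprs and the contents of NON_DETERMINISTIC_FUNCS are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Illustrative set only; a real planner would consult function metadata.
NON_DETERMINISTIC_FUNCS = {"rand", "random", "uuid"}


@dataclass
class Expr:
    """Minimal expression tree node (hypothetical)."""
    name: str
    children: list = field(default_factory=list)


def is_deterministic(expr):
    """True if neither the node nor any descendant is non-deterministic."""
    if expr.name in NON_DETERMINISTIC_FUNCS:
        return False
    return all(is_deterministic(child) for child in expr.children)


def plan_partition_exprs(exprs, clustered, on_violation="materialize"):
    """Decide how to treat partition expressions of an insert.

    Returns "ok" when nothing needs to change and "materialize" when the
    expressions should be evaluated once before reaching the sink.
    Raises ValueError if on_violation is "reject".
    """
    # Per the report, the clustered case already sorts, which materializes.
    if clustered or all(is_deterministic(e) for e in exprs):
        return "ok"
    if on_violation == "reject":
        raise ValueError("non-deterministic partition expression in noclustered insert")
    return "materialize"


# cast(rand() * 1000 as int), as in the query from the report
p = Expr("cast", [Expr("multiply", [Expr("rand"), Expr("1000")])])
decision = plan_partition_exprs([p], clustered=False)
```

Here decision is "materialize" for the noclustered query from the report, while the same expression with clustered=True passes through unchanged.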