Csaba Ringhofer created IMPALA-14733:
----------------------------------------

             Summary: Hit DCHECK with non-deterministic non-clustered partitioned insert
                 Key: IMPALA-14733
                 URL: https://issues.apache.org/jira/browse/IMPALA-14733
             Project: IMPALA
          Issue Type: Bug
          Components: Backend, Frontend
            Reporter: Csaba Ringhofer


{code:sql}
create /*+noclustered,noshuffle */ table manyparts partitioned by (p) as
select l_orderkey, cast(rand() * 1000 as int) p from tpch_parquet.lineitem;
{code}

{code}
F20260210 13:30:22.961767 2770839 dml-exec-state.cc:513] ae4e877e7795145f:dfc2886100000000] Check failed: per_partition_status_.find(name) == per_partition_status_.end() Partition status of p=355 already exists
    @          0x17d7f7e  impala::DmlExecState::AddPartition()
    @          0x1e5e142  impala::HdfsTableSink::GetOutputPartition()
    @          0x1e59631  impala::HdfsTableSink::Send()
{code}

The issue does not occur if the noclustered hint is removed.
The clustered case does a sort, which materializes non-deterministic
expressions, making them deterministic from that point on. My assumption is
that with noclustered the non-determinism is still present when the table is
written, leading to unexpected behavior such as the DCHECK above.
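To make the suspected failure mode concrete, here is a minimal C++ sketch of a simplified non-clustered partitioned sink. It is a hypothetical model, not Impala's HdfsTableSink: it assumes (unverified) that the sink evaluates the partition expression once to build the lookup key of its open-partition map and again to build the partition name. ScriptedExpr, CountDuplicatePartitions and the other names are made up for illustration, and ScriptedExpr replays a fixed sequence of values in place of rand() so the result is reproducible. Evaluating the expression once per row and reusing the value stands in for materialization.

{code:cpp}
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a non-deterministic partition expression such as
// cast(rand() * 1000 as int): each call returns the next value from a fixed
// script, so the example is reproducible.
struct ScriptedExpr {
  std::vector<int> values;
  std::size_t next = 0;
  int Eval() {
    assert(!values.empty());
    return values[next++ % values.size()];
  }
};

// Hypothetical, simplified model of a non-clustered partitioned sink (NOT
// Impala's code). Returns how many times a partition name is added twice,
// i.e. how often a DCHECK like the one in the report would fire.
// eval_once == false: lookup key and partition name come from two separate
//   evaluations of the expression (the assumed failure mode).
// eval_once == true: the value is computed once per row and reused, which
//   models materializing the expression before the sink.
int CountDuplicatePartitions(ScriptedExpr expr, int num_rows, bool eval_once) {
  std::map<int, std::string> key_to_partition;  // lookup key -> partition name
  std::set<std::string> added_partitions;        // models per_partition_status_
  int duplicates = 0;
  for (int row = 0; row < num_rows; ++row) {
    int key = expr.Eval();
    if (key_to_partition.count(key) != 0) continue;  // partition already open
    int name_value = eval_once ? key : expr.Eval();
    std::string name = "p=" + std::to_string(name_value);
    // insert() fails if the name exists: "Partition status ... already exists".
    if (!added_partitions.insert(name).second) ++duplicates;
    key_to_partition[key] = name;
  }
  return duplicates;
}
{code}

With the scripted sequence 1, 2, 3, 2 and two rows, the two-evaluation path opens p=2 under key 1 and again under key 3, so the second add finds p=2 already present. With eval_once set, the partition name is always derived from the lookup key, so no duplicate can occur.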

The fix would be to detect non-deterministic expressions during planning and 
either disallow the query or force materialization.
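On the planning side, a detection pass could look roughly like the following C++ sketch. Impala's frontend is written in Java and has its own expression classes; the Expr struct, MakeExpr, IsNonDeterministic, PlanAction and ChoosePartitionedInsertAction below are hypothetical stand-ins, and the per-function fn_is_deterministic flag is an assumption about how non-determinism would be recorded. The sketch walks each partition expression tree and, for a non-clustered insert, returns either the "force materialization" or the "disallow the query" outcome; a clustered insert is left as-is because its sort already materializes the values.

{code:cpp}
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression node (not Impala's Expr class).
struct Expr {
  std::string fn_name;              // e.g. "rand", "multiply", "cast", "slot"
  bool fn_is_deterministic = true;  // assumed per-function property
  std::vector<std::shared_ptr<Expr>> children;
};

std::shared_ptr<Expr> MakeExpr(std::string fn_name, bool deterministic,
                               std::vector<std::shared_ptr<Expr>> children = {}) {
  auto e = std::make_shared<Expr>();
  e->fn_name = std::move(fn_name);
  e->fn_is_deterministic = deterministic;
  e->children = std::move(children);
  return e;
}

// A tree is non-deterministic if any node in it is.
bool IsNonDeterministic(const Expr& e) {
  if (!e.fn_is_deterministic) return true;
  for (const auto& child : e.children) {
    if (IsNonDeterministic(*child)) return true;
  }
  return false;
}

enum class PlanAction { kPlanAsIs, kForceMaterialization, kRejectQuery };

// 'clustered' means the plan sorts on the partition keys, which materializes
// them; 'allow_forced_materialization' picks between the two proposed fixes.
PlanAction ChoosePartitionedInsertAction(
    const std::vector<std::shared_ptr<Expr>>& partition_exprs, bool clustered,
    bool allow_forced_materialization) {
  assert(!partition_exprs.empty());  // a partitioned insert has >= 1 key
  bool non_deterministic = false;
  for (const auto& e : partition_exprs) {
    if (IsNonDeterministic(*e)) non_deterministic = true;
  }
  if (!non_deterministic || clustered) return PlanAction::kPlanAsIs;
  return allow_forced_materialization ? PlanAction::kForceMaterialization
                                      : PlanAction::kRejectQuery;
}
{code}

For the report's cast(rand() * 1000 as int), the rand node marks the whole tree as non-deterministic, so in this model a noclustered insert is either rewritten or rejected, while the same query without the hint is planned unchanged.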



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
