[
https://issues.apache.org/jira/browse/IMPALA-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836288#comment-17836288
]
David Rorke commented on IMPALA-12995:
--------------------------------------
A couple of related ideas we could investigate to allow us to buffer/write a
limited number of files at a time during partitioned inserts without requiring
the SORT:
* Use hash-based bucketing to roughly cluster rows by partition key and write
out buckets as we reach memory/buffering thresholds. If the partition count is
reasonably low, we could accumulate enough data for a given set of keys to
write out large Parquet row groups before running out of buffer space; if not,
we might still need to do some spilling to avoid writing small files, but this
would still likely be faster than a full sort (see the first sketch after this
list).
* We could do partition-based scanning from the outset for queries that will
ultimately do partitioned inserts, ensuring that a given partition is always
scanned by a single thread and that each scanner thread processes one
partition at a time. The partition keys would then be clustered from the
start, and if we could avoid disrupting that ordering during any intermediate
processing, the rows would arrive at the writer properly ordered without
requiring an additional sort. It's not clear how much the ordered scans would
impact scan performance, but it seems likely they would be less expensive than
the current sort (a second sketch follows the list).
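To make the first idea concrete, here is a minimal C++ sketch of a
hash-bucketed writer. The types (PartitionKey, Row) and the write_file
callback are hypothetical stand-ins, not Impala's actual sink classes: rows
are hashed into a fixed number of buckets by partition key, and once the total
buffered size crosses a limit, the largest bucket is flushed, writing one file
(or row group) per partition key it holds.

{code:cpp}
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Impala's internal row/partition types.
using PartitionKey = std::string;
using Row = std::string;  // serialized row payload, for illustration only

class HashBucketedWriter {
 public:
  // Callback that writes one file (or row group) for a single partition key.
  using WriteFileFn =
      std::function<void(const PartitionKey&, const std::vector<Row>&)>;

  HashBucketedWriter(size_t num_buckets, size_t buffer_limit_bytes,
                     WriteFileFn write_file)
      : buckets_(num_buckets == 0 ? 1 : num_buckets),
        buffer_limit_bytes_(buffer_limit_bytes),
        write_file_(std::move(write_file)) {}

  void Add(const PartitionKey& key, Row row) {
    size_t row_bytes = row.size();
    Bucket& b = buckets_[std::hash<PartitionKey>{}(key) % buckets_.size()];
    b.rows_by_key[key].push_back(std::move(row));
    b.bytes += row_bytes;
    buffered_bytes_ += row_bytes;
    // Once the memory threshold is reached, flush the largest bucket(s) so
    // that the files / row groups we write stay as large as possible.
    while (buffered_bytes_ > buffer_limit_bytes_) FlushLargestBucket();
  }

  void Close() {
    for (Bucket& b : buckets_) FlushBucket(b);
  }

 private:
  struct Bucket {
    std::unordered_map<PartitionKey, std::vector<Row>> rows_by_key;
    size_t bytes = 0;
  };

  void FlushLargestBucket() {
    Bucket* largest = &buckets_[0];
    for (Bucket& b : buckets_) {
      if (b.bytes > largest->bytes) largest = &b;
    }
    FlushBucket(*largest);
  }

  void FlushBucket(Bucket& b) {
    for (auto& [key, rows] : b.rows_by_key) write_file_(key, rows);
    buffered_bytes_ -= b.bytes;
    b.rows_by_key.clear();
    b.bytes = 0;
  }

  std::vector<Bucket> buckets_;
  size_t buffer_limit_bytes_;
  WriteFileFn write_file_;
  size_t buffered_bytes_ = 0;
};
{code}

Flushing the largest bucket first keeps the smaller partitions buffered
longer, so they have a better chance of reaching a full row group before they
are written out.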
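And a rough sketch of the second idea, again with made-up types standing in
for Impala's scan ranges and scheduler: every scan range of a given partition
is assigned to the same scanner thread, and each thread works through its
partitions one at a time, so the rows it produces are already clustered by
partition key when they reach the writer.

{code:cpp}
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins; Impala's real scheduler works on scan ranges and
// fragment instances rather than these simplified types.
using PartitionKey = std::string;
struct ScanRange { std::string file_path; };

// Group scan ranges by partition and deal whole partitions out to threads;
// each thread then processes its partitions one after the other.
std::vector<std::vector<std::pair<PartitionKey, std::vector<ScanRange>>>>
AssignPartitionsToThreads(
    const std::vector<std::pair<PartitionKey, ScanRange>>& ranges,
    size_t num_threads) {
  if (num_threads == 0) num_threads = 1;

  // All scan ranges of one partition end up in the same group.
  std::unordered_map<PartitionKey, std::vector<ScanRange>> by_partition;
  for (const auto& [key, range] : ranges) by_partition[key].push_back(range);

  // Round-robin whole partitions across the scanner threads.
  std::vector<std::vector<std::pair<PartitionKey, std::vector<ScanRange>>>>
      per_thread(num_threads);
  size_t next_thread = 0;
  for (auto& [key, part_ranges] : by_partition) {
    per_thread[next_thread].emplace_back(key, std::move(part_ranges));
    next_thread = (next_thread + 1) % num_threads;
  }
  return per_thread;
}
{code}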
> Don't always sort data for Iceberg partitioned inserts
> ------------------------------------------------------
>
> Key: IMPALA-12995
> URL: https://issues.apache.org/jira/browse/IMPALA-12995
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg
>
> Currently we always do a SHUFFLE and SORT before inserting data into a
> partitioned Iceberg table.
> With SHUFFLE, we can guarantee that each partition is assigned to a single
> sink operator, hence we can minimize the number of files being created.
> With SORT, we can write partitions one after the other, therefore we only
> need to write one file at a time (and only buffer data for that one file).
> This way we can avoid out-of-memory situations in the sink.
> SORT does a total ordering of the incoming records, therefore it can be very
> expensive. And when SORT needs to spill to disk, its fragment cannot receive
> incoming records, blocking the execution of the whole query cluster-wide
> (because after some time every sender is blocked on the receiving SORT
> fragment).
> In many cases the SORT is not necessary because there are only a few
> partitions assigned to each sink, especially in large clusters with high
> MT_DOP.
> During planning, Impala should decide whether to add the SORT node for such
> partitioned INSERTs. It should also respect the optimizer hints
> CLUSTERED/NOCLUSTERED.
> https://impala.apache.org/docs/build/html/topics/impala_hints.html
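As a rough illustration of the planning decision described in the issue above,
here is a minimal sketch. The real decision lives in the Java frontend; the
names, threshold parameter, and hint handling below are assumptions, not
Impala's actual API. The idea: keep the SORT when the estimated number of
partitions per sink is large, skip it when it is small, and let an explicit
CLUSTERED/NOCLUSTERED hint override the estimate.

{code:cpp}
#include <cstdint>
#include <optional>

// Hypothetical hint enum mirroring the CLUSTERED/NOCLUSTERED insert hints.
enum class InsertHint { kClustered, kNoClustered };

// Returns true if the planner should add a SORT node before the table sink.
bool ShouldAddSortForPartitionedInsert(
    int64_t estimated_partitions, int num_sink_instances,
    int64_t max_buffered_partitions_per_sink,
    std::optional<InsertHint> hint) {
  if (hint == InsertHint::kClustered) return true;     // user asked for the sort
  if (hint == InsertHint::kNoClustered) return false;  // user opted out of it
  if (num_sink_instances <= 0) return true;            // be conservative
  // Partitions expected to land on each sink instance (rounded up).
  int64_t partitions_per_sink =
      (estimated_partitions + num_sink_instances - 1) / num_sink_instances;
  // Only sort when a sink would otherwise have to buffer too many partitions.
  return partitions_per_sink > max_buffered_partitions_per_sink;
}
{code}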