[ 
https://issues.apache.org/jira/browse/IMPALA-11331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Mate closed IMPALA-11331.
-------------------------------
    Resolution: Won't Fix

With IMPALA-11377 fixed this can be closed.

> Create Iceberg transactions earlier
> -----------------------------------
>
>                 Key: IMPALA-11331
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11331
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Tamas Mate
>            Priority: Major
>              Labels: impala-iceberg
>
> Currently we create Iceberg transactions via 
> IcebergUtil.getIcebergTransaction() in CatalogOpExecutor.
> This is problematic in some cases, especially for INSERT OVERWRITEs, because 
> the transaction is opened too late: the data files are already written by 
> the time we open the transaction and commit it. INSERT statements that 
> complete in the meantime get overwritten instead of causing the INSERT 
> OVERWRITE operation to fail.
> This can be problematic when we try to use INSERT OVERWRITE for compacting a 
> table. In that case we definitely don't want to lose INSERTed data.
> Moving transaction open/close to the coordinator requires a lot of work, and 
> the handling of self-events would become even more complicated.
> Alternatively, we could initiate an open transaction from the Coordinator, 
> i.e. asking CatalogD to open one, then at the end CatalogD would commit the 
> opened transaction.
> We also need to abort transactions of failed queries. We also need ways of 
> aborting transactions of crashed Coordinators.
>  
> UPDATE: Newer Iceberg releases will have an API to check for concurrent 
> writes: 
> [https://github.com/apache/iceberg/blob/9ab94f87de036c9cd91cf8353906a576b4a516ff/api/src/main/java/org/apache/iceberg/ReplacePartitions.java#L28-L34]
> Probably the most straightforward thing is to use this API. Save the current 
> snapshot ID at the coordinator during planning, then propagate this 
> information to CatalogD in TIcebergOperationParam.
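
The snapshot-based check proposed in the UPDATE above can be illustrated with
a toy model. This is only a sketch, not Impala or Iceberg code: ToyTable and
its methods are hypothetical names, and the check is deliberately coarse (it
rejects the overwrite if any snapshot was committed after planning). The
Iceberg ReplacePartitions API linked above is presumably more selective, since
it validates from a starting snapshot (validateFromSnapshot) and looks for
conflicting changes rather than any change at all.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotValidationSketch {

    /** Hypothetical stand-in for a table: a list of snapshot IDs, newest last. */
    public static final class ToyTable {
        private final List<Long> snapshotIds = new ArrayList<>();
        private long nextId = 1;

        /** Returns the newest snapshot ID, or -1 if the table has no snapshots. */
        public synchronized long currentSnapshotId() {
            return snapshotIds.isEmpty() ? -1L : snapshotIds.get(snapshotIds.size() - 1);
        }

        /** Plain INSERT: always commits a new snapshot. */
        public synchronized long append() {
            long id = nextId++;
            snapshotIds.add(id);
            return id;
        }

        /**
         * INSERT OVERWRITE: commits only if the table is still at the snapshot
         * that was current when the query was planned.
         */
        public synchronized long overwrite(long plannedSnapshotId) {
            long current = currentSnapshotId();
            if (current != plannedSnapshotId) {
                throw new IllegalStateException(
                    "concurrent write detected: planned at snapshot " + plannedSnapshotId
                        + ", table is now at snapshot " + current);
            }
            return append();
        }
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.append();

        // "Coordinator": remember the snapshot ID during planning.
        long planned = table.currentSnapshotId();

        // A concurrent INSERT commits while the overwrite writes its data files.
        table.append();

        // "CatalogD": validate against the planned snapshot before committing.
        try {
            table.overwrite(planned);
            System.out.println("overwrite committed");
        } catch (IllegalStateException e) {
            System.out.println("overwrite rejected: " + e.getMessage());
        }
    }
}
```

In this model the INSERT OVERWRITE fails instead of silently dropping the
concurrently inserted data, which is the behavior the issue asks for.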



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
