I just commented on #2303. I think we should get that fixed fairly soon -- at least an interim fix to ensure that compaction correctly catches the problem and fails. The plan for the long-term fix looks good to me as well.
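For readers following the thread, here is a minimal sketch of the kind of interim guard described above, i.e. making a compaction commit detect concurrent delete commits and fail. This is not the actual fix in PR-2303; the helper name and the failure policy are hypothetical, and the check only illustrates the idea using Iceberg's standard "added-delete-files" snapshot summary property:

    import org.apache.iceberg.Snapshot;
    import org.apache.iceberg.Table;

    public class CompactionConflictCheck {
      // Walk back from the current snapshot to the compaction's base snapshot
      // and fail if any intermediate commit added delete files, since those
      // deletes could reference the data files being rewritten away.
      static void failOnConcurrentDeletes(Table table, long baseSnapshotId) {
        Snapshot snapshot = table.currentSnapshot();
        while (snapshot != null && snapshot.snapshotId() != baseSnapshotId) {
          String added = snapshot.summary().get("added-delete-files");
          if (added != null && Integer.parseInt(added) > 0) {
            throw new IllegalStateException(
                "Concurrent delete commit detected; failing compaction to avoid dropping deletes");
          }
          Long parentId = snapshot.parentId();
          snapshot = parentId == null ? null : table.snapshot(parentId);
        }
      }
    }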
On Mon, May 17, 2021 at 7:17 PM OpenInx <open...@gmail.com> wrote:

> PR-2303 defines how the batch job does the compaction work, and PR-2308
> decides what the behavior is when a compaction txn and a row-delta txn
> commit at the same time. They shouldn't block each other, but we will
> need to resolve both of them.
>
> On Tue, May 18, 2021 at 9:36 AM Huadong Liu <huadong...@gmail.com> wrote:
>
>> Thanks. Compaction is https://github.com/apache/iceberg/pull/2303, and it
>> is currently blocked by https://github.com/apache/iceberg/issues/2308?
>>
>> On Mon, May 17, 2021 at 6:17 PM OpenInx <open...@gmail.com> wrote:
>>
>>> Hi Huadong,
>>>
>>> From the perspective of the Iceberg developers, we don't expose format
>>> v2 to end users because we think there is still other work that needs
>>> to be done. As you can see, there are still some unfinished issues in
>>> the link you shared.
>>>
>>> As for whether v2 will cause data loss: from my perspective as a
>>> designer, semantics and correctness are handled very rigorously as long
>>> as we don't do any compaction. Once we introduce the compaction action,
>>> we run into this issue: https://github.com/apache/iceberg/issues/2308.
>>> We've proposed a solution, but the community hasn't reached agreement
>>> yet. I would suggest using v2 in production only after we resolve at
>>> least this issue.
>>>
>>> On Sat, May 15, 2021 at 8:01 AM Huadong Liu <huadong...@gmail.com>
>>> wrote:
>>>
>>>> Hi iceberg-dev,
>>>>
>>>> I tried v2 row-level deletion by committing equality delete files
>>>> after *upgradeToFormatVersion(2)*. It worked well. I know that Spark
>>>> actions to compact delete files and data files
>>>> <https://github.com/apache/iceberg/milestone/4> etc. are in progress.
>>>> I currently use the Java API to update, query, and do maintenance ops.
>>>> I am not using Flink at the moment, and I will definitely pick up the
>>>> Spark actions when they are completed. Deletions can be scheduled in
>>>> batches (e.g. weekly) to control the volume of delete files. I want to
>>>> get a sense of the risk of losing data at some point to v2 spec/API
>>>> changes if I start to use the v2 format now. It is not an easy
>>>> question. Any input is appreciated.
>>>>
>>>> --
>>>> Huadong

--
Ryan Blue
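For anyone who wants to reproduce the workflow Huadong describes (upgrading a table to format v2 with *upgradeToFormatVersion(2)*, then committing an equality delete file through the Java API), here is a minimal sketch. It assumes an unpartitioned table and an equality delete file that has already been written out; the file path, size, record count, and equality field id are hypothetical placeholders:

    import org.apache.iceberg.BaseTable;
    import org.apache.iceberg.DeleteFile;
    import org.apache.iceberg.FileFormat;
    import org.apache.iceberg.FileMetadata;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.TableMetadata;
    import org.apache.iceberg.TableOperations;

    public class FormatV2Example {
      // Rewrite the table metadata at format version 2 so the table can
      // accept row-level delete files.
      static void upgradeToV2(Table table) {
        TableOperations ops = ((BaseTable) table).operations();
        TableMetadata current = ops.current();
        ops.commit(current, current.upgradeToFormatVersion(2));
      }

      // Commit an already-written equality delete file as a row delta.
      static void commitEqualityDeletes(Table table) {
        DeleteFile deletes = FileMetadata.deleteFileBuilder(table.spec())
            .ofEqualityDeletes(1)  // field id 1 is a hypothetical "id" column
            .withPath("s3://bucket/warehouse/db/tbl/data/eq-deletes-00000.parquet")
            .withFileSizeInBytes(1024L)
            .withRecordCount(10)
            .withFormat(FileFormat.PARQUET)
            .build();

        table.newRowDelta()
            .addDeletes(deletes)
            .commit();
      }
    }

Rows in the delete file match data rows on the configured equality field ids, and readers apply the deletes at query time until a compaction rewrites the affected data files, which is exactly why the concurrent-commit issue above matters.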