Thanks Gabor, I realized it's already done after sending out the last reply. The setting is actually "table-default.<TABLE_PARAM>". In case someone else needs a back-port as well, the related PR is https://github.com/apache/iceberg/pull/4011
Regards,
Manu

On Mon, Mar 20, 2023 at 6:09 PM Gabor Kaszab <[email protected]> wrote:

> I believe the conclusion here was that there is already a catalog level
> property with the purpose of adding table defaults. This could be used to
> make the default table format to v2 on a particular catalog. See my last
> email on this thread. One thing I haven't checked is if this property works
> for all the catalog types or just a subset of them. But I think it's worth
> a try to see if it works in your environment.
> It's "table.default.<TABLE_PARAM>" setting
>
> On Mon, Mar 20, 2023 at 5:41 AM Manu Zhang <[email protected]>
> wrote:
>
>> Is there any progress to make default format version a catalog property?
>>
>> Thanks,
>> Manu
>>
>> On Wed, Jan 18, 2023 at 5:43 PM Gabor Kaszab
>> <[email protected]> wrote:
>>
>>> I also ran into this "table-default." setting
>>> <https://github.com/apache/iceberg/blob/35151fe17b47c0af22787db4e4964b0cfcfdb215/core/src/main/java/org/apache/iceberg/CatalogProperties.java#L30>
>>> prefix. For me it seems that it's a catalog level config so it's enough to
>>> provide e.g. "table-default.format-version" = "2" to each catalog as a
>>> startup flag. For me it seems that catalogs derived from
>>> BaseMetastoreCatalog use this table default prefix
>>> <https://github.com/apache/iceberg/blob/35151fe17b47c0af22787db4e4964b0cfcfdb215/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L148>
>>> .
>>>
>>> Gabor
>>>
>>> On Wed, Jan 18, 2023 at 12:00 AM Yufei Gu <[email protected]> wrote:
>>>
>>>> The functionality has been there if we are talking about setting the
>>>> default format at the Iceberg catalog. For example, we can set a catalog
>>>> like this. All tables created will be v2 tables.
>>>> spark.sql.catalog.hive_prod.table-default.format-version = "2"
>>>>
>>>> Of course, we need to set it for each Spark App. Setting Trino would be
>>>> easier. It would be one catalog level change.
>>>>
>>>> Best,
>>>>
>>>> Yufei
>>>>
>>>> `This is not a contribution`
>>>>
>>>>
>>>> On Mon, Jan 16, 2023 at 1:34 AM Gabor Kaszab
>>>> <[email protected]> wrote:
>>>>
>>>>> It seems we have a consensus on the approach. I can take a look at
>>>>> implementing this if no one has any objections.
>>>>>
>>>>> Gabor
>>>>>
>>>>> On Fri, Jan 13, 2023 at 11:28 PM Ryan Blue <[email protected]> wrote:
>>>>>
>>>>>> That sounds like a good idea to me.
>>>>>>
>>>>>> On Fri, Jan 13, 2023 at 11:04 AM Jack Ye <[email protected]> wrote:
>>>>>>
>>>>>>> > I think the issue is that all of the built-in catalogs currently
>>>>>>> > call the version of `newTableMetadata` that defaults to v1.
>>>>>>>
>>>>>>> Yes I think this seems like the key issue for the catalogs that
>>>>>>> extend BaseMetastoreCatalog. Looks like we should make changes to make
>>>>>>> the default format version a catalog property, instead of hard-coded in
>>>>>>> TableMetadata?
>>>>>>>
>>>>>>> -Jack
>>>>>>>
>>>>>>> On Thu, Jan 12, 2023 at 11:47 PM Jean-Baptiste Onofré <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Gabor,
>>>>>>>>
>>>>>>>> It makes sense to me. AFAIK, as the tables creation comes from
>>>>>>>> catalog "controller", they can "decide" the version. So, it would be each
>>>>>>>> catalog to deal with the way/version they want to create tables.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On Wed, Jan 11, 2023 at 11:11 PM Gabor Kaszab <
>>>>>>>> [email protected]> wrote:
>>>>>>>> >
>>>>>>>> > Naively asking, can't we add some property to tell Iceberg which
>>>>>>>> > version to use as default when creating tables? (If there is no such
>>>>>>>> > setting currently)
>>>>>>>> >
>>>>>>>> > Gabor
>>>>>>>> >
>>>>>>>> > Jack Ye <[email protected]> wrote (on Wed, Jan 11, 2023, 20:04):
>>>>>>>> >>
>>>>>>>> >> Should we start a community vote on this?
>>>>>>>> >>
>>>>>>>> >> I remember that in today's community sync meeting Russell briefly
>>>>>>>> >> discussed some compaction support that is not there yet, and that some
>>>>>>>> >> users are struggling with the small delete files issue, which is to some
>>>>>>>> >> extent why Spark is still defaulting to v1.
>>>>>>>> >>
>>>>>>>> >> On the feature side, changelog scan is mostly there in Spark,
>>>>>>>> >> and there will also likely be movement on the Trino side for it very soon.
>>>>>>>> >>
>>>>>>>> >> Overall, I think it would be beneficial to move the default to v2,
>>>>>>>> >> which could incentivize the completion of those missing parts across
>>>>>>>> >> engines.
>>>>>>>> >>
>>>>>>>> >> Best,
>>>>>>>> >> Jack Ye
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Wed, Jan 11, 2023 at 5:47 AM Piotr Findeisen <
>>>>>>>> >> [email protected]> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Hi,
>>>>>>>> >>>
>>>>>>>> >>> FWIW Trino already creates v2 tables by default.
>>>>>>>> >>> Thought it's worth sharing for context.
>>>>>>>> >>>
>>>>>>>> >>> Best
>>>>>>>> >>> PF
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> On Tue, Jan 10, 2023 at 10:09 AM Manu Zhang <
>>>>>>>> >>> [email protected]> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>> Hi all,
>>>>>>>> >>>>
>>>>>>>> >>>> We've maintained a forked Iceberg internally and all our use
>>>>>>>> >>>> cases involve v2 tables with row-level updates and deletes. Our users need
>>>>>>>> >>>> to remember to create tables with the `'format-version'='2'` option or alter
>>>>>>>> >>>> the table afterwards.
>>>>>>>> >>>>
>>>>>>>> >>>> I'm thinking about changing the default format-version of our
>>>>>>>> >>>> forked Iceberg to v2. Is there any concern with this change? Any hidden
>>>>>>>> >>>> issues I've missed?
>>>>>>>> >>>>
>>>>>>>> >>>> Thanks,
>>>>>>>> >>>> Manu
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Tabular
>>>>>>
>>>>>
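
For anyone landing on this thread later, here is a rough Python sketch of the idea discussed above: catalog properties that start with the "table-default." prefix are treated as defaults for newly created tables. This is only an illustration, not Iceberg's actual code. The function names (`spark_conf_key`, `table_defaults`, `effective_table_properties`) are made up for this example, and the rule that explicitly requested table properties win over the catalog defaults is an assumption of the sketch.

```python
# Illustration only: mimics the "table-default." prefix idea from this thread.
# None of these helpers are part of Iceberg; all names are hypothetical.

TABLE_DEFAULT_PREFIX = "table-default."


def spark_conf_key(catalog_name: str, table_param: str) -> str:
    """Build the Spark conf key in the shape quoted in the thread, e.g.
    spark.sql.catalog.hive_prod.table-default.format-version"""
    return f"spark.sql.catalog.{catalog_name}.{TABLE_DEFAULT_PREFIX}{table_param}"


def table_defaults(catalog_props: dict) -> dict:
    """Collect catalog properties carrying the prefix, with the prefix stripped."""
    return {
        key[len(TABLE_DEFAULT_PREFIX):]: value
        for key, value in catalog_props.items()
        if key.startswith(TABLE_DEFAULT_PREFIX)
    }


def effective_table_properties(catalog_props: dict, requested_props=None) -> dict:
    """Start from the catalog-level defaults, then apply what the user asked for.
    Assumption of this sketch: explicit table properties override the defaults."""
    props = table_defaults(catalog_props)
    props.update(requested_props or {})
    return props


if __name__ == "__main__":
    catalog_props = {"type": "hive", "table-default.format-version": "2"}
    print(spark_conf_key("hive_prod", "format-version"))
    print(effective_table_properties(catalog_props))
    print(effective_table_properties(catalog_props, {"format-version": "1"}))
```

With a catalog configured this way, a table created without an explicit `format-version` would pick up "2" from the catalog, which is the behavior Yufei and Gabor describe. Whether every catalog type honors the prefix is still the open question Gabor raised.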
