I believe the conclusion here was that there is already a catalog-level property for setting table defaults. It could be used to make v2 the default table format version for a particular catalog. See my last email on this thread. One thing I haven't checked is whether this property works for all catalog types or just a subset of them, but I think it's worth trying to see if it works in your environment. The setting is "table-default.<TABLE_PARAM>", e.g. "table-default.format-version" = "2".
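As a rough illustration of how such a prefix is meant to work, the Java sketch below strips `table-default.` from a catalog's properties and layers the result under the properties given when a table is created. It is self-contained and does not call Iceberg. `TableDefaults`, `tableDefaults`, and `resolve` are hypothetical names, and the property values are made up. The rule that explicit table properties override catalog defaults is an assumed precedence, not a description of `BaseMetastoreCatalog`'s actual code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper illustrating "table-default." catalog properties (not Iceberg's API). */
final class TableDefaults {
  // Prefix as named in this thread (its definition in CatalogProperties is linked above).
  static final String TABLE_DEFAULT_PREFIX = "table-default.";

  private TableDefaults() {}

  /** Returns the catalog properties that start with the prefix, with the prefix stripped. */
  static Map<String, String> tableDefaults(Map<String, String> catalogProps) {
    Map<String, String> defaults = new HashMap<>();
    for (Map.Entry<String, String> e : catalogProps.entrySet()) {
      String key = e.getKey();
      if (key.startsWith(TABLE_DEFAULT_PREFIX) && key.length() > TABLE_DEFAULT_PREFIX.length()) {
        defaults.put(key.substring(TABLE_DEFAULT_PREFIX.length()), e.getValue());
      }
    }
    return defaults;
  }

  /** Merges catalog defaults with CREATE TABLE properties; explicit ones win (assumed precedence). */
  static Map<String, String> resolve(Map<String, String> catalogProps, Map<String, String> tableProps) {
    Map<String, String> merged = new HashMap<>(tableDefaults(catalogProps));
    merged.putAll(tableProps);
    return merged;
  }

  public static void main(String[] args) {
    // Illustrative catalog configuration: one ordinary property plus one table default.
    Map<String, String> catalogProps = Map.of(
        "uri", "thrift://metastore:9083",
        "table-default.format-version", "2");
    System.out.println(resolve(catalogProps, Map.of()).get("format-version"));                      // 2
    System.out.println(resolve(catalogProps, Map.of("format-version", "1")).get("format-version")); // 1
  }
}
```

Keeping the defaults map separate from the merge step makes it easy to check which keys a catalog would contribute before any table is created.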
On Mon, Mar 20, 2023 at 5:41 AM Manu Zhang wrote:
> Is there any progress on making the default format version a catalog property?
>
> Thanks,
> Manu
>
> On Wed, Jan 18, 2023 at 5:43 PM Gabor Kaszab wrote:
>
>> I also ran into this "table-default." setting prefix
>> <https://github.com/apache/iceberg/blob/35151fe17b47c0af22787db4e4964b0cfcfdb215/core/src/main/java/org/apache/iceberg/CatalogProperties.java#L30>.
>> It seems to me that it's a catalog-level config, so it's enough to provide, e.g., "table-default.format-version" = "2" to each catalog as a startup flag. It also seems that catalogs derived from BaseMetastoreCatalog use this table default prefix
>> <https://github.com/apache/iceberg/blob/35151fe17b47c0af22787db4e4964b0cfcfdb215/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L148>.
>>
>> Gabor
>>
>> On Wed, Jan 18, 2023 at 12:00 AM Yufei Gu wrote:
>>
>>> The functionality is already there if we are talking about setting the default format at the Iceberg catalog level. For example, we can configure a catalog like this, and all tables created in it will be v2 tables:
>>> spark.sql.catalog.hive_prod.table-default.format-version = "2"
>>>
>>> Of course, we need to set it for each Spark app. Setting it in Trino would be easier: it would be a single catalog-level change.
>>>
>>> Best,
>>>
>>> Yufei
>>>
>>> `This is not a contribution`
>>>
>>>
>>> On Mon, Jan 16, 2023 at 1:34 AM Gabor Kaszab wrote:
>>>
>>>> It seems we have a consensus on the approach. I can take a look at implementing this if no one has any objections.
>>>>
>>>> Gabor
>>>>
>>>> On Fri, Jan 13, 2023 at 11:28 PM Ryan Blue wrote:
>>>>
>>>>> That sounds like a good idea to me.
>>>>>
>>>>> On Fri, Jan 13, 2023 at 11:04 AM Jack Ye wrote:
>>>>>
>>>>>> > I think the issue is that all of the built-in catalogs currently call the version of `newTableMetadata` that defaults to v1.
>>>>>>
>>>>>> Yes, I think this is the key issue for the catalogs that extend BaseMetastoreCatalog. It looks like we should make the default format version a catalog property instead of having it hard-coded in TableMetadata?
>>>>>>
>>>>>> -Jack
>>>>>>
>>>>>> On Thu, Jan 12, 2023 at 11:47 PM Jean-Baptiste Onofré wrote:
>>>>>>
>>>>>>> Hi Gabor,
>>>>>>>
>>>>>>> It makes sense to me. AFAIK, as table creation comes from the catalog "controller", the catalog can "decide" the version. So it would be up to each catalog to decide how, and with which version, it creates tables.
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On Wed, Jan 11, 2023 at 11:11 PM Gabor Kaszab wrote:
>>>>>>> >
>>>>>>> > Naively asking, can't we add some property to tell Iceberg which version to use by default when creating tables? (If there is no such setting currently.)
>>>>>>> >
>>>>>>> > Gabor
>>>>>>> >
>>>>>>> > On Wed, Jan 11, 2023 at 8:04 PM Jack Ye wrote:
>>>>>>> >>
>>>>>>> >> Should we start a community vote on this?
>>>>>>> >>
>>>>>>> >> I remember that in today's community sync meeting Russell briefly discussed some compaction support that is not there yet, and that some users are struggling with the small delete files issue; this was to some extent why Spark still defaults to v1.
>>>>>>> >>
>>>>>>> >> Regarding features, changelog scan is mostly there in Spark, and there will also likely be movement on the Trino side for it very soon.
>>>>>>> >>
>>>>>>> >> Overall, I think it would be beneficial to move the default to v2, which could incentivize completing those missing parts across engines.
>>>>>>> >>
>>>>>>> >> Best,
>>>>>>> >> Jack Ye
>>>>>>> >>
>>>>>>> >> On Wed, Jan 11, 2023 at 5:47 AM Piotr Findeisen wrote:
>>>>>>> >>>
>>>>>>> >>> Hi,
>>>>>>> >>>
>>>>>>> >>> FWIW, Trino already creates v2 tables by default.
>>>>>>> >>> I thought it was worth sharing for context.
>>>>>>> >>>
>>>>>>> >>> Best
>>>>>>> >>> PF
>>>>>>> >>>
>>>>>>> >>> On Tue, Jan 10, 2023 at 10:09 AM Manu Zhang wrote:
>>>>>>> >>>>
>>>>>>> >>>> Hi all,
>>>>>>> >>>>
>>>>>>> >>>> We maintain an internal fork of Iceberg, and all our use cases involve v2 tables with row-level updates and deletes. Our users need to remember to create tables with the `'format-version'='2'` option or to alter the table afterwards.
>>>>>>> >>>>
>>>>>>> >>>> I'm thinking about changing the default format-version of our forked Iceberg to v2. Is there any concern with this change? Any hidden issues I've missed?
>>>>>>> >>>>
>>>>>>> >>>> Thanks,
>>>>>>> >>>> Manu
>>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>>
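The thread touches two layers. The first is Spark catalog keys such as `spark.sql.catalog.hive_prod.table-default.format-version`. The second is the format version a table gets when it is created without an explicit `format-version`. The Java sketch below models both under stated assumptions:

* It assumes Spark hands each catalog every conf entry under `spark.sql.catalog.<name>.`, with that prefix stripped.
* It falls back to 1 only because the thread describes the built-in catalogs as hard-coding v1 at the time.
* It accepts only versions 1 and 2, matching the versions discussed here.

`FormatVersionResolver`, `catalogOptions`, and `resolveFormatVersion` are hypothetical names, not Iceberg or Spark APIs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical resolver for a new table's format version (not Iceberg's or Spark's API). */
final class FormatVersionResolver {
  static final String FORMAT_VERSION = "format-version";
  static final String TABLE_DEFAULT_PREFIX = "table-default.";
  // Fallback when neither the table nor the catalog sets a version; mirrors the
  // hard-coded v1 described in this thread (assumption for this sketch).
  static final int HARD_CODED_DEFAULT = 1;

  private FormatVersionResolver() {}

  /** Collects one catalog's options from Spark-style conf keys (assumed key layout). */
  static Map<String, String> catalogOptions(Map<String, String> sparkConf, String catalogName) {
    String prefix = "spark.sql.catalog." + catalogName + ".";
    Map<String, String> options = new HashMap<>();
    sparkConf.forEach((key, value) -> {
      if (key.startsWith(prefix)) {
        options.put(key.substring(prefix.length()), value);
      }
    });
    return options;
  }

  /** Explicit table property first, then the catalog's table default, then the fallback. */
  static int resolveFormatVersion(Map<String, String> catalogOptions, Map<String, String> tableProps) {
    String value = tableProps.get(FORMAT_VERSION);
    if (value == null) {
      value = catalogOptions.get(TABLE_DEFAULT_PREFIX + FORMAT_VERSION);
    }
    if (value == null) {
      return HARD_CODED_DEFAULT;
    }
    int version = Integer.parseInt(value.trim());
    if (version != 1 && version != 2) {
      throw new IllegalArgumentException("Unsupported format version: " + version);
    }
    return version;
  }

  public static void main(String[] args) {
    Map<String, String> sparkConf = Map.of(
        "spark.sql.catalog.hive_prod", "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.hive_prod.table-default.format-version", "2");
    Map<String, String> options = catalogOptions(sparkConf, "hive_prod");
    System.out.println(resolveFormatVersion(options, Map.of()));                              // 2
    System.out.println(resolveFormatVersion(options, Map.of("format-version", "1")));         // 1
    System.out.println(resolveFormatVersion(catalogOptions(sparkConf, "other"), Map.of()));   // 1
  }
}
```

The third call shows why the catalog-level default matters. A catalog without `table-default.format-version` silently falls back to whatever version is hard-coded, which is the behavior Jack and Manu describe.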
