Is there any progress on making the default format version a catalog property?

Thanks,
Manu

On Wed, Jan 18, 2023 at 5:43 PM Gabor Kaszab <[email protected]> wrote:

> I also ran into this "table-default." setting prefix
> <https://github.com/apache/iceberg/blob/35151fe17b47c0af22787db4e4964b0cfcfdb215/core/src/main/java/org/apache/iceberg/CatalogProperties.java#L30>.
> It looks like a catalog-level config, so it should be enough to provide e.g.
> "table-default.format-version" = "2" to each catalog as a startup flag.
> It also looks like catalogs derived from BaseMetastoreCatalog use this
> table default prefix
> <https://github.com/apache/iceberg/blob/35151fe17b47c0af22787db4e4964b0cfcfdb215/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L148>.
>
> Gabor
>
> On Wed, Jan 18, 2023 at 12:00 AM Yufei Gu <[email protected]> wrote:
>
>> The functionality is already there if we are talking about setting the
>> default format at the Iceberg catalog. For example, we can set a catalog
>> like this, and all tables created will be v2 tables:
>> spark.sql.catalog.hive_prod.table-default.format-version = "2"
>>
>> Of course, we need to set it for each Spark app. Setting it in Trino
>> would be easier. It would be one catalog-level change.
>>
>> Best,
>>
>> Yufei
>>
>> `This is not a contribution`
>>
>> On Mon, Jan 16, 2023 at 1:34 AM Gabor Kaszab <[email protected]> wrote:
>>
>>> It seems we have a consensus on the approach. I can take a look at
>>> implementing this if no one has any objections.
>>>
>>> Gabor
>>>
>>> On Fri, Jan 13, 2023 at 11:28 PM Ryan Blue <[email protected]> wrote:
>>>
>>>> That sounds like a good idea to me.
>>>>
>>>> On Fri, Jan 13, 2023 at 11:04 AM Jack Ye <[email protected]> wrote:
>>>>
>>>>> > I think the issue is that all of the built-in catalogs currently
>>>>> > call the version of `newTableMetadata` that defaults to v1.
>>>>>
>>>>> Yes, I think this is the key issue for the catalogs that extend
>>>>> BaseMetastoreCatalog.
>>>>> Looks like we should make changes to make the default format version
>>>>> a catalog property, instead of hard-coding it in TableMetadata?
>>>>>
>>>>> -Jack
>>>>>
>>>>> On Thu, Jan 12, 2023 at 11:47 PM Jean-Baptiste Onofré <[email protected]> wrote:
>>>>>
>>>>>> Hi Gabor,
>>>>>>
>>>>>> It makes sense to me. AFAIK, since table creation goes through the
>>>>>> catalog "controller", the catalog can "decide" the version. So it would
>>>>>> be up to each catalog to choose the version it creates tables with.
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On Wed, Jan 11, 2023 at 11:11 PM Gabor Kaszab <[email protected]> wrote:
>>>>>> >
>>>>>> > Naively asking, can't we add some property to tell Iceberg which
>>>>>> > version to use as the default when creating tables? (If there is no
>>>>>> > such setting currently.)
>>>>>> >
>>>>>> > Gabor
>>>>>> >
>>>>>> > Jack Ye <[email protected]> wrote (on Wed, Jan 11, 2023 at 20:04):
>>>>>> >>
>>>>>> >> Should we start a community vote on this?
>>>>>> >>
>>>>>> >> I remember that in today's community sync meeting Russell briefly
>>>>>> >> discussed some compaction support that is not there yet, and that
>>>>>> >> some users are struggling with the small delete files issue. That
>>>>>> >> was to some extent why Spark still defaults to v1.
>>>>>> >>
>>>>>> >> On the feature side, changelog scan is mostly there in Spark,
>>>>>> >> and there will likely be movement on the Trino side for it very soon.
>>>>>> >>
>>>>>> >> Overall, I think it would be beneficial to move the default to v2,
>>>>>> >> which could incentivize the completion of those missing parts across
>>>>>> >> engines.
>>>>>> >>
>>>>>> >> Best,
>>>>>> >> Jack Ye
>>>>>> >>
>>>>>> >> On Wed, Jan 11, 2023 at 5:47 AM Piotr Findeisen <[email protected]> wrote:
>>>>>> >>>
>>>>>> >>> Hi,
>>>>>> >>>
>>>>>> >>> FWIW Trino already creates v2 tables by default.
>>>>>> >>> Thought it's worth sharing for context.
>>>>>> >>>
>>>>>> >>> Best
>>>>>> >>> PF
>>>>>> >>>
>>>>>> >>> On Tue, Jan 10, 2023 at 10:09 AM Manu Zhang <[email protected]> wrote:
>>>>>> >>>>
>>>>>> >>>> Hi all,
>>>>>> >>>>
>>>>>> >>>> We've maintained a forked Iceberg internally, and all our use
>>>>>> >>>> cases involve v2 tables with row-level updates and deletes. Our
>>>>>> >>>> users need to remember to create tables with the
>>>>>> >>>> `'format-version'='2'` option or alter the table afterwards.
>>>>>> >>>>
>>>>>> >>>> I'm thinking about changing the default format-version of our
>>>>>> >>>> forked Iceberg to v2. Is there any concern about this change? Any
>>>>>> >>>> hidden issues I've missed?
>>>>>> >>>>
>>>>>> >>>> Thanks,
>>>>>> >>>> Manu
>>>>>> >>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
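
For anyone skimming the thread, the "table-default." prefix idea discussed above can be illustrated with a small Python sketch. This is not Iceberg's code: the function names `table_defaults` and `resolve_table_properties` are hypothetical, and the precedence shown (properties given at table creation win over catalog defaults) is an assumption made for illustration.

```python
# Prefix named in the thread (CatalogProperties); everything else is illustrative.
TABLE_DEFAULT_PREFIX = "table-default."


def table_defaults(catalog_props):
    """Return catalog properties that carry the prefix, with the prefix stripped."""
    return {
        key[len(TABLE_DEFAULT_PREFIX):]: value
        for key, value in catalog_props.items()
        if key.startswith(TABLE_DEFAULT_PREFIX)
    }


def resolve_table_properties(catalog_props, create_props):
    """Merge catalog-level defaults with properties passed at table creation.

    Assumption: explicit creation-time properties override catalog defaults.
    """
    merged = table_defaults(catalog_props)
    merged.update(create_props)
    return merged


# Hypothetical catalog configuration, mirroring the example in the thread.
catalog_props = {"type": "hive", "table-default.format-version": "2"}

print(resolve_table_properties(catalog_props, {}))
print(resolve_table_properties(catalog_props, {"format-version": "1"}))
```

With a default configured once per catalog, users would no longer need to remember the `'format-version'='2'` option on every table, while an explicit option would still take effect under the precedence assumed here.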
