The functionality is already there if we are talking about setting the default format version at the Iceberg catalog level. For example, we can configure a catalog like this, and all tables created through it will be v2 tables:

spark.sql.catalog.hive_prod.table-default.format-version = "2"
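A minimal end-to-end sketch of that flow, assuming a Hive-backed catalog named hive_prod and a hypothetical db.events table (the catalog type, database, and table names are illustrative, not from this thread):

    # spark-defaults.conf (or equivalent --conf flags on spark-sql / spark-submit)
    spark.sql.catalog.hive_prod=org.apache.iceberg.spark.SparkCatalog
    spark.sql.catalog.hive_prod.type=hive
    spark.sql.catalog.hive_prod.table-default.format-version=2

    -- Spark SQL: the new table should pick up the catalog-level default
    CREATE TABLE hive_prod.db.events (id BIGINT, data STRING) USING iceberg;
    SHOW TBLPROPERTIES hive_prod.db.events;  -- expect format-version=2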
Of course, this needs to be set for each Spark application. Setting it for Trino would be easier: a single catalog-level change.

Best,
Yufei

`This is not a contribution`

On Mon, Jan 16, 2023 at 1:34 AM Gabor Kaszab <[email protected]> wrote:

> It seems we have a consensus on the approach. I can take a look at
> implementing this if no one has any objections.
>
> Gabor
>
> On Fri, Jan 13, 2023 at 11:28 PM Ryan Blue <[email protected]> wrote:
>
>> That sounds like a good idea to me.
>>
>> On Fri, Jan 13, 2023 at 11:04 AM Jack Ye <[email protected]> wrote:
>>
>>> > I think the issue is that all of the built-in catalogs currently call
>>> the version of `newTableMetadata` that defaults to v1.
>>>
>>> Yes, I think this is the key issue for the catalogs that extend
>>> BaseMetastoreCatalog. It looks like we should make the default format
>>> version a catalog property instead of hard-coding it in TableMetadata.
>>>
>>> -Jack
>>>
>>> On Thu, Jan 12, 2023 at 11:47 PM Jean-Baptiste Onofré <[email protected]> wrote:
>>>
>>>> Hi Gabor,
>>>>
>>>> It makes sense to me. AFAIK, since table creation goes through the
>>>> catalog "controller", the catalog can "decide" the version. So it would
>>>> be up to each catalog to choose the version it creates tables with.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On Wed, Jan 11, 2023 at 11:11 PM Gabor Kaszab <[email protected]> wrote:
>>>> >
>>>> > Naively asking: can't we add a property to tell Iceberg which version
>>>> to use as the default when creating tables (if there is no such setting
>>>> currently)?
>>>> >
>>>> > Gabor
>>>> >
>>>> > Jack Ye <[email protected]> wrote (on Wed, Jan 11, 2023, 20:04):
>>>> >>
>>>> >> Should we start a community vote on this?
>>>> >>
>>>> >> I remember that in today's community sync Russell briefly discussed
>>>> some compaction support that is not there yet, and that some users are
>>>> struggling with the small delete files issue; to some extent that is why
>>>> Spark still defaults to v1.
>>>> >>
>>>> >> On the feature side, the changelog scan is mostly there in Spark, and
>>>> there will likely also be movement on the Trino side for it very soon.
>>>> >>
>>>> >> Overall, I think it would be beneficial to move the default to v2,
>>>> which could incentivize the completion of those missing parts across
>>>> engines.
>>>> >>
>>>> >> Best,
>>>> >> Jack Ye
>>>> >>
>>>> >> On Wed, Jan 11, 2023 at 5:47 AM Piotr Findeisen <[email protected]> wrote:
>>>> >>>
>>>> >>> Hi,
>>>> >>>
>>>> >>> FWIW, Trino already creates v2 tables by default.
>>>> >>> Thought it was worth sharing for context.
>>>> >>>
>>>> >>> Best
>>>> >>> PF
>>>> >>>
>>>> >>> On Tue, Jan 10, 2023 at 10:09 AM Manu Zhang <[email protected]> wrote:
>>>> >>>>
>>>> >>>> Hi all,
>>>> >>>>
>>>> >>>> We've maintained a forked Iceberg internally, and all our use cases
>>>> involve v2 tables with row-level updates and deletes. Our users need to
>>>> remember to create tables with the `'format-version'='2'` option or to
>>>> alter the table afterwards.
>>>> >>>>
>>>> >>>> I'm thinking about changing the default format-version of our
>>>> forked Iceberg to v2. Are there any concerns about this change? Any hidden
>>>> issues I've missed?
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> Manu
>>
>> --
>> Ryan Blue
>> Tabular
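For comparison with the per-table route Manu mentions at the bottom of the thread, a hedged Spark SQL sketch (the hive_prod.db.events name is made up for illustration):

    -- opt into v2 explicitly when creating the table
    CREATE TABLE hive_prod.db.events (id BIGINT, data STRING)
    USING iceberg
    TBLPROPERTIES ('format-version'='2');

    -- or upgrade an existing v1 table afterwards (the upgrade is one-way)
    ALTER TABLE hive_prod.db.events SET TBLPROPERTIES ('format-version'='2');

On the Trino side, if I recall correctly, the catalog-level change mentioned above is the iceberg.format-version property in the connector's catalog properties file, e.g. iceberg.format-version=2.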
