Hi Yufei,

Don't we risk a kind of inconsistency? Take compaction as an example: does it make sense to have two engines doing compaction with different properties/recipes? That sounds very risky to me (even if Polaris "manages" the compaction orchestration).

If the use case is to have, on one table, one maintenance engine doing compaction and another one doing snapshot GC, it makes sense. However, why not use the Iceberg table properties then?

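To make the question concrete, here is a minimal sketch (not taken from the proposal) of how the fields in the proposal's compaction and snapshot schemas could be expressed through the existing Iceberg table properties. The function name `policy_to_table_properties` and the input dict layout are hypothetical; only the three property names are the ones listed in the Iceberg configuration docs. The `enable` and `compaction_threshold` fields are left unmapped here because no equivalent table property is named in this thread.

```python
# Hypothetical helper: maps the policy fields quoted from the proposal
# onto the standard Iceberg table properties mentioned in this thread.
MS_PER_DAY = 24 * 60 * 60 * 1000


def policy_to_table_properties(policy: dict) -> dict:
    """Return Iceberg table properties equivalent to the given policy fields."""
    props = {}
    if "target_file_size_bytes" in policy:
        props["write.target-file-size-bytes"] = str(policy["target_file_size_bytes"])
    if "min_snapshot_to_keep" in policy:
        props["history.expire.min-snapshots-to-keep"] = str(policy["min_snapshot_to_keep"])
    if "max_snapshot_age_days" in policy:
        # The table property is expressed in milliseconds, the policy field in days.
        age_ms = policy["max_snapshot_age_days"] * MS_PER_DAY
        props["history.expire.max-snapshot-age-ms"] = str(age_ms)
    return props
```

With such a mapping, every engine reading the table would see the same values, which is the interoperability point I am trying to raise.
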
I'm still a bit confused :) That's why I think it would be great to have a new sync-up about that.

Thanks !
Regards
JB

On Wed, Jan 15, 2025 at 3:23 AM Yufei Gu <flyrain...@gmail.com> wrote:
>
> Hi JB,
>
> Thanks for the review. As we discussed, engines can still use table properties if that's preferred.
>
> In the case that a table is visited by multiple engines, these optional properties become critical, allowing admins or SREs to specify specialized “recipes” for table maintenance engines.
>
> Yufei
>
>
> On Mon, Jan 13, 2025 at 6:10 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> > Hi Folks,
> >
> > I did a new pass on the "Policy Management" proposal and I struggled to understand why the policy "overrides" the Iceberg table properties (even optional).
> > In the proposal, I see this:
> >
> > {
> >   "type": "object",
> >   "properties": {
> >     "enable": { "type": "boolean" },
> >     "target_file_size_bytes": { "type": "integer"},
> >     "compaction_threshold": { "type": "number"}
> >   },
> >   "required": ["enable"]
> > }
> >
> > Table properties already contain write.target-file-size-bytes
> > (https://iceberg.apache.org/docs/latest/configuration/#table-properties).
> > Why not use this property ?
> >
> > Same comment for snapshots:
> >
> > {
> >   "type": "object",
> >   "properties": {
> >     "enable": { "type": "boolean" },
> >     "min_snapshot_to_keep": { "type": "integer"},
> >     "max_snapshot_age_days": { "type": "integer"}
> >   },
> >   "required": ["enable"]
> > }
> >
> > We have history.expire.max-snapshot-age-ms and history.expire.min-snapshots-to-keep table properties.
> >
> > I think it's very confusing, and in order to keep query engines interoperability, I would rather use the "standard" table properties.
> >
> > Can we clarify this ?
> >
> > Thanks !
> > Regards
> > JB
> >
> > On Thu, Jan 2, 2025 at 8:30 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > >
> > > Hi Folks,
> > >
> > > Happy New Year!
> > > I hope you all had a wonderful and refreshing break.
> > >
> > > Following our previous discussions, we have decided to use separated policy entities(option 2) for table maintenance. I've outlined the detailed design here,
> > > https://docs.google.com/document/d/1kIiVkFFg9tPa5SH70b9WwzbmclrzH3qWHKfCKXw5lbs/edit?usp=sharing .
> > > It is based on the previous design with a wider scope for policy management.
> > >
> > > I’d love to hear your thoughts, feedback, or suggestions, so feel free to review and share your input.
> > >
> > > Yufei
> > >
> > >
> > > On Tue, Dec 10, 2024 at 5:43 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > >
> > > > Thank you all for taking the time to meet! Here’s a summary of our discussion:
> > > >
> > > > 1. *Challenges with Storing Policies as Properties (Option 1):*
> > > >    - We identified scalability limitations for access control in this approach.
> > > > 2. *Benefits of Using Separate Policy Entities (Option 2):*
> > > >    - This approach offers a more generic solution with improved access control and better performance.
> > > >    - This approach could apply to a variety of use cases, like column masking.
> > > >    - There are certain agreements on this approach.
> > > > 3. *Other Options Considered:*
> > > >    - Storing policies as Polaris entity properties or using a 1:1 mapping of policy entities with catalog/namespace/table entities.
> > > >    - While slightly different from Option 1, these approaches still present notable drawbacks similar to option 1.
> > > > 4. *Option to Delegate Policy Storage to TMS:*
> > > >    - We discussed the possibility of not storing any policies in Polaris, allowing TMS to manage all policies.
> > > >    - However, the proposed approach aims to promote interoperability across engines and systems like TMS, without preventing them from having their own rules or policies.
> > > >
> > > >
> > > > Please let me know if I missed anything or if further clarifications are needed.
> > > >
> > > >
> > > >
> > > > Yufei
> > > >
> > > >
> > > > On Wed, Dec 4, 2024 at 2:37 PM Omar Al-Safi <o...@oalsafi.com> wrote:
> > > >
> > > >> Thank you Yufei for the flexibility!
> > > >>
> > > >> Regards,
> > > >> Omar
> > > >>
> > > >> On Wed, 4 Dec 2024, 23:12 Yufei Gu, <flyrain...@gmail.com> wrote:
> > > >>
> > > >> > I've rescheduled it to next Monday due to the availability. Sorry for any inconvenience. FYI, I will not record it as I don't have a premium google account yet.
> > > >> >
> > > >> > Table maintenance in Polaris
> > > >> > Monday, December 9 · 9:00 – 10:00am
> > > >> > Time zone: America/Los_Angeles
> > > >> > Google Meet joining info
> > > >> > Video call link: https://meet.google.com/dix-cdfm-pve
> > > >> >
> > > >> > Yufei
> > > >> >
> > > >> >
> > > >> > On Wed, Dec 4, 2024 at 1:15 AM Omar Al-Safi <o...@oalsafi.com> wrote:
> > > >> >
> > > >> > > Thank you Yufei for getting this moving.
> > > >> > >
> > > >> > > Unfortunately tomorrow I won't be able to make it plus I think a couple of guys are at reinvent (JB for example), would it make sense to reschedule to early next week? Or maybe have it recorded.
> > > >> > > As I highlighted in the document, I am feeling embedding the policies into Polaris feels more of TMS concern rather than Polaris concern.
> > > >> > > Unless, we provide a way to have pluggable policies where you can either rely on Polaris to store the polices or the pluggable implementation would handle how that can be stored, which I think fits well in both worlds.
> > > >> > >
> > > >> > > Regards,
> > > >> > > Omar
> > > >> > >
> > > >> > > On Tue, Dec 3, 2024 at 10:26 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > > >> > >
> > > >> > > > Sorry the meeting title is misleading, the meeting itself is scheduled on Dec. 5th. Thanks Anurag for pointing that out.
> > > >> > > >
> > > >> > > > Table maintenance in Polaris
> > > >> > > > Thursday, December 5 · 9:00 – 10:00am
> > > >> > > > Time zone: America/Los_Angeles
> > > >> > > > Google Meet joining info
> > > >> > > > Video call link: https://meet.google.com/dix-cdfm-pve
> > > >> > > >
> > > >> > > > Yufei
> > > >> > > >
> > > >> > > >
> > > >> > > > On Tue, Dec 3, 2024 at 12:32 PM Anurag Mantripragada <amantriprag...@apple.com.invalid> wrote:
> > > >> > > >
> > > >> > > > > Thanks Yufei, I think you meant Thursday, December 5th 9:00am – 10:00am (GMT-08).
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > Anurag Mantripragada
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > > On Dec 3, 2024, at 11:33 AM, Yufei Gu <flyrain...@gmail.com> wrote:
> > > >> > > > > >
> > > >> > > > > > Hi Folks,
> > > >> > > > > >
> > > >> > > > > > We’ve made some adjustments to the design, moving from *Option 1* to *Option 2*:
> > > >> > > > > >
> > > >> > > > > > 1. *Option 1:* Store maintenance policies in catalog/namespace/table properties.
> > > >> > > > > > 2. *Option 2:* Store maintenance policies as separate entities.
> > > >> > > > > >
> > > >> > > > > > The key concern with Option 1 is that the access control model isn't scalable. On the other hand, Option 2 provides greater flexibility, improved privilege enforcement, and better overall performance.
> > > >> > > > > >
> > > >> > > > > > I’ve updated the design document with the latest changes, which you can find here: Updated Design Document
> > > >> > > > > > <https://docs.google.com/document/d/1Pd_mzZcfvnUvcH98IbwsIYf4eryet1lQDfclKYx-t-M/edit?usp=sharing>.
> > > >> > > > > >
> > > >> > > > > > To discuss this design change in detail, I’ll be hosting a session on Thursday. Please find the meeting details below:
> > > >> > > > > > Table maintenance in Polaris @ Thu, Nov 7, 2024 9:00am – 10:00am (GMT-08)
> > > >> > > > > > Thursday, December 5 · 9:00 – 10:00am
> > > >> > > > > > Time zone: America/Los_Angeles
> > > >> > > > > > Google Meet joining info
> > > >> > > > > > Video call link: https://meet.google.com/dix-cdfm-pve
> > > >> > > > > >
> > > >> > > > > > Feel free to review the updated document ahead of the session. Looking forward to your thoughts and feedback during the meeting!
> > > >> > > > > >
> > > >> > > > > > Yufei
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Mon, Nov 18, 2024 at 9:43 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > > >> > > > > >
> > > >> > > > > >> Hi Yufei
> > > >> > > > > >>
> > > >> > > > > >> Not sure we got consensus in all details but the overall picture is ok for me.
> > > >> > > > > >>
> > > >> > > > > >> Let’s continue the details definition in the PR.
> > > >> > > > > >>
> > > >> > > > > >> Thanks !
> > > >> > > > > >> Regards
> > > >> > > > > >> JB
> > > >> > > > > >>
> > > >> > > > > >> On Thu, Nov 14, 2024 at 2:39 AM, Yufei Gu <flyrain...@gmail.com> wrote:
> > > >> > > > > >>
> > > >> > > > > >>> Hi everyone,
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> Thank you for joining the table maintenance discussion today! We made significant progress, and here are the key takeaways:
> > > >> > > > > >>>
> > > >> > > > > >>> 1. Clarified furthermore and reached consensus on introducing table maintenance properties in Polaris to support for different TMS and promote interoperability.
> > > >> > > > > >>> 2. Agreed to proceed with Option 1, which stores metadata as catalog/namespace/table properties.
> > > >> > > > > >>> 3. Confirmed the new privileges to ensure that maintenance properties are safeguarded from being altered by clients with existing write access.
> > > >> > > > > >>> 4. Briefly discussed the support for customized maintenance policies.
> > > >> > > > > >>>
> > > >> > > > > >>> Next step:
> > > >> > > > > >>>
> > > >> > > > > >>> 1. Will file maintenance properties related PRs per design
> > > >> > > > > >>> 2. Will add more details for customized policy support.
> > > >> > > > > >>>
> > > >> > > > > >>> *Note*: Unfortunately, I wasn’t able to record the meeting due to the need for a Google premium account.
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> Yufei
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>> On Tue, Nov 12, 2024 at 10:10 AM Omar Al-Safi <o...@oalsafi.com> wrote:
> > > >> > > > > >>>
> > > >> > > > > >>>> Thank you! Will try to be there
> > > >> > > > > >>>>
> > > >> > > > > >>>> On Tue, 12 Nov 2024, 18:55 Yufei Gu, <flyrain...@gmail.com> wrote:
> > > >> > > > > >>>>
> > > >> > > > > >>>>> Hi Omar, I sent the invitation to dev@polaris.apache.org, as well as your email address.
> > > >> > > > > >>>>>
> > > >> > > > > >>>>> Yufei
> > > >> > > > > >>>>>
> > > >> > > > > >>>>>
> > > >> > > > > >>>>> On Tue, Nov 12, 2024 at 9:51 AM Omar Al-Safi <o...@oalsafi.com> wrote:
> > > >> > > > > >>>>>
> > > >> > > > > >>>>>> Thanks Yufei, is it possible to send the invitation to the Polaris google group?
> > > >> > > > > >>>>>>
> > > >> > > > > >>>>>> Regards,
> > > >> > > > > >>>>>> Omar
> > > >> > > > > >>>>>>
> > > >> > > > > >>>>>> On Tue, Nov 12, 2024 at 6:48 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > > >> > > > > >>>>>>
> > > >> > > > > >>>>>>> Hi folks,
> > > >> > > > > >>>>>>>
> > > >> > > > > >>>>>>> We are going to have another sync for table maintenance in Polaris per discussion with JB.
> > > >> > > > > >>>>>>> Here are meeting details:
> > > >> > > > > >>>>>>>
> > > >> > > > > >>>>>>> Polaris Table maintenance sync
> > > >> > > > > >>>>>>> Wednesday, November 13 · 10:00 – 11:00am
> > > >> > > > > >>>>>>> Time zone: America/Los_Angeles
> > > >> > > > > >>>>>>> Google Meet joining info
> > > >> > > > > >>>>>>> Video call link: https://meet.google.com/nyy-ahmn-jqd
> > > >> > > > > >>>>>>>
> > > >> > > > > >>>>>>>
> > > >> > > > > >>>>>>> Yufei
> > > >> > > > > >>>>>>>
> > > >> > > > > >>>>>>>
> > > >> > > > > >>>>>>> On Fri, Nov 8, 2024 at 5:23 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > > >> > > > > >>>>>>>
> > > >> > > > > >>>>>>>> Thanks everyone for joining the discussion. Sorry I couldn't record the session due to a tech issue. Here are meeting notes:
> > > >> > > > > >>>>>>>>
> > > >> > > > > >>>>>>>> 1. We discussed the boundary between Polaris and the Table Maintenance System(TMS). We agreed that they should be separated systems.
> > > >> > > > > >>>>>>>> 2. A general agreement on the minimal metadata added to Polaris to support TMS, focusing on essential data needed for interoperability.
> > > >> > > > > >>>>>>>> 3. A general consensus on option 1 to store metadata as catalog/namespace/table properties. We could introduce policy entities in the future for other use cases, like column masking. Will address two feedbacks:
> > > >> > > > > >>>>>>>>    1. Caching the table properties in the catalog to reduce IO cost.
> > > >> > > > > >>>>>>>>    2. Introducing new permissions for table maintenance related metadata to prevent any clients with the write permission to mess up with them.
> > > >> > > > > >>>>>>>> 4. Briefly touched on the communication module between TMS and Polaris, as a long-term plan, an event system from Polaris is necessary, not only benefits TMS, but also benefits other systems which consume change from Polaris.
> > > >> > > > > >>>>>>>>
> > > >> > > > > >>>>>>>> Next Steps:
> > > >> > > > > >>>>>>>>
> > > >> > > > > >>>>>>>> 1. Implement metadata storage as properties
> > > >> > > > > >>>>>>>>    1. Design detailed schema for properties
> > > >> > > > > >>>>>>>>    2. Figure out a way to be extensible for future maintenance policy or customized policies.
> > > >> > > > > >>>>>>>>    3. Add new permissions for new properties
> > > >> > > > > >>>>>>>> 2. Begin planning for event system
> > > >> > > > > >>>>>>>>
> > > >> > > > > >>>>>>>> Yufei
> > > >> > > > > >>>>>>>>
> > > >> > > > > >>>>>>>>
> > > >> > > > > >>>>>>>> On Tue, Nov 5, 2024 at 12:25 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > > >> > > > > >>>>>>>>
> > > >> > > > > >>>>>>>>> Hi Yufei
> > > >> > > > > >>>>>>>>>
> > > >> > > > > >>>>>>>>> Thanks for scheduling this !
> > > >> > > > > >>>>>>>>>
> > > >> > > > > >>>>>>>>> I should be able to join.
> > > >> > > > > >>>>>>>>>
> > > >> > > > > >>>>>>>>> For the community, will you be able to record ?
> > > >> > > > > >>>>>>>>>
> > > >> > > > > >>>>>>>>> Regards
> > > >> > > > > >>>>>>>>> JB
> > > >> > > > > >>>>>>>>>
> > > >> > > > > >>>>>>>>> On Mon, Nov 4, 2024 at 10:40 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > > >> > > > > >>>>>>>>>>
> > > >> > > > > >>>>>>>>>> Hi Folks,
> > > >> > > > > >>>>>>>>>>
> > > >> > > > > >>>>>>>>>> I've scheduled a community sync to discuss table maintenance in Polaris this Thursday at 9 AM PST. Since we didn’t have a chance to dive into this topic during our last sync, this will be a dedicated session to cover it in detail.
> > > >> > > > > >>>>>>>>>>
> > > >> > > > > >>>>>>>>>> *Updates to Note:* I've made some updates to the design document, with a particular focus on the approach for maintenance metadata. The document now favors *Option 1*, which involves leveraging table, namespace, and catalog properties for maintenance metadata.
> > > >> > > > > >>>>>>>>>>
> > > >> > > > > >>>>>>>>>> Please review the latest version of the design doc before the meeting, as it will help us streamline the discussion.
> > > >> > > > > >>>>>>>>>>
> > > >> > > > > >>>>>>>>>> Looking forward to everyone’s insights!
> > > >> > > > > >>>>>>>>>> Video call link: https://meet.google.com/opc-vath-mgb
> > > >> > > > > >>>>>>>>>> Design doc:
> > > >> > > > > >>>>>>>>>> https://docs.google.com/document/d/1Pd_mzZcfvnUvcH98IbwsIYf4eryet1lQDfclKYx-t-M/edit?usp=sharing
> > > >> > > > > >>>>>>>>>>
> > > >> > > > > >>>>>>>>>>
> > > >> > > > > >>>>>>>>>> Yufei