Thanks wj for driving this! I'd like to give some input:

1. Java API 'createTag': I think it's better to use 'Duration' as the parameter type instead of 'String'.
2. For the field 'tagCreateTime' in class 'Tag': I think we can just use the 'Snapshot#timeMillis' field. 'timeMillis' is the creation time of the snapshot, and I don't think that time is used when we read the corresponding tag, so we can just reuse the field. What do you think? If we do so, 'commit_time' in the tags system table can be renamed to 'create_time', 'tag_create_time', or some other name.

3. Should we add TTL to auto-created tags? I think we should. Users can set the same TTL for all auto-created tags via table options. My suggestion for how to handle `tag.num-retained-max` and TTL is that the TTL has higher priority: when we try to expire auto-created tags, we first find candidates by `tag.num-retained-max`; then, if a candidate's survival time is less than the TTL, we don't expire it.

Best regards,
Zelin Yu

On Mon, Apr 1, 2024 at 9:54 AM <[email protected]> wrote:

> Hi devs:
>
> I would like to start a discussion of PIP-20: Introduce TTL for tags which
> are not auto-created [1]. Currently, Paimon has automatic clearing
> mechanisms for tags created by TagAutoCreation, but not for other tags. It
> can't meet our demands. For example:
> 1. The current tag cleanup mechanism may lead to resource waste.
> 2. Tags do not support TTL, so they are not flexible to use.
>
> This PIP aims to let each tag have its own TTL, so that users can use tags
> more flexibly and reduce the probability of resource waste. It also lets
> Paimon keep up with other data lake products such as Iceberg.
>
> Looking forward to your feedback, thanks.
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=300026341
>
> Best,
> wangwj
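
A rough sketch of the rule suggested in point 3 of the reply above (candidates are picked by `tag.num-retained-max`, but a candidate younger than the TTL is kept), with the TTL passed as a 'Duration' as in point 1. This is only an illustration under assumptions: the class 'TagExpireSketch', the type 'AutoTag', and the method 'selectExpired' are hypothetical names and are not part of Paimon's API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TagExpireSketch {

    /** Hypothetical stand-in for an auto-created tag; not Paimon's Tag class. */
    public static final class AutoTag {
        public final String name;
        public final Instant createTime;

        public AutoTag(String name, Instant createTime) {
            this.name = name;
            this.createTime = createTime;
        }
    }

    /**
     * Returns the names of the tags to expire.
     * Step 1: the oldest tags beyond numRetainedMax are candidates.
     * Step 2: a candidate whose age is still less than the TTL is kept,
     *         i.e. the TTL has higher priority than numRetainedMax.
     */
    public static List<String> selectExpired(
            List<AutoTag> tags, int numRetainedMax, Duration ttl, Instant now) {
        List<AutoTag> sorted = new ArrayList<>(tags);
        // Oldest first, so the head of the list holds the expiration candidates.
        sorted.sort(Comparator.comparing((AutoTag t) -> t.createTime));

        int overflow = Math.max(0, sorted.size() - numRetainedMax);
        List<String> expired = new ArrayList<>();
        for (AutoTag tag : sorted.subList(0, overflow)) {
            Duration age = Duration.between(tag.createTime, now);
            // Expire only when the tag has lived at least as long as the TTL.
            if (age.compareTo(ttl) >= 0) {
                expired.add(tag.name);
            }
        }
        return expired;
    }
}
```

For example, with three tags aged 10, 2, and 1 days, a retained maximum of 1, and a TTL of 7 days, the two oldest tags are candidates, but only the 10-day-old tag is expired; the 2-day-old tag survives because it has not yet reached the TTL.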
