Thanks Jingsong Li and Yu Zelin for the replies.

> I think it's similar to the snapshot expire, where both the number and time are used to determine whether it should be deleted. This is reasonable, and the hit should be deleted.

OK, I will do that.

> Java API 'createTag': Use 'Duration' as parameter instead of 'String'. I think it's better.

OK.

> For the field 'tagCreateTime' in class 'Tag': I think we can just use the 'Snapshot#timeMillis' field. The 'timeMillis' is the create time of the snapshot, I think the time won't be used when we read the corresponding tag. So I think we can just reuse the field, what do you think?

If we do so, I think it's possible to reuse the 'Snapshot#timeMillis' field for auto-created tags, but I don't think 'Snapshot#timeMillis' can be used for tags that are not auto-created. What do you think?

> in the tags system table, 'commit_time' can be renamed to 'create_time' or 'tag_create_time' or other name.

I think create-time and time-retained are good.

> Should we add TTL to auto-created tags? I think we should. Users can set the same TTL for all auto-created tags by table options. My suggestion of how to handle `tag.num-retained-max` and TTL is: the TTL has higher priority. When we try to expire an auto-created tag, we first find candidates by `tag.num-retained-max`, then if the candidate's survival time is less than TTL, we don't expire it.

OK, I will do that.

On Tue, Apr 2, 2024 at 5:26 PM yu zelin <[email protected]> wrote:

> Thanks wj for driving this! I'd like to give some inputs:
>
> 1. Java API 'createTag': Use 'Duration' as parameter instead of 'String'. I think it's better.
>
> 2. For the field 'tagCreateTime' in class 'Tag': I think we can just use the 'Snapshot#timeMillis' field. The 'timeMillis' is the create time of the snapshot, I think the time won't be used when we read the corresponding tag. So I think we can just reuse the field, what do you think? And if do so, in the tags system table, 'commit_time' can be renamed to 'create_time' or 'tag_create_time' or other name.
>
> 3. Should we add TTL to auto-created tags? I think we should. Users can set the same TTL for all auto-created tags by table options. My suggestion of how to handle `tag.num-retained-max` and TTL is: the TTL has higher priority. When we try to expire auto-created tag, we first find candidates by `tag.num-retained-max`, then if the candidate's survival time is less than TTL, we don't expire it.
>
> Best regards,
> Zelin Yu
>
> On Mon, Apr 1, 2024 at 9:54 AM <[email protected]> wrote:
>
> > Hi devs:
> >
> > I would like to start a discussion of PIP-20: Introduce TTL for tags which are not auto-created. [1]. Currently, Paimon has automatic clearing mechanisms for tags created by TagAutoCreation, but not for other tags. It can't meet our demands. For example:
> > 1. The current tag cleanup mechanism may lead to resource-wasting.
> > 2. Tag does not support TTL, so it is not flexible to use.
> >
> > This PIP aims to give each tag its own TTL, so that the user can use tags more flexibly and reduce the probability of resource waste. It also keeps Paimon in step with other data lake products such as Iceberg.
> >
> > Looking forward to your feedback, thanks.
> >
> > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=300026341
> >
> > Best,
> > wangwj
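
For reference, the expiry rule discussed in this thread (first pick candidates by `tag.num-retained-max`, then keep any candidate whose survival time is still less than the TTL) could be sketched roughly as below. This is only an illustration under my own assumptions: `TagExpireSketch`, `TagInfo` and `selectExpired` are hypothetical names and not Paimon APIs, and I assume the oldest tags are the ones that exceed the retained count.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; none of these names are Paimon APIs. */
final class TagExpireSketch {

    /** Minimal stand-in for a tag: a name plus its creation time. */
    static final class TagInfo {
        final String name;
        final Instant createTime;

        TagInfo(String name, Instant createTime) {
            this.name = name;
            this.createTime = createTime;
        }
    }

    /**
     * Returns the names of tags to expire. Candidates are the oldest tags
     * beyond numRetainedMax; a candidate younger than the TTL is kept,
     * so the TTL has the higher priority.
     */
    static List<String> selectExpired(
            List<TagInfo> tags, int numRetainedMax, Duration ttl, Instant now) {
        // Sort a copy, oldest first (assumption: oldest tags are expired first).
        List<TagInfo> sorted = new ArrayList<>(tags);
        sorted.sort(Comparator.comparing((TagInfo t) -> t.createTime));

        // Step 1: candidates are the tags exceeding the retained count.
        int excess = Math.max(0, sorted.size() - numRetainedMax);

        List<String> expired = new ArrayList<>();
        for (TagInfo tag : sorted.subList(0, excess)) {
            // Step 2: expire only if the survival time has reached the TTL.
            Duration age = Duration.between(tag.createTime, now);
            if (age.compareTo(ttl) >= 0) {
                expired.add(tag.name);
            }
        }
        return expired;
    }
}
```

With three tags, a retained maximum of 1 and a TTL of 2 days, the two oldest tags are candidates, but only those that have already lived for 2 days or more are expired.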
