Since it is marked as outdated by the TTL policy, we think it’s better to
delete it anyway. Compaction and clustering should handle the case where the
source data is already marked as deleted by TTL; otherwise some unused data
will still be left in the partition. What do you think?

> On Oct 19, 2022, at 15:09, Teng Huo <teng_...@outlook.com> wrote:
> 
> Nice feature!
> @stream2000
> 
> Just one question, can it work with compaction logs? I mean, if there are 
> some log files already marked in a compaction plan, will they be deleted by 
> TTL?
> ________________________________
> From: sagar sumit <cod...@apache.org>
> Sent: Wednesday, October 19, 2022 2:42:36 PM
> To: dev@hudi.apache.org <dev@hudi.apache.org>
> Subject: Re: [DISCUSS] Hudi data TTL
> 
> +1 Very nice idea. Looking forward to the RFC!
> 
> On Wed, Oct 19, 2022 at 10:13 AM Shiyan Xu <xu.shiyan.raym...@gmail.com>
> wrote:
> 
>> great proposal. Partition TTL is a good starting point. we can extend it to
>> other TTL strategies like column-based, and make it customizable and
>> pluggable. Looking forward to the RFC!
>> 
>> On Wed, Oct 19, 2022 at 11:40 AM Jian Feng <jian.f...@shopee.com.invalid>
>> wrote:
>> 
>>> Good idea,
>>> this is definitely worth an RFC
>>> btw should it only depend on Hudi's partition? I feel it should be a more
>>> common feature since sometimes customers' data can not update across
>>> partitions
>>> 
>>> 
>>> On Wed, Oct 19, 2022 at 11:07 AM stream2000 <18889897...@163.com> wrote:
>>> 
>>>> Hi all, we have implemented partition-based data TTL management, with
>>>> which we can manage TTL for Hudi partitions by size, expiration time, and
>>>> sub-partition count. When a partition is detected as outdated, we use the
>>>> delete partition interface to delete it, which generates a replace
>>>> commit to mark the data as deleted. The actual deletion is then done by
>>>> the clean service.
>>>> 
>>>> 
>>>> If the community is interested in this idea, maybe we can propose an RFC
>>>> to discuss it in detail.
>>>> 
>>>> 
>>>>> On Oct 19, 2022, at 10:06, Vinoth Chandar <vin...@apache.org> wrote:
>>>>> 
>>>>> +1 love to discuss this in an RFC proposal.
>>>>> 
>>>>> On Tue, Oct 18, 2022 at 13:11 Alexey Kudinkin <ale...@onehouse.ai>
>>>>> wrote:
>>>>> 
>>>>>> That's a very interesting idea.
>>>>>> 
>>>>>> Do you want to take a stab at writing a full proposal (in the form of
>>>>>> an RFC) for it?
>>>>>> 
>>>>>> On Tue, Oct 18, 2022 at 10:20 AM Bingeng Huang <hbgstc...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> Do we have a plan to integrate data TTL into Hudi, so that we don't
>>>>>>> have to schedule an offline Spark job to delete outdated data? We
>>>>>>> would just set a TTL config, and the writer or some offline service
>>>>>>> would delete old data as expected.
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>>> --
>>> *Jian Feng,冯健*
>>> Shopee | Engineer | Data Infrastructure
>>> 
>> 
>> 
>> --
>> Best,
>> Shiyan
>> 
