Re: [DISCUSS] Polaris Delegation Service for Long-Running Tasks

Anurag Mantripragada Tue, 24 Jun 2025 17:35:08 -0700

Thank you for your proposal, William.

This type of companion service is necessary, as evidenced by the other proposal 
on asynchronous tasks. Overall, this is a promising start. I understand that 
the scope of this proposal is limited, so please feel free to say so if any of 
the points below are out of scope. I have a few questions:


1. Could you clarify in the documentation the source of truth for task status? 
From your diagram, it appears that it is in the delegation service.
2. The implementation details of the service are abstracted away. Are these not 
in scope for this design? (For instance, do we have a task queue in the 
delegation service?)
3. Could you provide additional details on how this service will be deployed?

Things become very complicated when we transition from a synchronous model to 
an asynchronous model (handling failures, task executor unavailability, status 
updates, etc.). We can have a separate discussion for those.
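
To make questions 1 and 2 a bit more concrete, here is a minimal sketch of what 
I mean by a single source of truth for task status. It is purely illustrative 
and not taken from the design doc: TaskStore, TaskRecord, the state names, and 
the lease/attempt parameters are all hypothetical, and an in-memory dict stands 
in for whatever persistence would actually be used. The idea is that every 
status read and every state change goes through one store, and a lease lets 
another executor pick up a task when the original executor becomes unavailable.

```python
import enum
import time
from dataclasses import dataclass
from typing import Dict, Optional


class TaskState(enum.Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


@dataclass
class TaskRecord:
    task_id: str
    state: TaskState = TaskState.PENDING
    attempts: int = 0
    lease_expires_at: float = 0.0  # epoch seconds; only meaningful while RUNNING


class TaskStore:
    """Hypothetical stand-in for the persistence layer (an in-memory dict)."""

    def __init__(self, lease_seconds: float = 30.0, max_attempts: int = 3) -> None:
        self._tasks: Dict[str, TaskRecord] = {}
        self._lease_seconds = lease_seconds
        self._max_attempts = max_attempts

    def submit(self, task_id: str) -> None:
        # Idempotent: resubmitting a known task does not reset its state.
        self._tasks.setdefault(task_id, TaskRecord(task_id))

    def claim(self, now: Optional[float] = None) -> Optional[TaskRecord]:
        """Hand out one PENDING task, or a RUNNING task whose lease has expired."""
        now = time.time() if now is None else now
        for task in self._tasks.values():
            lease_lost = (
                task.state is TaskState.RUNNING and task.lease_expires_at <= now
            )
            if task.state is not TaskState.PENDING and not lease_lost:
                continue
            if task.attempts >= self._max_attempts:
                # Out of retries: record the failure instead of handing it out.
                task.state = TaskState.FAILED
                continue
            task.state = TaskState.RUNNING
            task.attempts += 1
            task.lease_expires_at = now + self._lease_seconds
            return task
        return None

    def complete(self, task_id: str, ok: bool) -> None:
        # A failed attempt goes back to PENDING; claim() enforces the retry cap.
        task = self._tasks[task_id]
        task.state = TaskState.SUCCEEDED if ok else TaskState.PENDING

    def status(self, task_id: str) -> TaskState:
        # Any node can answer a status query, since state lives only in the store.
        return self._tasks[task_id].state
```

In this sketch, an executor that dies mid-task simply stops renewing its claim; 
once the lease expires, the next claim() call hands the task to someone else, 
and after max_attempts the task is marked FAILED. Whether the real design wants 
leases, a queue, or something else entirely is exactly what I am asking about.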

Thank you,
Anurag Mantripragada


> On Jun 24, 2025, at 11:56 AM, William Hyun <will...@apache.org> wrote:
> 
> Hey Dmitri,
> 
> Thank you for your comments!
> 
> I would like to first clarify that while the initial use case is
> internal, we are not closing the door completely on having Delegation
> Service be accessible through user-driven clients.
> We would love this service to eventually be deployed and run
> independently from the Polaris Catalog to handle scheduled,
> asynchronous tasks as Eric mentioned above with compaction.
> We believe the REST API is the foundational building block for that
> evolution and the initial proposal aims to simply introduce the
> framework to the Polaris ecosystem with the purge table task as the
> main focal point.
> 
> Secondly, in addressing the concern about task failures, I have added
> a section in the appendix discussing the expected behavior of failed
> tasks.
> Please feel free to take a look and let me know what you think!
> - 
> https://docs.google.com/document/d/1AhR-cZ6WW6M-z8v53txOfcWvkDXvS-0xcMe3zjLMLj8/edit?tab=t.0#heading=h.fr5gi42vvat3
> 
> Bests,
> William
> 
> 
> On Mon, Jun 23, 2025 at 4:42 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
>> 
>> Apologies for missing the reference to Robert's doc. I hope it does not
>> invalidate my comments :)
>> 
>> This is certainly up for discussion.
>> 
>> To clarify my concern about the REST API: If we are to have resilient tasks
>> and the node that serves the initial REST request fails, other nodes will
>> have to be able to provide responses about the task instead of the failed
>> node. Ultimately the data will come from persistence (I assume). Also, I
>> suppose the Tasks Service is meant for internal interactions (not for
>> user-driven clients). Therefore, it seems to me that the REST API is
>> somewhat superficial in this case.
>> 
>> Like I mentioned before, this is just what I thought after a quick review.
>> I'll certainly have a deeper look later.
>> 
>> Cheers,
>> Dmitri.
>> 
>> On Mon, Jun 23, 2025 at 6:02 PM Eric Maynard <eric.w.mayn...@gmail.com>
>> wrote:
>> 
>>> Hey Dmitri,
>>> 
>>> There's a section in the email above and the linked doc that talks about
>>> the linked proposal. See "Relationship to the "Asynchronous & Reliable
>>> Tasks" Proposal".
>>> 
>>> As for pulling away from a REST API in favor of driving things directly
>>> from persistence, there's a lot to discuss here. Bear in mind that the
>>> design goes into detail about one proposed "TaskExecutor" implementation;
>>> maybe another TaskExecutor could work exactly like you describe. But the
>>> reason that this implementation proposes to be driven by a REST API is that
>>> there's a lot of interesting future work -- see the "Future Work" section
>>> of the doc for some examples -- that can be added on to the REST API. In
>>> particular, table maintenance actions like compaction.
>>> 
>>> --EM
>>> 
>>> On Mon, Jun 23, 2025 at 2:31 PM Dmitri Bourlatchkov <di...@apache.org>
>>> wrote:
>>> 
>>>> Hi All,
>>>> 
>>>> A previous proposal by Robert [1] from May 9 appears to be related. I think
>>>> we should consider both at the same time, possibly as alternatives, but
>>>> perhaps also sharing / reusing their respective ideas.
>>>> 
>>>> A few notes after a quick review:
>>>> 
>>>> * Separate scaling for task executors seems reasonable at first glance, but
>>>> it adds deployment complexity. If we go with this approach, I believe it
>>>> would be worth making this deployment strategy optional. In other words let
>>>> admin users decide whether they want to have extra nodes dedicated to
>>>> specific tasks or whether they are ok with having uniform nodes.
>>>> 
>>>> * I'm not sure a separate rich REST API for submitting tasks is really
>>>> necessary. Proper synchronization among multiple nodes will
>>>> probably require roundtrips to Persistence anyway, so task submission could
>>>> probably be done via Persistence.
>>>> 
>>>> [1] https://lists.apache.org/thread/gg0kn89vmblmjgllxn7jkn8ky2k28f5l
>>>> 
>>>> Thanks,
>>>> Dmitri.
>>>> 
>>>> 
>>>> On Mon, Jun 23, 2025 at 3:12 PM William Hyun <will...@apache.org> wrote:
>>>> 
>>>>> Hello Polaris Community,
>>>>> 
>>>>> I would like to share my proposal for a new service, the Polaris
>>>>> Delegation Service, and to share the design document for discussion
>>>>> and feedback. The Delegation Service is intended to optionally be
>>>>> deployed alongside Polaris to handle the execution of certain
>>>>> long-running tasks.
>>>>> 
>>>>> 1. Motivation
>>>>> The Polaris Catalog is optimized for low-latency metadata operations.
>>>>> However, certain tasks such as purging data files for dropped tables
>>>>> are resource-intensive and can impact its core performance. The
>>>>> motivation for this new service is to decouple these I/O-heavy
>>>>> background tasks from the main catalog, ensuring it remains highly
>>>>> responsive while allowing the task execution workload to be managed
>>>>> and scaled independently.
>>>>> 
>>>>> 2. Proposal
>>>>> We propose an optional, independent Delegation Service responsible for
>>>>> executing these offloaded operations.
>>>>> The MVP will focus on synchronously handling the data file deletion
>>>>> process for DROP TABLE WITH PURGE commands.
>>>>> 
>>>>> 3. Relationship to the "Asynchronous & Reliable Tasks" Proposal
>>>>> This proposal is designed to be highly synergistic with the existing
>>>>> "Asynchronous & Reliable Tasks" proposal.
>>>>> 
>>>>> The Asynchronous Task proposal describes a general internal framework
>>>>> for reliably scheduling and managing the lifecycle of any task within
>>>>> Polaris. On the other hand, this proposal defines a specific, external
>>>>> worker service optimized for executing a particular class of I/O-heavy
>>>>> tasks.
>>>>> 
>>>>> The Delegation Service does not alter the core Polaris task schema.
>>>>> This allows it to seamlessly act as a specialized "backend" worker
>>>>> that can execute tasks scheduled and managed by the more advanced
>>>>> Asynchronous Task Framework, which would serve as the reliable
>>>>> "frontend." This relationship is explored further in section 10.2 of
>>>>> the document.
>>>>> 
>>>>> Please find the detailed design document here for review:
>>>>> -
>>>>> https://docs.google.com/document/d/1AhR-cZ6WW6M-z8v53txOfcWvkDXvS-0xcMe3zjLMLj8/edit?usp=sharing
>>>>> 
>>>>> Best Regards,
>>>>> William
>>>>> 
>>>> 
>>>
