Apologies for missing the reference to Robert's doc. I hope it does not
invalidate my comments :)

This is certainly up for discussion.

To clarify my concern about the REST API: If we are to have resilient tasks
and the node that serves the initial REST request fails, other nodes will
have to be able to provide responses about the task instead of the failed
node. Ultimately the data will come from persistence (I assume). Also, I
suppose the Tasks Service is meant for internal interactions (not for
user-driven clients). Therefore, it seems to me that the REST API is
somewhat superfluous in this case.
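
To illustrate the failover concern, here is a minimal sketch. Everything in
it is hypothetical (the TaskStore and Node classes, the state names, the task
id); none of it comes from the design doc or from Polaris code. It only shows
that once task state lives in shared persistence, any surviving node can
answer a status query after the node that accepted the request has failed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TaskStore:
    """Stand-in for shared persistence (hypothetical, not the Polaris schema)."""

    _tasks: Dict[str, str] = field(default_factory=dict)

    def put(self, task_id: str, state: str) -> None:
        self._tasks[task_id] = state

    def get(self, task_id: str) -> Optional[str]:
        return self._tasks.get(task_id)


class Node:
    """A service node that keeps no task state of its own."""

    def __init__(self, name: str, store: TaskStore) -> None:
        self.name = name
        self.store = store
        self.alive = True

    def _check_alive(self) -> None:
        if not self.alive:
            raise RuntimeError(f"node {self.name} is down")

    def submit(self, task_id: str) -> None:
        # Record the task in the shared store rather than in node memory.
        self._check_alive()
        self.store.put(task_id, "PENDING")

    def status(self, task_id: str) -> Optional[str]:
        # Any live node can answer, because the state is read from the store.
        self._check_alive()
        return self.store.get(task_id)


store = TaskStore()
node_a, node_b = Node("a", store), Node("b", store)

node_a.submit("purge-table-1")
node_a.alive = False  # the node that served the initial request fails

status_from_b = node_b.status("purge-table-1")
```

In this sketch the submit call is only a thin wrapper around a write to the
store, which is the reason I wonder how much the REST layer adds.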

Like I mentioned before, this is just what I thought after a quick review.
I'll certainly have a deeper look later.

Cheers,
Dmitri.

On Mon, Jun 23, 2025 at 6:02 PM Eric Maynard <eric.w.mayn...@gmail.com>
wrote:

> Hey Dmitri,
>
> There's a section in the email above and the linked doc that talks about
> the linked proposal. See 'Relationship to the "Asynchronous & Reliable
> Tasks" Proposal'.
>
> As for pulling away from a REST API in favor of driving things directly
> from persistence, there's a lot to discuss here. Bear in mind that the
> design goes into detail about one proposed "TaskExecutor" implementation;
> maybe another TaskExecutor could work exactly like you describe. But the
> reason that this implementation proposes to be driven by a REST API is that
> there's a lot of interesting future work -- see the "Future Work" section
> of the doc for some examples -- that can be added on to the REST API. In
> particular, table maintenance actions like compaction.
>
> --EM
>
> On Mon, Jun 23, 2025 at 2:31 PM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
> > Hi All,
> >
> > A previous proposal by Robert [1] from May 9 appears to be related. I
> > think we should consider both at the same time, possibly as alternatives,
> > but perhaps also sharing / reusing their respective ideas.
> >
> > A few notes after a quick review:
> >
> > * Separate scaling for task executors seems reasonable at first glance,
> > but it adds deployment complexity. If we go with this approach, I believe
> > it would be worth making this deployment strategy optional. In other
> > words, let admin users decide whether they want to have extra nodes
> > dedicated to specific tasks or whether they are ok with having uniform
> > nodes.
> >
> > * I'm not sure a separate rich REST API for submitting tasks is really
> > necessary. Proper synchronization among multiple nodes will probably
> > require roundtrips to Persistence anyway, so task submission could
> > probably be done via Persistence.
> >
> > [1] https://lists.apache.org/thread/gg0kn89vmblmjgllxn7jkn8ky2k28f5l
> >
> > Thanks,
> > Dmitri.
> >
> >
> > On Mon, Jun 23, 2025 at 3:12 PM William Hyun <will...@apache.org> wrote:
> >
> > > Hello Polaris Community,
> > >
> > > I would like to share my proposal for a new service, the Polaris
> > > Delegation Service, and to share the design document for discussion
> > > and feedback. The Delegation Service is intended to optionally be
> > > deployed alongside Polaris to handle the execution of certain
> > > long-running tasks.
> > >
> > > 1. Motivation
> > > The Polaris Catalog is optimized for low-latency metadata operations.
> > > However, certain tasks such as purging data files for dropped tables
> > > are resource-intensive and can impact its core performance. The
> > > motivation for this new service is to decouple these I/O-heavy
> > > background tasks from the main catalog, ensuring it remains highly
> > > responsive while allowing the task execution workload to be managed
> > > and scaled independently.
> > >
> > > 2. Proposal
> > > We propose an optional, independent Delegation Service responsible for
> > > executing these offloaded operations.
> > > The MVP will focus on synchronously handling the data file deletion
> > > process for DROP TABLE WITH PURGE commands.
> > >
> > > 3. Relationship to the "Asynchronous & Reliable Tasks" Proposal
> > > This proposal is designed to be highly synergistic with the existing
> > > "Asynchronous & Reliable Tasks" proposal.
> > >
> > > The Asynchronous Task proposal describes a general internal framework
> > > for reliably scheduling and managing the lifecycle of any task within
> > > Polaris. On the other hand, this proposal defines a specific, external
> > > worker service optimized for executing a particular class of I/O-heavy
> > > tasks.
> > >
> > > The Delegation Service does not alter the core Polaris task schema.
> > > This allows it to seamlessly act as a specialized "backend" worker
> > > that can execute tasks scheduled and managed by the more advanced
> > > Asynchronous Task Framework, which would serve as the reliable
> > > "frontend." This relationship is explored further in section 10.2 of
> > > the document.
> > >
> > > Please find the detailed design document here for review:
> > > -
> > >
> >
> https://docs.google.com/document/d/1AhR-cZ6WW6M-z8v53txOfcWvkDXvS-0xcMe3zjLMLj8/edit?usp=sharing
> > >
> > > Best Regards,
> > > William
> > >
> >
>
