Re: [PROPOSAL] Asynchronous & Reliable Tasks

Robert Stupp Thu, 24 Jul 2025 11:19:04 -0700

Hi,

As discussed on the Polaris Community Sync today, we're aligned that
the current task handling needs some refactoring.

This proposal focuses on the "eventual execution" of a task.
Implementations would follow.
The "Delegation Service" [1] proposal focuses on the execution of
tasks "outside" of Polaris.

I've pushed a draft PR [2] with the Java interfaces and value types
for the API, the SPI (behavior implementation) and the store (used by
task implementations).

The only entry point is the `org.apache.polaris.tasks.api.Tasks`
interface with a function defining the behavior and providing a
parameter object (if necessary), returning a `TaskSubmission`. Call
sites _may_ subscribe to a `CompletionStage`, but the idea is that
it's rather "fire and forget" and the task behavior does "everything
that's needed". This allows the task to be executed on any node.
There's no guarantee in any form that a task will run "locally" or on
any other specific node. Every Polaris node can handle task execution and
perform failure/retry handling. Polaris nodes may use a "server"
implementation or a "client" implementation or a "remote"
implementation - that's defined upon deployment or by configuration
(TBD).
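
To make the call-site shape above a bit more concrete, here is a minimal
sketch. It is not the code from the draft PR [2]: only the names `Tasks`,
`TaskSubmission` and `CompletionStage` come from this mail. The
`TaskBehavior` interface, all method names and signatures, the
`InMemoryTasks` class and the `PurgeTableBehavior` example are assumptions
made up for illustration. The toy implementation runs everything in one
JVM and retries in a simple loop, so it shows the "fire and forget" call
shape, not the multi-node or persistence parts.

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CompletionStage;

/** Hypothetical: what to run, identified by an ID, plus its retry limit. */
interface TaskBehavior<P, R> {
  String id();

  int maxAttempts();

  R execute(P param) throws Exception;
}

/** Handle returned to the call site; subscribing to the stage is optional. */
interface TaskSubmission<R> {
  String taskId();

  CompletionStage<R> completionStage();
}

/** Single entry point (signature is an assumption, not the PR's API). */
interface Tasks {
  <P, R> TaskSubmission<R> submit(TaskBehavior<P, R> behavior, P param);
}

/** Toy single-JVM implementation: runs asynchronously, retries on failure. */
final class InMemoryTasks implements Tasks {
  @Override
  public <P, R> TaskSubmission<R> submit(TaskBehavior<P, R> behavior, P param) {
    final String taskId = behavior.id() + ":" + UUID.randomUUID();
    final int attempts = Math.max(1, behavior.maxAttempts());
    final CompletableFuture<R> result =
        CompletableFuture.supplyAsync(
            () -> {
              Exception last = null;
              for (int i = 0; i < attempts; i++) {
                try {
                  return behavior.execute(param);
                } catch (Exception e) {
                  last = e; // remember the failure and try again
                }
              }
              // All attempts failed: complete the stage exceptionally.
              throw new CompletionException(last);
            });
    return new TaskSubmission<R>() {
      @Override
      public String taskId() {
        return taskId;
      }

      @Override
      public CompletionStage<R> completionStage() {
        return result;
      }
    };
  }
}

/** Example behavior, loosely modeled on the "purge table" use case. */
final class PurgeTableBehavior implements TaskBehavior<String, String> {
  @Override
  public String id() {
    return "purge-table";
  }

  @Override
  public int maxAttempts() {
    return 3;
  }

  @Override
  public String execute(String table) {
    return "purged " + table;
  }
}
```

A call site would do something like
`new InMemoryTasks().submit(new PurgeTableBehavior(), "ns.t1")` and either
ignore the returned `TaskSubmission` or attach a callback to its
`completionStage()`.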

I think that we can get to a Polaris internal API/SPI that can be
leveraged by both proposals.
This proposal is implementation and persistence backend agnostic.
There could be a "server" implementation that can run tasks, a
"client" implementation that can only submit tasks (think: from the
polaris-admin tool), and an implementation for the delegation service
to execute tasks remotely.
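
As a rough illustration of the "server" vs. "client" split, assuming both
sides share one task store: the sketch below uses an in-memory queue as a
stand-in for the persistent store. All class and method names here
(`StoredTask`, `TaskStore`, `SubmitOnlyClient`, `ExecutingServer`,
`runPending`) are hypothetical and not taken from the draft PR; real
implementations would also need leasing, retries and failure handling,
which are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

/** A persisted task: which behavior to run and with which parameter. */
final class StoredTask {
  final String behaviorId;
  final String param;

  StoredTask(String behaviorId, String param) {
    this.behaviorId = behaviorId;
    this.param = param;
  }
}

/** Stand-in for the store shared by all nodes (hypothetical, in-memory). */
final class TaskStore {
  private final Queue<StoredTask> pending = new ConcurrentLinkedQueue<>();

  void save(StoredTask task) {
    pending.add(task);
  }

  /** Returns the next pending task, or null if there is none. */
  StoredTask poll() {
    return pending.poll();
  }
}

/** "Client" flavor: can only submit, e.g. from an admin tool. */
final class SubmitOnlyClient {
  private final TaskStore store;

  SubmitOnlyClient(TaskStore store) {
    this.store = store;
  }

  void submit(String behaviorId, String param) {
    store.save(new StoredTask(behaviorId, param));
  }
}

/** "Server" flavor: knows the behaviors and executes pending tasks. */
final class ExecutingServer {
  private final TaskStore store;
  private final Map<String, Function<String, String>> behaviors;

  ExecutingServer(TaskStore store, Map<String, Function<String, String>> behaviors) {
    this.store = store;
    this.behaviors = behaviors;
  }

  /** Drains the store and runs each task; returns the results in order. */
  List<String> runPending() {
    List<String> results = new ArrayList<>();
    for (StoredTask task = store.poll(); task != null; task = store.poll()) {
      Function<String, String> behavior = behaviors.get(task.behaviorId);
      if (behavior == null) {
        throw new IllegalStateException("Unknown behavior: " + task.behaviorId);
      }
      results.add(behavior.apply(task.param));
    }
    return results;
  }
}
```

The point of the split is that the submitting side needs no knowledge of
how a behavior is implemented, only its ID and parameter.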

I do have a working implementation sitting around locally that's
passing tests exercising concurrency, multi-node and failure
scenarios. Since there's only a store implementation for NoSQL, I
haven't pushed that yet. Adding a store implementation that solely
uses `BasePersistence` (JDBC) is not such a big deal.

If we're okay with the approach in general, I can follow up with a
more concrete implementation including the "purge table" use case and
maybe another example use case.

Robert

[1] https://lists.apache.org/thread/ph10th4ocjczpf5gz17mqys4fkp5qrzw
[2] https://github.com/apache/polaris/pull/2180

On Mon, May 19, 2025 at 12:05 PM Robert Stupp <[email protected]> wrote:
>
> Yes, each "task behavior" has an ID. I've chosen the term "task
> behavior" over "type", because it doesn't only define "what's done" but
> also "when" it's done (delay) and "how it behaves" (retries on failures).
>
> On 14.05.25 04:25, Adnan Hemani wrote:
> > Hi Robert,
> >
> > Firstly, thanks for this document. One quick question: is the `behavior ID` 
> > basically the task type? This part was slightly unclear to me.
> >
> > Best,
> > Adnan Hemani
> >
> >> On May 9, 2025, at 6:07 AM, Robert Stupp <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> Polaris is a service, which has to eventually perform operations 
> >> asynchronously. Polaris is also meant to be backed by multiple server 
> >> instances (think: high-availability & load-balancing setups).
> >>
> >> During runtime, things can go sideways in many ways. Server instances may 
> >> crash, be killed or whatever... Task executions may fail, because some 
> >> other remote service fails, configuration values (and credentials) may be 
> >> wrong or other error situations.
> >>
> >> Task execution should be resilient to both kinds of scenarios: being able 
> >> to eventually recover from a "dead/lost node" scenario and to retry failed 
> >> tasks.
> >>
> >> Each individual task should also be executed only once.
> >>
> >> There are also different kinds of tasks with different behaviors: the 
> >> "function" being executed and the retry behavior.
> >>
> >> Proposal doc for this: 
> >> https://docs.google.com/document/d/17D28E2ne5dzOHWc9DJ91Yz3lnQOtgmWaA_TBNdXv0sY/edit?tab=t.0
> >>
> >> Robert
> >>
> >>
> >> --
> >> Robert Stupp
> >> @snazy
> >>
> --
> Robert Stupp
> @snazy
>
