Re: [PROPOSAL] Asynchronous & Reliable Tasks

Russell Spitzer Thu, 31 Jul 2025 11:37:46 -0700

I'm fine with the plan, although I think we should probably change step 4 to
allow both the current implementation and the new implementation to exist at
the same time, with a flag for switching over to the new task implementation.
While the new implementation may be much better, it is a pretty significant
behavior change that I think should be opt-in until it's been in Polaris for a
release or two. After that, we could force all users to switch once it's been
out in the wild for a bit.
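
To make the opt-in idea concrete, here is a minimal sketch of a switch between
the two implementations. All names (`TaskRunner`, `LegacyTaskRunner`,
`ReliableTaskRunner`, `useNewTasks`) are hypothetical and not taken from the
Polaris code base; the real flag would presumably be a configuration option.

```java
// Hypothetical sketch: both implementations live side by side behind one interface.
interface TaskRunner {
  // Returns a marker string so the chosen implementation is visible to the caller.
  String submit(String taskName);
}

// Stand-in for the current (v1) task handling.
final class LegacyTaskRunner implements TaskRunner {
  @Override
  public String submit(String taskName) {
    return "legacy:" + taskName;
  }
}

// Stand-in for the new (v2) asynchronous and reliable task handling.
final class ReliableTaskRunner implements TaskRunner {
  @Override
  public String submit(String taskName) {
    return "reliable:" + taskName;
  }
}

final class TaskRunners {
  private TaskRunners() {}

  // The flag defaults to false in this sketch, so the new behavior is opt-in.
  static TaskRunner select(boolean useNewTasks) {
    return useNewTasks ? new ReliableTaskRunner() : new LegacyTaskRunner();
  }
}
```

With a default of `false`, existing deployments keep the current behavior
until they flip the flag; removing the legacy branch later is a small change.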


On 2025/07/30 01:30:43 William Hyun wrote:
> >
> > Considering the current issues, I don't think it's worth the effort to
> > keep the current implementation.
> 
> 
> It seems risky to me to not support the current implementation at least for
> the period where the new tasks implementation is unstable.
> 
> Bests,
> William
> 
> On Tue, Jul 29, 2025 at 3:49 AM Robert Stupp wrote:
> 
> > Hi,
> >
> > (starting w/ a recap for everybody watching this thread)
> > The goal of this is to have a mechanism to guarantee the _eventual_
> > execution of a task. That may happen immediately on the same node or
> > at a later time on another node.
> > This particular "async reliable tasks" proposal is to ensure that tasks
> > eventually run on any Polaris node. The related "Delegation Service"
> > proposal is to let tasks run in a separate, remote service.
> > But it requires a "local fallback" in case the remote service is not
> > available, which would be provided by this proposal.
> >
> > Currently, all scheduled and running tasks are "lost" if Polaris is
> > stopped, killed, or crashes. So I'd prefer to get this proposal in
> > first to address the current issues and have a reliable fallback for
> > the Delegation Service.
> >
> > Considering the current issues, I don't think it's worth the effort to
> > keep the current implementation.
> >
> > Neither this proposal nor the Delegation Service should rely on
> > Polaris entities. Both should rather have targeted definitions for the
> > tasks to execute, which contain exactly what the tasks need in order to
> > be executed, and nothing more.
> >
> > So I think the following steps (approx 1 PR for each) would be:
> > 1. Add the tasks API (the draft PR [1])
> > 2. Add the tasks implementation, w/o any persistence integration but
> > with mock testing
> > 3. Add persistence integration
> > 4. Replace current task implementation with the new one
> >
> > I'll probably have more details soon-ish.
> >
> > Robert
> >
> > [1] https://github.com/apache/polaris/pull/2180
> >
> >
> >
> > On Mon, Jul 28, 2025 at 6:22 AM William Hyun wrote:
> > >
> > > Hey Robert!
> > >
> > > Thank you for the draft PR.
> > > I have taken a look and the general approach seems good to me.
> > > However, one of my concerns would be the timeline to deliver this new
> > > task framework refactoring as this could be intrusive due to the scope
> > > of the change.
> > > What do you plan as the ETA for delivering this change?
> > >
> > > It seems we need to support both the pre-existing (v1) and new task
> > > framework (v2) until we are sure that v2 is stabilized so that we can
> > > delete v1.
> > > With the Delegation Service proposal being a new feature for users, I
> > > am proposing to include it within the 1.1 release as a small, optional
> > > extension and also to support it in v2 by implementing v2's SPI
> > > module, as we previously discussed.
> > > I also have opened a PR demonstrating what the Delegation Service
> > > looks like here:
> > >
> > > - https://github.com/apache/polaris/pull/2193
> > >
> > > WDYT?
> > >
> > > Bests,
> > > William
> > >
> > > On Thu, Jul 24, 2025 at 11:18 AM Robert Stupp wrote:
> > > >
> > > > Hi,
> > > >
> > > > As discussed on the Polaris Community Sync today, we're aligned that
> > > > the current tasks handling needs some refactoring.
> > > >
> > > > This proposal focuses on the "eventual execution" of a task.
> > > > > Implementations would follow.
> > > > > The "Delegation Service" [1] proposal focuses on the execution of
> > > > tasks "outside" of Polaris.
> > > >
> > > > I've pushed a draft PR [2] with the Java interfaces and value types
> > > > for the API, the SPI (behavior implementation) and store (used by
> > > > tasks implementations).
> > > >
> > > > The only entry point is the `org.apache.polaris.tasks.api.Tasks`
> > > > interface with a function defining the behavior and providing a
> > > > parameter object (if necessary), returning a `TaskSubmission`. Call
> > > > sites _may_ subscribe to a `CompletionStage`, but the idea is that
> > > > it's rather "fire and forget" and the task behavior does "everything
> > > > that's needed". This allows the task to be executed on any node.
> > > > There's no guarantee in any form that a task will run "locally" or any
> > > > other specific node. Every Polaris node can handle task execution and
> > > > perform failure/retry handling. Polaris nodes may use a "server"
> > > > implementation or a "client" implementation or a "remote"
> > > > implementation - that's defined upon deployment or by configuration
> > > > (TBD).
> > > >
> > > > I think that we can get to a Polaris internal API/SPI that can be
> > > > leveraged by both proposals.
> > > > This proposal is implementation and persistence backend agnostic.
> > > > There could be a "server" implementation that can run tasks, a
> > > > "client" implementation that can only submit tasks (think: from the
> > > > polaris-admin tool), and an implementation for the delegation service
> > > > to execute tasks remotely.
> > > >
> > > > I do have a working implementation sitting around locally that's
> > > > passing tests exercising concurrency, multi-node and failure
> > > > scenarios. Since there's only a store-implementation for NoSQL, I
> > > > haven't pushed that yet. Adding a store-implementation that solely
> > > > > uses `BasePersistence` (JDBC) is not such a big deal.
> > > >
> > > > If we're okay with the approach in general, I can follow up with a
> > > > more concrete implementation including the "purge table" use case and
> > > > maybe another example use case.
> > > >
> > > > Robert
> > > >
> > > > [1] https://lists.apache.org/thread/ph10th4ocjczpf5gz17mqys4fkp5qrzw
> > > > [2] https://github.com/apache/polaris/pull/2180
> > > >
> > > > On Mon, May 19, 2025 at 12:05 PM Robert Stupp wrote:
> > > > >
> > > > > > Yes, each "task behavior" has an ID. I've chosen the term "task
> > > > > > behavior" over "type", because it doesn't only define "what's done"
> > > > > > but also "when" it's done (delay) and "how it behaves" (retries on
> > > > > > failures).
> > > > >
> > > > > On 14.05.25 04:25, Adnan Hemani wrote:
> > > > > > Hi Robert,
> > > > > >
> > > > > > > Firstly, thanks for this document. One quick question: is the
> > > > > > > `behavior ID` basically the task type? This part was slightly
> > > > > > > unclear to me.
> > > > > >
> > > > > > Best,
> > > > > > Adnan Hemani
> > > > > >
> > > > > > >> On May 9, 2025, at 6:07 AM, Robert Stupp wrote:
> > > > > >>
> > > > > >> Hi,
> > > > > >>
> > > > > > >> Polaris is a service that eventually has to perform operations
> > > > > > >> asynchronously. Polaris is also meant to be backed by multiple
> > > > > > >> server instances (think: high-availability & load-balancing
> > > > > > >> setups).
> > > > > >>
> > > > > > >> During runtime, things can go sideways in many ways. Server
> > > > > > >> instances may crash, be killed, and so on. Task executions may
> > > > > > >> fail because a remote service fails, because configuration values
> > > > > > >> (or credentials) are wrong, or because of other errors.
> > > > > >>
> > > > > > >> Task execution should be resilient to both kinds of scenarios:
> > > > > > >> being able to eventually recover from a "dead/lost node" scenario
> > > > > > >> and to retry failed tasks.
> > > > > >>
> > > > > >> Each individual task should also be executed only once.
> > > > > >>
> > > > > > >> There are also different kinds of tasks with different behaviors:
> > > > > > >> the "function" being executed and the retry behavior.
> > > > > >>
> > > > > > >> Proposal doc for this:
> > > > > > >> https://docs.google.com/document/d/17D28E2ne5dzOHWc9DJ91Yz3lnQOtgmWaA_TBNdXv0sY/edit?tab=t.0
> > > > > >>
> > > > > >> Robert
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> Robert Stupp
> > > > > >> @snazy
> > > > > >>
> > > > > --
> > > > > Robert Stupp
> > > > > @snazy
> > > > >
> >
>
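
For anyone skimming the quoted thread, the "fire and forget" entry point that
Robert describes (a behavior plus an optional parameter object goes in, a
`TaskSubmission` with an optional `CompletionStage` comes out) might look
roughly like the sketch below. This is not the API from the draft PR: the
method names, the `TaskBehavior` shape, `maxAttempts()` and `InMemoryTasks`
are all assumptions made for illustration, and the in-memory stand-in has no
persistence, so it does not provide the reliability the proposal is about.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

// Hypothetical: a behavior has an ID and defines what is done and how failures are retried.
interface TaskBehavior<P, R> {
  String id();

  int maxAttempts();

  R execute(P param) throws Exception;
}

// Hypothetical: handle returned to the call site; subscribing to the stage is optional.
final class TaskSubmission<R> {
  private final String behaviorId;
  private final CompletionStage<R> completion;

  TaskSubmission(String behaviorId, CompletionStage<R> completion) {
    this.behaviorId = behaviorId;
    this.completion = completion;
  }

  String behaviorId() {
    return behaviorId;
  }

  CompletionStage<R> completion() {
    return completion;
  }
}

// Hypothetical single entry point, loosely following the description in the thread.
interface Tasks {
  <P, R> TaskSubmission<R> submit(TaskBehavior<P, R> behavior, P param);
}

// In-memory stand-in: runs the task on another thread and retries failed attempts.
final class InMemoryTasks implements Tasks {
  @Override
  public <P, R> TaskSubmission<R> submit(TaskBehavior<P, R> behavior, P param) {
    CompletableFuture<R> future =
        CompletableFuture.supplyAsync(
            () -> {
              Exception last = null;
              for (int attempt = 1; attempt <= behavior.maxAttempts(); attempt++) {
                try {
                  return behavior.execute(param);
                } catch (Exception e) {
                  last = e; // remember the failure and try again
                }
              }
              throw new IllegalStateException("task failed after all attempts", last);
            });
    return new TaskSubmission<>(behavior.id(), future);
  }
}
```

A real implementation would persist the submission before returning, so that
another node can pick the task up after a crash; the sketch only shows the
call-site shape and the retry loop.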
