I agree with Robert that the current implementation is not good (scheduled and running tasks are lost whenever Polaris is stopped or crashes) and should be ripped out ASAP. However, I see this effort as complementary to Will's refactor, not as a dependency. We should first add a layer of abstraction between the business logic in Polaris and task execution; once that's in place, we can replace the existing task implementation behind it. Adding this abstraction will also let us implement remote task execution.
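A minimal sketch of the seam proposed above, written in Java because Polaris is a Java project. Every name here (`TaskExecution`, `InProcessTaskExecution`, `RemoteTaskExecution`, `create`, `dropTableWithPurge`) is hypothetical. The two implementations only record calls and do not run anything.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskSeamSketch {
    // Hypothetical seam: Polaris business logic depends only on this interface.
    interface TaskExecution {
        void submit(String taskType, String payload);
    }

    // Stand-in for today's in-process task handling.
    static final class InProcessTaskExecution implements TaskExecution {
        final List<String> executed = new ArrayList<>();

        @Override
        public void submit(String taskType, String payload) {
            executed.add(taskType + ":" + payload);
        }
    }

    // Stand-in for a remote executor (for example, a delegation service client).
    static final class RemoteTaskExecution implements TaskExecution {
        final List<String> forwarded = new ArrayList<>();

        @Override
        public void submit(String taskType, String payload) {
            forwarded.add(taskType + ":" + payload);
        }
    }

    // Hypothetical factory: the implementation behind the seam is a
    // deployment/configuration choice that business logic never sees.
    static TaskExecution create(String mode) {
        return switch (mode) {
            case "in-process" -> new InProcessTaskExecution();
            case "remote" -> new RemoteTaskExecution();
            default -> throw new IllegalArgumentException("unknown task execution mode: " + mode);
        };
    }

    // Example business logic: a drop-with-purge schedules cleanup work through the seam.
    static void dropTableWithPurge(TaskExecution tasks, String table) {
        // ... catalog changes would happen here ...
        tasks.submit("purge-table", table);
    }

    public static void main(String[] args) {
        TaskExecution tasks = create("remote");
        dropTableWithPurge(tasks, "ns.t1");
        System.out.println(((RemoteTaskExecution) tasks).forwarded);   // [purge-table:ns.t1]
    }
}
```

Callers such as `dropTableWithPurge` only see the interface, so the existing implementation can be replaced without touching them. The same factory would also be a natural home for the opt-in switch between the current and new task implementations that Russell suggests further down the thread.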
--EM

On Fri, Aug 1, 2025 at 6:31 AM Yufei Gu <flyrain...@gmail.com> wrote:
> Thanks for the async task proposal. I think it's the right direction for async light tasks. Meanwhile, we will still need other models:
> 1. A scalable way to execute synchronous tasks
> 2. A scalable way to execute heavy async tasks, e.g., table maintenance tasks.
>
> The delegation service[1] is a good candidate for that.
>
> 1. https://docs.google.com/document/d/1AhR-cZ6WW6M-z8v53txOfcWvkDXvS-0xcMe3zjLMLj8/edit?tab=t.0#heading=h.xjibr7sfbv6a
>
> Yufei
>
> On Thu, Jul 31, 2025 at 11:37 AM Russell Spitzer <russellspit...@apache.org> wrote:
> > I'm fine with the plan although I think we should probably change step 4 to allow both the current implementation and the new implementation to exist at the same time with a flag for switching over to the new task implementation. While the new implementation may be much better, it is a pretty significant behavior change that I think should be opt in until it's been in Polaris for a release or two. After that we could force all users to switch once it's been out in the wild for a bit.
> >
> > On 2025/07/30 01:30:43 William Hyun wrote:
> > > > Considering the current issues, I don't think it's worth the effort to keep the current implementation.
> > >
> > > It seems risky to me to not support the current implementation at least for the period where the new tasks implementation is unstable.
> > >
> > > Bests,
> > > William
> > >
> > > On Tue, Jul 29, 2025 at 3:49 AM Robert Stupp <sn...@snazy.de> wrote:
> > > > Hi,
> > > >
> > > > (starting w/ a recap for everybody watching this thread)
> > > > The goal of this is to have a mechanism to guarantee the _eventual_ execution of a task. That may happen immediately on the same node or at a later time on another node.
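Robert's recap defines the goal as guaranteed eventual execution, possibly later and on another node. One common way to get that property is to persist each task before running it and let nodes claim it with a time-limited lease. If the claiming node dies, the lease expires and another node can take the task over. This technique is an assumption here, not necessarily what PR 2180 does. In the sketch, the in-memory `TaskStore` stands in for a database shared by all nodes, all names are hypothetical, and time is passed in explicitly as a logical clock.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class LeaseSketch {
    // Hypothetical persisted task record: which node holds it and until when.
    static final class TaskRow {
        final String id;
        String owner;          // node holding the lease, or null if never claimed
        long leaseExpiresAt;   // logical time after which another node may take over
        boolean done;

        TaskRow(String id) {
            this.id = id;
        }
    }

    // In-memory stand-in for a store shared by all Polaris nodes (e.g. a database table).
    static final class TaskStore {
        private final Map<String, TaskRow> rows = new LinkedHashMap<>();

        // Tasks are persisted before any node runs them, so a node crash cannot lose them.
        synchronized void persist(String id) {
            rows.put(id, new TaskRow(id));
        }

        // Claims the first unfinished task that is unowned or whose lease has expired.
        synchronized Optional<TaskRow> claim(String node, long now, long leaseMillis) {
            for (TaskRow row : rows.values()) {
                boolean claimable = row.owner == null || row.leaseExpiresAt <= now;
                if (!row.done && claimable) {
                    row.owner = node;
                    row.leaseExpiresAt = now + leaseMillis;
                    return Optional.of(row);
                }
            }
            return Optional.empty();
        }

        // Only the node that currently holds an unexpired lease may mark the task done.
        synchronized boolean complete(String id, String node, long now) {
            TaskRow row = rows.get(id);
            if (row == null || row.done || !node.equals(row.owner) || row.leaseExpiresAt <= now) {
                return false;
            }
            row.done = true;
            return true;
        }
    }

    public static void main(String[] args) {
        TaskStore store = new TaskStore();
        store.persist("purge-1");
        store.claim("node-a", 0, 1_000);   // node-a claims the task, then "crashes"
        System.out.println(store.claim("node-b", 500, 1_000).isPresent());   // false: lease still held
        TaskRow row = store.claim("node-b", 1_500, 1_000).orElseThrow();    // lease expired, node-b takes over
        System.out.println(store.complete(row.id, "node-b", 1_600));        // true
    }
}
```

A lease on its own gives at-least-once execution. A node that is merely slow rather than dead may still be running when another node takes over. Meeting the "executed only once" requirement from Robert's May 9 message further down would also require idempotent task functions or fencing of side effects. The owner check in `complete` covers only part of that.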
> > > >
> > > > This particular "async reliable tasks" is to ensure that tasks run eventually in any Polaris node. The related "Delegation Service" proposal is to let tasks run in a separate, different remote service. But it requires a "local fallback" in case the remote service is not available, which would be provided by this proposal.
> > > >
> > > > Currently, all scheduled and running tasks are "lost", if Polaris is stopped, killed or crashed. So I'd prefer to get this proposal in first to address the current issues and have a reliable fallback for the Delegation Service.
> > > >
> > > > Considering the current issues, I don't think it's worth the effort to keep the current implementation.
> > > >
> > > > Both, this proposal and the Delegation Service, shouldn't rely on Polaris entities but rather have targeted definitions for the tasks to execute, which contain exactly (and not more) what the tasks need to be executed.
> > > >
> > > > So I think the following steps (approx 1 PR for each) would be:
> > > > 1. Add the tasks API (the draft PR [1])
> > > > 2. Add the tasks implementation, w/o any persistence integration but with mock testing
> > > > 3. Add persistence integration
> > > > 4. Replace current task implementation with the new one
> > > >
> > > > I'll probably have more details soon-ish.
> > > >
> > > > Robert
> > > >
> > > > [1] https://github.com/apache/polaris/pull/2180
> > > >
> > > > On Mon, Jul 28, 2025 at 6:22 AM William Hyun <will...@apache.org> wrote:
> > > > > Hey Robert!
> > > > >
> > > > > Thank you for the draft PR.
> > > > > I have taken a look and the general approach seems good to me.
> > > > > However, one of my concerns would be the timeline to deliver this new task framework refactoring as this could be intrusive due to the scope of the change.
> > > > > What do you plan as the ETA for delivering this change?
> > > > >
> > > > > It seems we need to support both the pre-existing (v1) and new task framework (v2) until we are sure that v2 is stabilized so that we can delete v1.
> > > > > With the Delegation Service proposal being a new feature for users, I am proposing to include it within the 1.1 release as a small, optional extension and also support it in v2 by reusing via implementing v2's SPI module as we previously discussed.
> > > > > I also have opened a PR demonstrating what the Delegation Service looks like here:
> > > > >
> > > > > - https://github.com/apache/polaris/pull/2193
> > > > >
> > > > > WDYT?
> > > > >
> > > > > Bests,
> > > > > William
> > > > >
> > > > > On Thu, Jul 24, 2025 at 11:18 AM Robert Stupp <sn...@snazy.de> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > As discussed on the Polaris Community Sync today, we're aligned that the current tasks handling needs some refactoring.
> > > > > >
> > > > > > This proposal focuses on the "eventual execution" of a task.
> > > > > > Implementations for would follow.
> > > > > > The "Delegation Service" [1] proposal focuses on the execution of tasks "outside" of Polaris.
> > > > > >
> > > > > > I've pushed a draft PR [2] with the Java interfaces and value types for the API, the SPI (behavior implementation) and store (used by tasks implementations).
> > > > > >
> > > > > > The only entry point is the `org.apache.polaris.tasks.api.Tasks` interface with a function defining the behavior and providing a parameter object (if necessary), returning a `TaskSubmission`. Call sites _may_ subscribe to a `CompletionStage`, but the idea is that it's rather "fire and forget" and the task behavior does "everything that's needed".
> > > > > > This allows the task to be executed on any node. There's no guarantee in any form that a task will run "locally" or on any other specific node. Every Polaris node can handle task execution and perform failure/retry handling. Polaris nodes may use a "server" implementation or a "client" implementation or a "remote" implementation - that's defined upon deployment or by configuration (TBD).
> > > > > >
> > > > > > I think that we can get to a Polaris internal API/SPI that can be leveraged by both proposals.
> > > > > > This proposal is implementation and persistence backend agnostic.
> > > > > > There could be a "server" implementation that can run tasks, a "client" implementation that can only submit tasks (think: from the polaris-admin tool), and an implementation for the delegation service to execute tasks remotely.
> > > > > >
> > > > > > I do have a working implementation sitting around locally that's passing tests exercising concurrency, multi-node and failure scenarios. Since there's only a store-implementation for NoSQL, I haven't pushed that yet. Adding a store-implementation that solely uses `BasePersistence` (JDBC) is not such a big deal.
> > > > > >
> > > > > > If we're okay with the approach in general, I can follow up with a more concrete implementation including the "purge table" use case and maybe another example use case.
> > > > > >
> > > > > > Robert
> > > > > >
> > > > > > [1] https://lists.apache.org/thread/ph10th4ocjczpf5gz17mqys4fkp5qrzw
> > > > > > [2] https://github.com/apache/polaris/pull/2180
> > > > > >
> > > > > > On Mon, May 19, 2025 at 12:05 PM Robert Stupp <sn...@snazy.de> wrote:
> > > > > > > Yes, each "task behavior" has an ID.
> > > > > > > I've chosen the term "task behavior" over "type", because it doesn't only define "what's done" but also "when" it's done (delay) and "how it behaves" (retries on failures).
> > > > > > >
> > > > > > > On 14.05.25 04:25, Adnan Hemani wrote:
> > > > > > > > Hi Robert,
> > > > > > > >
> > > > > > > > Firstly, thanks for this document. One quick question: is the `behavior ID` basically the task type? This part was slightly unclear to me.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Adnan Hemani
> > > > > > > >
> > > > > > > > On May 9, 2025, at 6:07 AM, Robert Stupp <sn...@snazy.de> wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Polaris is a service, which has to eventually perform operations asynchronously. Polaris is also meant to be backed by multiple server instances (think: high-availability & load-balancing setups).
> > > > > > > > >
> > > > > > > > > During runtime, things can go sideways in many ways. Server instances may crash, be killed or whatever... Task executions may fail, because some other remote service fails, configuration values (and credentials) may be wrong or other error situations.
> > > > > > > > >
> > > > > > > > > Task execution should be resilient to both kinds of scenarios: being able to eventually recover from a "dead/lost node" scenario and to retry failed tasks.
> > > > > > > > >
> > > > > > > > > Each individual task should also be executed only once.
> > > > > > > > >
> > > > > > > > > There are also different kinds of tasks with different behaviors: the "function" being executed and the retry behavior.
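Robert's May 19 and May 9 messages above describe a "task behavior" as an ID plus three things: what is done, when it is done (delay), and how failures are retried. The following is a minimal sketch of such a value type. The fixed attempt limit and linear backoff are illustrative assumptions, not taken from the proposal doc, and `TaskBehavior` and `runWithRetries` are hypothetical names.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class BehaviorSketch {
    // Hypothetical "task behavior": an ID plus what is done, when, and how failures are retried.
    record TaskBehavior<P>(
            String id,
            Function<P, String> function,   // what is done
            Duration initialDelay,          // when the first attempt runs
            int maxAttempts,                // how often a failing task is tried in total
            Duration retryBackoff) {        // base wait between attempts

        TaskBehavior {
            if (maxAttempts < 1) {
                throw new IllegalArgumentException("maxAttempts must be >= 1");
            }
        }

        // Wait before the given 1-based attempt: the initial delay first, then linear backoff.
        Duration delayBefore(int attempt) {
            return attempt == 1 ? initialDelay : retryBackoff.multipliedBy(attempt - 1);
        }
    }

    // Runs the behavior, retrying failures up to maxAttempts times in total.
    // The waits from delayBefore are not slept here so the sketch runs instantly;
    // a real runner would schedule each attempt for now + delayBefore(attempt).
    static <P> String runWithRetries(TaskBehavior<P> behavior, P params) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= behavior.maxAttempts(); attempt++) {
            try {
                return behavior.function().apply(params);
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw new IllegalStateException("giving up on task behavior " + behavior.id(), last);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        TaskBehavior<String> flaky = new TaskBehavior<>(
                "purge-table",
                table -> {
                    if (calls.incrementAndGet() < 3) {
                        throw new RuntimeException("remote object store unavailable");
                    }
                    return "purged " + table;
                },
                Duration.ZERO, 5, Duration.ofSeconds(10));
        System.out.println(runWithRetries(flaky, "ns.t1"));   // purged ns.t1 (third attempt)
        System.out.println(flaky.delayBefore(3));             // PT20S
    }
}
```

The delay and retry policy live on the behavior rather than on each submission. This matches the reason given for calling it a "behavior" rather than a "type": every task with the same ID is scheduled and retried the same way.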
> > > > > > > > >
> > > > > > > > > Proposal doc for this: https://docs.google.com/document/d/17D28E2ne5dzOHWc9DJ91Yz3lnQOtgmWaA_TBNdXv0sY/edit?tab=t.0
> > > > > > > > >
> > > > > > > > > Robert
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Robert Stupp
> > > > > > > > > @snazy
> > > > > > > --
> > > > > > > Robert Stupp
> > > > > > > @snazy
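Robert's July 24 message above describes a single entry point with three parts:

- It takes a behavior and a parameter object.
- It returns a submission that callers may observe through a `CompletionStage` but usually ignore ("fire and forget").
- It supports a "server" flavor that runs tasks and a "client" flavor (think: the admin tool) that only submits them.

The sketch below illustrates that split. All names (`TaskSubmitter`, `Submission`, `ServerSubmitter`, `ClientSubmitter`, `PurgeTableParams`) are hypothetical, and this is not the API in PR 2180. It also assumes behaviors are referenced by ID and looked up on the server, so a client never needs the task code. `PurgeTableParams` illustrates the "targeted definitions" idea from the July 29 message: only what the task needs, no Polaris entity. Its fields and the `s3://` paths are invented. Persistence is skipped entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Function;

public class EntryPointSketch {
    // Hypothetical targeted parameter object: only what a purge needs, no Polaris entity.
    record PurgeTableParams(String tableIdentifier, String metadataLocation) {}

    // Hypothetical submission handle; callers may ignore it ("fire and forget").
    record Submission(String taskId, CompletionStage<String> completion) {}

    // Hypothetical single entry point for submitting a task by behavior ID.
    interface TaskSubmitter {
        Submission submit(String behaviorId, Object params);
    }

    // "Server" flavor: looks up the behavior by ID and runs it on this node.
    static final class ServerSubmitter implements TaskSubmitter {
        private final Map<String, Function<Object, String>> behaviors = new HashMap<>();

        void register(String behaviorId, Function<Object, String> behavior) {
            behaviors.put(behaviorId, behavior);
        }

        @Override
        public Submission submit(String behaviorId, Object params) {
            Function<Object, String> behavior = behaviors.get(behaviorId);
            if (behavior == null) {
                throw new IllegalArgumentException("unknown behavior: " + behaviorId);
            }
            // Runs on the common fork-join pool; a real server would persist the task first.
            CompletableFuture<String> done = CompletableFuture.supplyAsync(() -> behavior.apply(params));
            return new Submission(UUID.randomUUID().toString(), done);
        }
    }

    // "Client" flavor (think: admin tool): records the task but never runs it.
    static final class ClientSubmitter implements TaskSubmitter {
        final List<String> pending = new ArrayList<>();

        @Override
        public Submission submit(String behaviorId, Object params) {
            pending.add(behaviorId + " " + params);
            // The client cannot observe execution, so this stage is never completed here.
            return new Submission(UUID.randomUUID().toString(), new CompletableFuture<>());
        }
    }

    public static void main(String[] args) {
        ServerSubmitter server = new ServerSubmitter();
        server.register("purge-table", p -> "purged " + ((PurgeTableParams) p).tableIdentifier());
        Submission s = server.submit("purge-table", new PurgeTableParams("ns.t1", "s3://bucket/t1/metadata.json"));
        System.out.println(s.completion().toCompletableFuture().join());   // purged ns.t1

        ClientSubmitter client = new ClientSubmitter();
        client.submit("purge-table", new PurgeTableParams("ns.t2", "s3://bucket/t2/metadata.json"));
        System.out.println(client.pending.size());   // 1
    }
}
```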