Thanks Yuanjian for your support! I've left a comment, but to repeat it here: I agree with your point. It's really not easy for a new feature to be stable from its initial version, and we may need to break backward compatibility for (semantic) bug fixes/improvements. Maybe we could mark the data source as incubating/experimental and wait a couple of minor releases to see whether the options/behaviors can be finalized.
On Wed, Oct 18, 2023 at 4:24 PM Yuanjian Li <xyliyuanj...@gmail.com> wrote:

> +1, I have no issues with the practicality and value of this feature
> itself.
> I've left some comments concerning ongoing maintenance and
> compatibility-related matters, which we can continue to discuss.
>
> On Tue, Oct 17, 2023 at 5:23 AM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>
>> Thanks Bartosz and Anish for your support!
>>
>> I'll wait for a couple more days to see whether we can hear more voices
>> on this. We could probably look for initiating a VOTE thread if there is no
>> objection.
>>
>> On Tue, Oct 17, 2023 at 5:48 AM Anish Shrigondekar <
>> anish.shrigonde...@databricks.com> wrote:
>>
>>> Hi Jungtaek,
>>>
>>> Thanks for putting this together. +1 from me and looks good overall.
>>> Posted some minor comments/questions to the doc.
>>>
>>> Thanks,
>>> Anish
>>>
>>> On Mon, Oct 16, 2023 at 11:25 AM Bartosz Konieczny <
>>> bartkoniec...@gmail.com> wrote:
>>>
>>>> Thank you, Jungtaek, for your answers! It's clear now.
>>>>
>>>> +1 for me. It seems like a prerequisite for further ops-related
>>>> improvements for the state store management. I mean especially here the
>>>> state rebalancing that could rely on this read+write state store API. I
>>>> don't mean here the dynamic state rebalancing that could probably be
>>>> implemented with a lower latency directly in the stateful API. Instead I'm
>>>> thinking more of an offline job to rebalance the state and later restart
>>>> the stateful pipeline with the changed number of shuffle partitions.
>>>>
>>>> Best,
>>>> Bartosz.
>>>>
>>>> On Mon, Oct 16, 2023 at 6:19 PM Jungtaek Lim <
>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> bump for better reach
>>>>>
>>>>> On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim <
>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>
>>>>>> Sorry, please use this link instead for SPIP doc:
>>>>>> https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing
>>>>>>
>>>>>> On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim <
>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi dev,
>>>>>>>
>>>>>>> I'd like to start a discussion on "State Data Source - Reader".
>>>>>>>
>>>>>>> This proposal aims to introduce a new data source "statestore" which
>>>>>>> enables reading the state rows from existing checkpoint via offline
>>>>>>> (batch) query. This will enable users to 1) create unit tests against
>>>>>>> stateful query verifying the state value (especially
>>>>>>> flatMapGroupsWithState), 2) gather more context on the status when an
>>>>>>> incident occurs, especially for incorrect output.
>>>>>>>
>>>>>>> *SPIP*:
>>>>>>> https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
>>>>>>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511
>>>>>>>
>>>>>>> Looking forward to your feedback!
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>>>
>>>>>>> ps. The scope of the project is narrowed to the reader in this SPIP,
>>>>>>> since the writer requires us to consider more cases. We are planning on
>>>>>>> it.
>>>>
>>>> --
>>>> Bartosz Konieczny
>>>> freelance data engineer
>>>> https://www.waitingforcode.com
>>>> https://github.com/bartosz25/
>>>> https://twitter.com/waitingforcode