Hi Seth,

big +1, happy to see this moving forward :) I have seen plenty of users,
who refrained using managed state for some of their data/use cases due to
the lack of something like this. I am not sure about the name "Savepoint
Connector", but for a different reason. While it is technically a
"connector" for the DataSet API, the feature is much more than a connector,
almost like a small "Savepoint API".

Best,

Konstantin







On Fri, May 31, 2019 at 1:45 AM Steven Wu <stevenz...@gmail.com> wrote:

> this is an awesome feature.
>
> > The name "Savepoint Connector" might indeed be not that good, as it
> doesn't
> point out the fact that with the current design, all kinds of snapshots
> (savepoint / full or incremental checkpoints) can be read.
>
> @Gordon can you add the above clarification to the FLIP page? I was
> wondering if it supports (full or incremental) checkpoint.
>
> Here is one way how we can leverage this new feature. For monitoring
> lag/stuck problem, we have external service monitor committed offsets to
> Kafka broker. As you probably know, async commit Kafka offset is
> best-effort  in notifyCheckpointComplete. We have run into Kafka
> scalability issue for parallelism jobs due to single coordinator.
> Alternative is to inspect the source of truth of Flink checkpoint/savepoint
> and extract offsets from Kafka source operator.
>
>
> On Thu, May 30, 2019 at 4:51 AM Paul Lam <paullin3...@gmail.com> wrote:
>
> > Hi,
> >
> > @Gordon @Seth
> >
> > Thanks a lot for your inputs! In general, I agree with you. The metadata
> > querying feature is a nice-to-have but not a must-have, and it’s
> reasonable
> > to make it as a follow up since it requires some extra work.
> >
> > Best,
> > Paul Lam
> >
> > > 在 2019年5月30日,19:22,Seth Wiesman <sjwies...@gmail.com> 写道:
> > >
> > > @Paul
> > >
> > > I agree with Gordon that those are useful features. The only thing I’d
> > like to add is that I don’t believe listing operator ids will be useful
> to
> > most users, they want to see UIDs which would also require changes to the
> > Savepoint metadata file. I think that would be a good follow up but
> outside
> > the scope of an initial implementation.
> > >
> > > Seth
> > >
> > >> On May 30, 2019, at 3:05 AM, Louis <xu_soft39211...@163.com> wrote:
> > >>
> > >> +1 from my size.
> > >>
> > >> I think it will be a good feature.
> > >>
> > >> Best
> > >> --
> > >> Louis
> > >> Email:xu_soft39211...@163.com
> > >>
> > >>> On 30 May 2019, at 15:57, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> > wrote:
> > >>>
> > >>> The name "Savepoint Connector" might indeed be not that good, as it
> > doesn't
> > >>> point out the fact that with the current design, all kinds of
> snapshots
> > >>> (savepoint / full or incremental checkpoints) can be read.
> > >>>
> > >>> @Paul
> > >>> That would be a very valid requirement. Querying the list of existing
> > >>> operator ids should be straight forward, as that information is in
> the
> > >>> snapshot metadata file.
> > >>> However, querying state names / state structure / state type would
> > >>> currently be impossible without also reading the state itself, as
> that
> > >>> information isn't globally available and can only be known when each
> > key
> > >>> group is being read. We could potentially make those information
> > available
> > >>> in the snapshot metadata file, but that would require more work. I
> > think
> > >>> that can be a next step once we have an initial version.
> > >>>
> > >>> Cheers,
> > >>> Gordon
> > >>>
> > >>>> On Thu, May 30, 2019 at 1:21 PM Paul Lam <paullin3...@gmail.com>
> > wrote:
> > >>>>
> > >>>> Hi Seth,
> > >>>>
> > >>>> Sorry for the confusion. I mean currently we need to know the
> > operator id,
> > >>>> state name and the state type (eg. ListState, MapState) beforehand
> to
> > get
> > >>>> the states. Is possible that we can perform a scan to get all
> existing
> > >>>> operator ids or state names in the savepoint? It would be good to
> > know what
> > >>>> states are in the savepoint before we get to a specific state.
> > >>>>
> > >>>> For example, if we analyze a savepoint created weeks ago, and the
> > >>>> corresponding job has been modified since that, say, moved from
> > KafkaSink
> > >>>> to KinesisSink, so we are not sure whether we have the Kafka sink
> > states or
> > >>>> the Kinesis sink states in the savepoint and might need to try twice
> > to get
> > >>>> the right one.
> > >>>>
> > >>>> I’m not familiar with the savepoint formats, so pardon me if it’s a
> > dumb
> > >>>> question.
> > >>>>
> > >>>> Best,
> > >>>> Paul Lam
> > >>>>
> > >>>> 在 2019年5月30日,11:09,Seth Wiesman <sjwies...@gmail.com> 写道:
> > >>>>
> > >>>> Hi Paul,
> > >>>>
> > >>>> I’m not following, could you provide and example of the kind of
> > operation
> > >>>> your describing?
> > >>>>
> > >>>> Seth
> > >>>>
> > >>>> On May 29, 2019, at 7:37 PM, Paul Lam <paullin3...@gmail.com>
> wrote:
> > >>>>
> > >>>> Hi Seth,
> > >>>>
> > >>>> +1 from my side.
> > >>>>
> > >>>> I was wondering if we can add a reader method to provide a full view
> > of
> > >>>> the states instead of the state of a specific operator? It would be
> > helpful
> > >>>> when there is some unrestored states of a previously removed
> operator
> > in
> > >>>> the savepoint.
> > >>>>
> > >>>> Best,
> > >>>> Paul Lam
> > >>>>
> > >>>> 在 2019年5月30日,09:55,vino yang <yanghua1...@gmail.com> 写道:
> > >>>>
> > >>>> Hi Seth,
> > >>>>
> > >>>> Glad to see this FLIP, big +1 for this feature!
> > >>>>
> > >>>> Best,
> > >>>> Vino
> > >>>>
> > >>>> Seth Wiesman <sjwies...@gmail.com> 于2019年5月30日周四 上午7:14写道:
> > >>>>
> > >>>> Hey Everyone!
> > >>>> ​
> > >>>> Gordon and I have been discussing adding a savepoint connector to
> > flink
> > >>>> for reading, writing and modifying savepoints.
> > >>>> ​
> > >>>> This is useful for:
> > >>>> ​
> > >>>> Analyzing state for interesting patterns
> > >>>> Troubleshooting or auditing jobs by checking for discrepancies in
> > state
> > >>>> Bootstrapping state for new applications
> > >>>> Modifying savepoints such as:
> > >>>>   Changing max parallelism
> > >>>>   Making breaking schema changes
> > >>>>   Correcting invalid state
> > >>>> ​
> > >>>> We are looking forward to your feedback!
> > >>>> ​
> > >>>> This is the FLIP:
> > >>>> ​
> > >>>>
> > >>>>
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-43%3A+Savepoint+Connector
> > >>>>
> > >>>> Seth
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>
> > >>
> >
> >
>


-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525


Planned Absences: 20. - 21.06.2019


<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen

Reply via email to