+1 on the tooling. Also, you mentioned the state bootstrapping problem. Could you please elaborate on how we can leverage the tooling to solve state bootstrapping? I think this is a common problem in stream processing, and it would be great if the community could work on it. Thanks.
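To make the bootstrapping question concrete, here is a minimal sketch of what I understand a tool has to do when it turns external records into keyed state: place every key into a key group and every key group onto a subtask, so that a restored job finds each key where it expects it. This is not Bravo's API. The function names, the CRC32 stand-in hash and the sample records are all assumptions for illustration; Flink itself applies its own murmur hash to the key's hashCode, and I am assuming its subtask formula is keyGroup * parallelism / maxParallelism.

```python
import zlib
from collections import defaultdict


def key_group_for(key, max_parallelism):
    """Map a key to a key group in [0, max_parallelism).

    Assumption: CRC32 is only a deterministic stand-in for the hash that
    Flink applies to the key's hashCode.
    """
    return zlib.crc32(str(key).encode("utf-8")) % max_parallelism


def operator_index_for(key_group, max_parallelism, parallelism):
    """Subtask that owns a key group (each subtask gets a contiguous range)."""
    return key_group * parallelism // max_parallelism


def bootstrap_keyed_state(records, max_parallelism, parallelism):
    """Arrange (key, value) records as subtask -> key group -> {key: value}."""
    layout = defaultdict(lambda: defaultdict(dict))
    for key, value in records:
        group = key_group_for(key, max_parallelism)
        subtask = operator_index_for(group, max_parallelism, parallelism)
        layout[subtask][group][key] = value
    return layout


# Hypothetical external data, e.g. rows exported from a database.
records = [("user-%d" % i, i * 10) for i in range(1000)]
layout = bootstrap_keyed_state(records, max_parallelism=128, parallelism=4)
for subtask in sorted(layout):
    keys = sum(len(entries) for entries in layout[subtask].values())
    print("subtask", subtask, "holds", keys, "keys")
```

In this sketch, changing only `parallelism` regroups whole key groups onto different subtasks, while changing `max_parallelism` moves individual keys into different key groups. If Flink behaves the same way, that would explain why changing max parallelism needs a tool that rewrites the savepoint.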
Shuyi

On Wed, Aug 22, 2018 at 11:51 AM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Thanks,
>
> I guess the first thing that would be great help from anyone interested in
> helping is to try it for some streaming state :)
>
> We have tested these tools at King to analyze, transform and perform some
> aggregations on our user-states. The major limitation is that it requires
> RocksDB savepoints to work but other than that we successfully analyzed a
> few hundred gigabytes of state including reading keyed, and broadcast
> states from different operators. Also you need to have a savepoint before
> you can create a new savepoint (with whatever state).
>
> Once we have some people who have played with it we can probably greatly
> improve the API and user experience as it is pretty low level at the
> moment. I suggest we use the King git repo <https://github.com/king/bravo>
> for now to track some features before it is in a shape that deserves a
> Flink PR. We are super happy to take any improvements, code contributions
> from anyone so don't hesitate to reach out to me if you have some ideas.
>
> Gyula
>
>
> Rong Rong <walter...@gmail.com> wrote (on Wed, Aug 22, 2018, 17:06):
>
> > +1. Being able to analyze the state is a huge operational advantage.
> > Thanks Gyula for the POC and I would be very interested in contributing
> > to the work.
> >
> > --
> > Rong
> >
> > On Tue, Aug 21, 2018 at 4:26 AM Till Rohrmann <trohrm...@apache.org>
> > wrote:
> >
> > > big +1 for this feature. A tool to get your state out of and into Flink
> > > will be tremendously helpful.
> > >
> > > On Mon, Aug 20, 2018 at 10:21 AM Aljoscha Krettek <aljos...@apache.org>
> > > wrote:
> > >
> > > > +1 I'd like to have something like this in Flink a lot!
> > > >
> > > > > On 19. Aug 2018, at 11:57, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > >
> > > > > Hi all!
> > > > >
> > > > > Thanks for the feedback and I'm happy there is some interest :)
> > > > > Tomorrow I will start improving the proposal based on the feedback
> > > > > and will get back to work.
> > > > >
> > > > > If you are interested working together in this please ping me and
> > > > > we can discuss some ideas/plans and how to share work.
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > > > Paris Carbone <par...@kth.se> wrote (on Sat, Aug 18, 2018, 9:03):
> > > > >
> > > > >> +1
> > > > >>
> > > > >> Might also be a good start to implement queryable stream state with
> > > > >> snapshot isolation using that mechanism.
> > > > >>
> > > > >> Paris
> > > > >>
> > > > >>> On 17 Aug 2018, at 12:28, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > >>>
> > > > >>> Hi All!
> > > > >>>
> > > > >>> I want to share with you a little project we have been working on
> > > > >>> at King (with some help from some dataArtisans folks). I think this
> > > > >>> would be a valuable addition to Flink and solve a bunch of
> > > > >>> outstanding production use-cases and headaches around state
> > > > >>> bootstrapping and state analytics.
> > > > >>>
> > > > >>> We have built a quick and dirty POC implementation on top of Flink
> > > > >>> 1.6, please check the README for some nice examples to get a quick
> > > > >>> idea:
> > > > >>>
> > > > >>> https://github.com/king/bravo
> > > > >>>
> > > > >>> *Short story*
> > > > >>> Bravo is a convenient state reader and writer library leveraging
> > > > >>> Flink's batch processing capabilities. It supports processing and
> > > > >>> writing Flink streaming savepoints. At the moment it only supports
> > > > >>> processing RocksDB savepoints but this can be extended in the
> > > > >>> future for other state backends and checkpoint types.
> > > > >>>
> > > > >>> Our goal is to cover a few basic features:
> > > > >>>
> > > > >>> - Converting keyed states to Flink DataSets for processing and
> > > > >>>   analytics
> > > > >>> - Reading/Writing non-keyed operator states
> > > > >>> - Bootstrap keyed states from Flink DataSets and create new valid
> > > > >>>   savepoints
> > > > >>> - Transform existing savepoints by replacing/changing some states
> > > > >>>
> > > > >>>
> > > > >>> Some example use-cases:
> > > > >>>
> > > > >>> - Point-in-time state analytics across all operators and keys
> > > > >>> - Bootstrap state of a streaming job from external resources such
> > > > >>>   as reading from database/filesystem
> > > > >>> - Validate and potentially repair corrupted state of a streaming
> > > > >>>   job
> > > > >>> - Change max parallelism of a job
> > > > >>>
> > > > >>>
> > > > >>> Our main goal is to start working together with other Flink
> > > > >>> production users and make this something useful that can be part of
> > > > >>> Flink. So if you have use-cases please talk to us :)
> > > > >>> I have also started a google doc which contains a little bit more
> > > > >>> info than the readme and could be a starting place for discussions:
> > > > >>>
> > > > >>> https://docs.google.com/document/d/103k6wPX20kMu5H3SOOXSg5PZIaYpwdhqBMr-ppkFL5E/edit?usp=sharing
> > > > >>>
> > > > >>> I know there are a bunch of rough edges and bugs (and no tests) but
> > > > >>> our motto is: If you are not embarrassed, you released too late :)
> > > > >>>
> > > > >>> Please let me know what you think!
> > > > >>>
> > > > >>> Cheers,
> > > > >>> Gyula

--
"So you have to trust that the dots will somehow connect in your future."