Hi Gyula, +1 for this feature. State bootstrapping and state analytics will be very helpful.
On Thu, Aug 23, 2018 at 4:09 AM Gyula Fóra <gyula.f...@gmail.com> wrote: > Hi Shuyi, > > The tool allows you to convert a Flink DataSet containing the individual > state rows, (key, state) pairs, into a state for a streaming operator. > > So if you want to bootstrap your state from data on HDFS, you would read > the file in Flink, convert it to the required DataSet format, then you can > write the state for the desired operator. > As you can read many different input formats in Flink this gives large > flexibility to where you can bootstrap the state from. > > Gyula > > Shuyi Chen <suez1...@gmail.com> ezt írta (időpont: 2018. aug. 22., Sze, > 21:28): > > > +1 on the tooling. Also, you mentioned about state bootstrapping problem. > > Could you please elaborate on how we can leverage the tooling to solve > > state bootstrapping? I think this is a common problem to stream > processing, > > and it will be great the community can work on it. Thanks. > > > > Shuyi > > > > On Wed, Aug 22, 2018 at 11:51 AM Gyula Fóra <gyula.f...@gmail.com> > wrote: > > > > > Thanks, > > > > > > I guess the first thing that would be great help from anyone interested > > in > > > helping is to try it for some streaming state :) > > > > > > We have tested these tools at King to analyze, transform and perform > some > > > aggregations on our user-states. The major limitation is that it > requires > > > RocksDB savepoints to work but other than that we successfully > analyzed a > > > few hundred gigabytes of state including reading keyed, and broadcast > > > states from different operators. Also you need to have a savepoint > before > > > you can create a new savepoint (with whatever state). > > > > > > Once we have some people who have played with it we can probably > greatly > > > improve the API and user experience as it is pretty low level at the > > > moment. I suggest we use the King git repo < > > https://github.com/king/bravo> > > > for > > > now to track some features before it is in a shape that deserves a > Flink > > > PR. We are super happy to take any improvements, code contributions > from > > > anyone so dont hesitate to reach out to me if you have some ideas. > > > > > > Gyula > > > > > > > > > Rong Rong <walter...@gmail.com> ezt írta (időpont: 2018. aug. 22., > Sze, > > > 17:06): > > > > > > > +1. Being able to analyze the state is a huge operational advantage. > > > > Thanks Gyula for the POC and I would be very interested in > contributing > > > to > > > > the work. > > > > > > > > -- > > > > Rong > > > > > > > > On Tue, Aug 21, 2018 at 4:26 AM Till Rohrmann <trohrm...@apache.org> > > > > wrote: > > > > > > > > > big +1 for this feature. A tool to get your state out of and into > > Flink > > > > > will be tremendously helpful. > > > > > > > > > > On Mon, Aug 20, 2018 at 10:21 AM Aljoscha Krettek < > > aljos...@apache.org > > > > > > > > > wrote: > > > > > > > > > > > +1 I'd like to have something like this in Flink a lot! > > > > > > > > > > > > > On 19. Aug 2018, at 11:57, Gyula Fóra <gyula.f...@gmail.com> > > > wrote: > > > > > > > > > > > > > > Hi all! > > > > > > > > > > > > > > Thanks for the feedback and I'm happy there is some interest :) > > > > > > > Tomorrow I will start improving the proposal based on the > > feedback > > > > and > > > > > > will > > > > > > > get back to work. > > > > > > > > > > > > > > If you are interested working together in this please ping me > and > > > we > > > > > can > > > > > > > discuss some ideas/plans and how to share work. > > > > > > > > > > > > > > Cheers, > > > > > > > Gyula > > > > > > > > > > > > > > Paris Carbone <par...@kth.se> ezt írta (időpont: 2018. aug. > 18., > > > > Szo, > > > > > > 9:03): > > > > > > > > > > > > > >> +1 > > > > > > >> > > > > > > >> Might also be a good start to implement queryable stream state > > > with > > > > > > >> snapshot isolation using that mechanism. > > > > > > >> > > > > > > >> Paris > > > > > > >> > > > > > > >>> On 17 Aug 2018, at 12:28, Gyula Fóra <gyula.f...@gmail.com> > > > wrote: > > > > > > >>> > > > > > > >>> Hi All! > > > > > > >>> > > > > > > >>> I want to share with you a little project we have been > working > > on > > > > at > > > > > > King > > > > > > >>> (with some help from some dataArtisans folks). I think this > > would > > > > be > > > > > a > > > > > > >>> valuable addition to Flink and solve a bunch of outstanding > > > > > production > > > > > > >>> use-cases and headaches around state bootstrapping and state > > > > > analytics. > > > > > > >>> > > > > > > >>> We have built a quick and dirty POC implementation on top of > > > Flink > > > > > 1.6, > > > > > > >>> please check the README for some nice examples to get a quick > > > idea: > > > > > > >>> > > > > > > >>> https://github.com/king/bravo > > > > > > >>> > > > > > > >>> *Short story* > > > > > > >>> Bravo is a convenient state reader and writer library > > leveraging > > > > the > > > > > > >>> Flink’s batch processing capabilities. It supports processing > > and > > > > > > writing > > > > > > >>> Flink streaming savepoints. At the moment it only supports > > > > processing > > > > > > >>> RocksDB savepoints but this can be extended in the future for > > > other > > > > > > state > > > > > > >>> backends and checkpoint types. > > > > > > >>> > > > > > > >>> Our goal is to cover a few basic features: > > > > > > >>> > > > > > > >>> - Converting keyed states to Flink DataSets for processing > and > > > > > > >> analytics > > > > > > >>> - Reading/Writing non-keyed operators states > > > > > > >>> - Bootstrap keyed states from Flink DataSets and create new > > > valid > > > > > > >>> savepoints > > > > > > >>> - Transform existing savepoints by replacing/changing some > > > states > > > > > > >>> > > > > > > >>> > > > > > > >>> Some example use-cases: > > > > > > >>> > > > > > > >>> - Point-in-time state analytics across all operators and > keys > > > > > > >>> - Bootstrap state of a streaming job from external resources > > > such > > > > as > > > > > > >>> reading from database/filesystem > > > > > > >>> - Validate and potentially repair corrupted state of a > > streaming > > > > job > > > > > > >>> - Change max parallelism of a job > > > > > > >>> > > > > > > >>> > > > > > > >>> Our main goal is to start working together with other Flink > > > > > production > > > > > > >>> users and make this something useful that can be part of > Flink. > > > So > > > > if > > > > > > you > > > > > > >>> have use-cases please talk to us :) > > > > > > >>> I have also started a google doc which contains a little bit > > more > > > > > info > > > > > > >> than > > > > > > >>> the readme and could be a starting place for discussions: > > > > > > >>> > > > > > > >>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/103k6wPX20kMu5H3SOOXSg5PZIaYpwdhqBMr-ppkFL5E/edit?usp=sharing > > > > > > >>> > > > > > > >>> I know there are a bunch of rough edges and bugs (and no > tests) > > > but > > > > > our > > > > > > >>> motto is: If you are not embarrassed, you released too late > :) > > > > > > >>> > > > > > > >>> Please let me know what you think! > > > > > > >>> > > > > > > >>> Cheers, > > > > > > >>> Gyula > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > "So you have to trust that the dots will somehow connect in your future." > > >