Hello Kenneth, thank you for your answer. I read your blog post about stateful processing, and that is indeed a great feature!
So if I understood correctly, we could use the CombineFns to declare combining states, so that they can be used while processing elements in a DoFn. That opens up a lot more use cases for the sketches!

Actually this was already possible for 2 sketches, but I have now refined the constructors of the 2 other sketches, and I will do the same for the ones to come.

Regards,
Arnaud

2017-08-08 2:07 GMT+02:00 Kenneth Knowles <[email protected]>:

> This is a great development! I have wanted Beam to have a library of
> sketches.
>
> What Eugene is referring to is the fact that you can write
> Combine.perKey(combineFn) to use these in a transform but also
> StateSpecs.combiningState(combineFn) to use them in a stateful ParDo. So it
> is good to make the CombineFn public and refine their constructors to be
> user-friendly.
>
> Kenn
>
> On Fri, Aug 4, 2017 at 7:45 AM, Arnaud Fournier <[email protected]>
> wrote:
>
> > Thanks for your comments, that is very encouraging !
> >
> > I have created a Jira : https://issues.apache.org/jira/browse/BEAM-2728
> > and a PR : https://github.com/apache/beam/pull/3686
> >
> > Eugene and Lucas I saw that you already have some ideas so I put you as
> > reviewers,
> > I look forward to hear more from you.
> >
> > With Ismael and JB, we already thought about using some of these indicators
> > as metric cells,
> > as it can be useful for some kinds of monitoring.
> > But I have never heard about state cells, is it something like the
> > QuantileState in ApproximateQuantiles ?
> >
> > 2017-08-04 3:14 GMT+02:00 Anand Iyer <[email protected]>:
> >
> > > This is awesome!! Very exciting to see the addition of statistical and
> > > data-mining algorithms to Apache Beam.
> > >
> > > On Thu, Aug 3, 2017 at 2:32 PM, Eugene Kirpichov <[email protected]>
> > > wrote:
> > >
> > > > +1, Very exciting! I have some suggestions on the exact API to expose (e.g.
> > > > I think it makes sense to expose the CombineFn's directly, so that they can
> > > > also be used for combining state cells and not just as PTransforms), but
> > > > that can be handled during regular code review.
> > > >
> > > > On Thu, Aug 3, 2017 at 2:23 PM Sourabh Bajaj <[email protected]>
> > > > wrote:
> > > >
> > > > > +1 to this.
> > > > >
> > > > > On Thu, Aug 3, 2017 at 6:28 AM Lukasz Cwik <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > I'm most interested in the frequency / cardinality tools as it could be
> > > > > > used to help improve performance automatically for combiners by detecting
> > > > > > the few keys case or automatically handle hot keys without needing users to
> > > > > > specify the hints when they use a combiner.
> > > > > >
> > > > > > On Thu, Aug 3, 2017 at 5:35 AM, Jean-Baptiste Onofré <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Nice work Arnaud ;)
> > > > > > >
> > > > > > > Happy to have been able to help.
> > > > > > >
> > > > > > > Let's see what the others will think about this.
> > > > > > >
> > > > > > > Regards
> > > > > > > JB
> > > > > > >
> > > > > > > On 08/03/2017 02:32 PM, Arnaud Fournier wrote:
> > > > > > >
> > > > > > >> Hello everyone,
> > > > > > >>
> > > > > > >> My name is Arnaud Fournier and I am a CS student. I am currently doing an
> > > > > > >> internship at Talend.
> > > > > > >>
> > > > > > >> With the support of Jean-Baptiste Onofre and Ismaël Mejia, I have been
> > > > > > >> working on statistical analysis of streams with Beam, using probabilistic
> > > > > > >> data structures like HyperLogLog.
> > > > > > >>
> > > > > > >> I would like to share this work with the community, but I wanted first to
> > > > > > >> show you my work in progress and ask you if this humble contribution could
> > > > > > >> be interesting as an extension.
> > > > > > >>
> > > > > > >> I have made a little doc with more details about what I have done in case
> > > > > > >> you are interested and want to give me some feedback :
> > > > > > >> https://docs.google.com/document/d/1Xy6g5RPBYX_HadpIr_2WrUeusiwL0Jo2ACI5PEOP1kc/edit
> > > > > > >>
> > > > > > >> You can also find the current work implementation in progress here :
> > > > > > >>
> > > > > > >> https://github.com/ArnaudFnr/beam/tree/sketching/sdks/java/extensions/sketching
> > > > > > >>
> > > > > > >> Thanks !
> > > > > > >>
> > > > > > >> Arnaud
> > > > > > >>
> > > > > > >
> > > > > > > --
> > > > > > > Jean-Baptiste Onofré
> > > > > > > [email protected]
> > > > > > > http://blog.nanthrax.net
> > > > > > > Talend - http://www.talend.com
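
A note on the point discussed in this thread: one combine function can serve both a per-key combine and a combining state cell because both only need the same accumulator contract. The following is a minimal plain-Java sketch of that idea, not Beam code. The `SketchFn` interface is a hypothetical stand-in that mirrors the four method names of Beam's CombineFn (createAccumulator, addInput, mergeAccumulators, extractOutput), and `LinearCountingFn` is a simple linear-counting cardinality estimator chosen here for brevity. It is an assumption for illustration and not one of the sketches from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in mirroring the shape of Beam's CombineFn (not the Beam class).
interface SketchFn<InputT, AccumT, OutputT> {
  AccumT createAccumulator();
  AccumT addInput(AccumT accumulator, InputT input);
  AccumT mergeAccumulators(Iterable<AccumT> accumulators);
  OutputT extractOutput(AccumT accumulator);
}

// Linear counting: hash each element to one bit, estimate cardinality from the unset bits.
class LinearCountingFn implements SketchFn<String, BitSet, Double> {
  private final int bits;

  LinearCountingFn(int bits) {
    this.bits = bits;
  }

  @Override
  public BitSet createAccumulator() {
    return new BitSet(bits);
  }

  @Override
  public BitSet addInput(BitSet accumulator, String input) {
    accumulator.set(Math.floorMod(input.hashCode(), bits));
    return accumulator;
  }

  @Override
  public BitSet mergeAccumulators(Iterable<BitSet> accumulators) {
    // Merging two sketches is a bitwise OR, so partial results can be combined in any order.
    BitSet merged = createAccumulator();
    for (BitSet accumulator : accumulators) {
      merged.or(accumulator);
    }
    return merged;
  }

  @Override
  public Double extractOutput(BitSet accumulator) {
    int zeros = bits - accumulator.cardinality();
    if (zeros == 0) {
      return Double.POSITIVE_INFINITY; // sketch is saturated, no estimate possible
    }
    return bits * Math.log((double) bits / zeros);
  }
}

class SketchCombineDemo {
  // Combine-style use: one partial accumulator per shard, merged at the end.
  static double combineInShards(LinearCountingFn fn, List<List<String>> shards) {
    List<BitSet> partials = new ArrayList<>();
    for (List<String> shard : shards) {
      BitSet partial = fn.createAccumulator();
      for (String element : shard) {
        partial = fn.addInput(partial, element);
      }
      partials.add(partial);
    }
    return fn.extractOutput(fn.mergeAccumulators(partials));
  }

  // State-cell-style use: a single long-lived accumulator updated one element at a time.
  static double combineIncrementally(LinearCountingFn fn, List<String> elements) {
    BitSet state = fn.createAccumulator();
    for (String element : elements) {
      state = fn.addInput(state, element);
    }
    return fn.extractOutput(state);
  }

  public static void main(String[] args) {
    LinearCountingFn fn = new LinearCountingFn(1024);
    List<String> first = Arrays.asList("a", "b", "c");
    List<String> second = Arrays.asList("a", "d", "b");
    List<String> all = new ArrayList<>(first);
    all.addAll(second);
    System.out.println("sharded estimate:     " + combineInShards(fn, Arrays.asList(first, second)));
    System.out.println("incremental estimate: " + combineIncrementally(fn, all));
  }
}
```

Both paths end with the same bit set, so they report the same estimate. In Beam itself the same function object would be handed to the two call sites mentioned in the thread, which is why exposing the CombineFns publicly with user-friendly constructors matters.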
