This is a great development! I have wanted Beam to have a library of sketches.
What Eugene is referring to is the fact that you can write
Combine.perKey(combineFn) to use these in a transform but also
StateSpecs.combiningState(combineFn) to use them in a stateful ParDo. So it
is good to make the CombineFn public and refine their constructors to be
user-friendly.

Kenn

On Fri, Aug 4, 2017 at 7:45 AM, Arnaud Fournier <[email protected]> wrote:

> Thanks for your comments, that is very encouraging!
>
> I have created a Jira: https://issues.apache.org/jira/browse/BEAM-2728
> and a PR: https://github.com/apache/beam/pull/3686
>
> Eugene and Lucas I saw that you already have some ideas so I put you as
> reviewers,
> I look forward to hear more from you.
>
> With Ismael and JB, we already thought about using some of these indicators
> as metric cells,
> as it can be useful for some kinds of monitoring.
> But I have never heard about state cells, is it something like the
> QuantileState in ApproximateQuantiles?
>
> 2017-08-04 3:14 GMT+02:00 Anand Iyer <[email protected]>:
>
> > This is awesome!! Very exciting to see the addition of statistical and
> > data-mining algorithms to Apache Beam.
> >
> > On Thu, Aug 3, 2017 at 2:32 PM, Eugene Kirpichov <[email protected]> wrote:
> >
> > > +1, Very exciting! I have some suggestions on the exact API to expose
> > > (e.g. I think it makes sense to expose the CombineFn's directly, so
> > > that they can also be used for combining state cells and not just as
> > > PTransforms), but that can be handled during regular code review.
> > >
> > > On Thu, Aug 3, 2017 at 2:23 PM Sourabh Bajaj <[email protected]> wrote:
> > >
> > > > +1 to this.
> > > >
> > > > On Thu, Aug 3, 2017 at 6:28 AM Lukasz Cwik <[email protected]> wrote:
> > > >
> > > > > I'm most interested in the frequency / cardinality tools as it
> > > > > could be used to help improve performance automatically for
> > > > > combiners by detecting the few keys case or automatically handle
> > > > > hot keys without needing users to specify the hints when they use
> > > > > a combiner.
> > > > >
> > > > > On Thu, Aug 3, 2017 at 5:35 AM, Jean-Baptiste Onofré <[email protected]> wrote:
> > > > >
> > > > > > Nice work Arnaud ;)
> > > > > >
> > > > > > Happy to have been able to help.
> > > > > >
> > > > > > Let's see what the others will think about this.
> > > > > >
> > > > > > Regards
> > > > > > JB
> > > > > >
> > > > > > On 08/03/2017 02:32 PM, Arnaud Fournier wrote:
> > > > > >
> > > > > >> Hello everyone,
> > > > > >>
> > > > > >> My name is Arnaud Fournier and I am a CS student. I am currently
> > > > > >> doing an internship at Talend.
> > > > > >>
> > > > > >> With the support of Jean-Baptiste Onofre and Ismaël Mejia, I have
> > > > > >> been working on statistical analysis of streams with Beam, using
> > > > > >> probabilistic data structures like HyperLogLog.
> > > > > >>
> > > > > >> I would like to share this work with the community, but I wanted
> > > > > >> first to show you my work in progress and ask you if this humble
> > > > > >> contribution could be interesting as an extension.
> > > > > >>
> > > > > >> I have made a little doc with more details about what I have done
> > > > > >> in case you are interested and want to give me some feedback:
> > > > > >> https://docs.google.com/document/d/1Xy6g5RPBYX_HadpIr_2WrUeusiwL0Jo2ACI5PEOP1kc/edit
> > > > > >>
> > > > > >> You can also find the current work implementation in progress here:
> > > > > >> https://github.com/ArnaudFnr/beam/tree/sketching/sdks/java/extensions/sketching
> > > > > >>
> > > > > >> Thanks!
> > > > > >>
> > > > > >> Arnaud
> > > > > >
> > > > > > --
> > > > > > Jean-Baptiste Onofré
> > > > > > [email protected]
> > > > > > http://blog.nanthrax.net
> > > > > > Talend - http://www.talend.com
