Also RE: DL4J integration. Suneel had done some work on this a while back, and ran into issues. You might want to chat with him about the pitfalls and 'gotchas' there.

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*

On Fri, Feb 10, 2017 at 7:37 AM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

> Sorry for chiming in late.
>
> GPUs on Flink. Till raised a good point- you need to be able to fall back to non-GPU resources if they aren't available.
>
> Fun fact: this has already been developed for Flink vis-a-vis the Apache Mahout project.
>
> In short- Mahout exposes a number of tensor functions (vector %*% matrix, matrix %*% matrix, etc). If compiled for GPU support, those operations are completed via GPU- and if no GPUs are in fact available, Mahout math falls back to CPUs (and finally back to the JVM).
>
> How this should work is Flink takes care of shipping data around the cluster, and when data arrives at the local node- is dumped out to GPU for calculation, loaded back up and shipped back around cluster. In practice, the lack of a persist method for intermediate results makes this troublesome (not because of GPUs but for calculating any sort of complex algorithm we expect to be able to cache intermediate results).
>
> +1 to FLINK-1730
>
> Everything in Mahout is modular- distributed engine (Flink/Spark/Write-your-own), Native Solvers (OpenMP / ViennaCL / CUDA / Write-your-own), algorithms, etc.
>
> So to sum up, you're noting the redundancy between ML packages in terms of algorithms- I would recommend checking out Mahout before rolling your own GPU integration (else risk redundantly integrating GPUs). If nothing else- it should give you some valuable insight regarding design considerations.
>
> Also FYI the goal of the Apache Mahout project is to address that problem precisely- implement an algorithm once in a mathematically expressive DSL, which is abstracted above the engine so the same code easily ports between engines / native solvers (i.e. CPU/GPU).
>
> https://github.com/apache/mahout/tree/master/viennacl-omp
> https://github.com/apache/mahout/tree/master/viennacl
>
> Best,
> tg
>
> On Fri, Feb 10, 2017 at 7:01 AM, Katherin Eri <katherinm...@gmail.com> wrote:
>
>> Thank you Felix, for the provided information.
>>
>> Currently I am analyzing the provided integration of Flink with SystemML.
>>
>> I am also gathering information for the ticket FLINK-1730 <https://issues.apache.org/jira/browse/FLINK-1730>; maybe we will take it on, to unblock the SystemML/Flink integration.
>>
>> On Thu, Feb 9, 2017 at 0:17, Felix Neutatz <neut...@googlemail.com.invalid> wrote:
>>
>> > Hi Kate,
>> >
>> > 1) - Broadcast: https://cwiki.apache.org/confluence/display/FLINK/FLIP-5%3A+Only+send+data+to+each+taskmanager+once+for+broadcasts
>> >    - Caching: https://issues.apache.org/jira/browse/FLINK-1730
>> >
>> > 2) I have no idea about the GPU implementation. The SystemML mailing list will probably help you out there.
>> >
>> > Best regards,
>> > Felix
>> >
>> > 2017-02-08 14:33 GMT+01:00 Katherin Eri <katherinm...@gmail.com>:
>> >
>> > > Thank you Felix, for your point, it is quite interesting.
>> > >
>> > > I will take a look at the code of the provided Flink integration.
>> > >
>> > > 1) You have these problems with Flink: "we realized that the lack of a caching operator and a broadcast issue highly affects the performance". Have you already asked the community about this?
>> > > If so, please provide a reference to the ticket or the mail thread.
>> > >
>> > > 2) You have said that SystemML provides GPU support. I have looked at SystemML’s source code and would like to ask: why did you decide to implement your own integration with CUDA? Did you consider ND4J, or, because it is younger, do you prefer your own implementation?
>> > >
>> > > On Tue, Feb 7, 2017 at 18:35, Felix Neutatz <neut...@googlemail.com> wrote:
>> > >
>> > > > Hi Katherin,
>> > > >
>> > > > we are also working in a similar direction. We implemented a prototype to integrate with SystemML:
>> > > > https://github.com/apache/incubator-systemml/pull/119
>> > > > SystemML provides many different matrix formats, operations, GPU support and a couple of DL algorithms. Unfortunately, we realized that the lack of a caching operator and a broadcast issue highly affects the performance (e.g. compared to Spark). At the moment I am trying to tackle the broadcast issue. But caching is still a problem for us.
>> > > >
>> > > > Best regards,
>> > > > Felix
>> > > >
>> > > > 2017-02-07 16:22 GMT+01:00 Katherin Eri <katherinm...@gmail.com>:
>> > > >
>> > > > > Thank you, Till.
>> > > > >
>> > > > > 1) Regarding ND4J, I didn’t know about such an unfortunate and critical restriction: the lack of sparsity optimizations. You are right, this issue is still relevant for them. I saw that Flink uses Breeze, but I thought it was used for historical reasons.
>> > > > >
>> > > > > 2) Regarding integration with DL4J, I have read the source code of the DL4J/Spark integration, and that’s why I have, for example, dropped my idea of reusing their word2vec implementation for now. I can investigate this topic more deeply if required.
>> > > > >
>> > > > > So I feel that we have the following picture:
>> > > > >
>> > > > > 1) DL integration investigation could be part of Apache Bahir. I can perform further investigation of this topic, but I think we need a separate ticket to track this activity.
>> > > > >
>> > > > > 2) GPU support, required for DL, is interesting, but requires ND4J, for example.
>> > > > >
>> > > > > 3) ND4J couldn’t be incorporated because it doesn’t support sparsity <https://deeplearning4j.org/roadmap.html> [1].
>> > > > >
>> > > > > Regarding ND4J: is this the only blocker for incorporating it, or are there other known ones?
>> > > > >
>> > > > > [1] https://deeplearning4j.org/roadmap.html
>> > > > >
>> > > > > On Tue, Feb 7, 2017 at 16:26, Till Rohrmann <trohrm...@apache.org> wrote:
>> > > > >
>> > > > > Thanks for initiating this discussion Katherin. I think you're right that in general it does not make sense to reinvent the wheel over and over again. Especially if you only have limited resources at hand. So if we could integrate Flink with some existing library that would be great.
>> > > > >
>> > > > > In the past, however, we couldn't find a good library which provided enough freedom to integrate it with Flink. Especially if you want to have distributed and somewhat high-performance implementations of ML algorithms you would have to take Flink's execution model (capabilities as well as limitations) into account. That is mainly the reason why we started implementing some of the algorithms "natively" on Flink.
>> > > > >
>> > > > > If I remember correctly, then the problem with ND4J was and still is that it does not support sparse matrices which was a requirement from our side. As far as I know, it is quite common that you have sparse data structures when dealing with large scale problems. That's why we built our own abstraction which can have different implementations. Currently, the default implementation uses Breeze.
>> > > > >
>> > > > > I think the support for GPU based operations and the actual resource management are two orthogonal things. The implementation would have to work with no GPUs available anyway. If the system detects that GPUs are available, then ideally it would exploit them. Thus, we could add this feature later and maybe integrate it with FLINK-5131 [1].
>> > > > >
>> > > > > Concerning the integration with DL4J I think that Theo's proposal to do it in a separate repository (maybe as part of Apache Bahir) is a good idea. We're currently thinking about outsourcing some of Flink's libraries into sub projects. This could also be an option for the DL4J integration then. In general I think it should be feasible to run DL4J on Flink given that it also runs on Spark. Have you already looked at it closer?
>> > > > >
>> > > > > [1] https://issues.apache.org/jira/browse/FLINK-5131
>> > > > >
>> > > > > Cheers,
>> > > > > Till
>> > > > >
>> > > > > On Tue, Feb 7, 2017 at 11:47 AM, Katherin Eri <katherinm...@gmail.com> wrote:
>> > > > >
>> > > > > > Thank you Theodore, for your reply.
>> > > > > >
>> > > > > > 1) Regarding GPU, your point is clear and I agree with it, ND4J looks appropriate.
>> > > > > > But my current understanding is that we also need to cover some resource management questions: when we provide GPU support, we also need to manage GPUs as a resource. For example, Mesos already supports GPUs as a resource: "Initial support for GPU resources" <https://issues.apache.org/jira/browse/MESOS-4424?jql=text%20~%20GPU>. Flink can use Mesos as a cluster manager, which means that this feature of Mesos could be reused. Memory management questions in Flink regarding GPUs should also be clarified.
>> > > > > >
>> > > > > > 2) Regarding integration with DL4J: what stops us from opening a ticket and starting the discussion around this topic? Do we need some user story, or is the community not sure that DL is really helpful? Why did the discussion with Adam Gibson end with no implementation of any idea? What concerns do we have?
>> > > > > >
>> > > > > > On Mon, Feb 6, 2017 at 15:01, Theodore Vasiloudis <theodoros.vasilou...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Hello all,
>> > > > > > >
>> > > > > > > This is a point that has come up in the past: given the multitude of ML libraries out there, should we have native implementations in FlinkML or try to integrate other libraries instead?
>> > > > > > >
>> > > > > > > We haven't managed to reach a consensus on this before.
>> > > > > > > My opinion is that there is definitely value in having ML algorithms written natively in Flink, both for performance optimization, but more importantly for engineering simplicity, we don't want to force users to use yet another piece of software to run their ML algos (at least for a basic set of algorithms).
>> > > > > > >
>> > > > > > > We have in the past discussed integrations with DL4J (particularly ND4J) with Adam Gibson, the core developer of the library, but we never got around to implementing anything.
>> > > > > > >
>> > > > > > > Whether it makes sense to have an integration with DL4J as part of the Flink distribution would be up for discussion. I would suggest to make it an independent repo to allow for faster dev/release cycles, and because it wouldn't be directly related to the core of Flink so it would add extra reviewing burden to an already overloaded group of committers.
>> > > > > > >
>> > > > > > > Natively supporting GPU calculations in Flink would be much better achieved through a library like ND4J, the engineering burden would be too much otherwise.
>> > > > > > >
>> > > > > > > Regards,
>> > > > > > > Theodore
>> > > > > > >
>> > > > > > > On Mon, Feb 6, 2017 at 11:26 AM, Katherin Eri <katherinm...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hello, guys.
>> > > > > > > >
>> > > > > > > > Theodore, last week I started the review of the PR: https://github.com/apache/flink/pull/2735 related to *word2Vec for Flink*.
>> > > > > > > >
>> > > > > > > > During this review I asked myself: why do we need to implement such a popular algorithm as *word2vec one more time*, when there is already an implementation available in Java, provided by the deeplearning4j.org <https://deeplearning4j.org/word2vec> library (DL4J -> Apache 2 licence)? This library promotes itself actively, there is hype around it in the ML sphere, and it has been integrated with Apache Spark to provide scalable deep learning calculations.
>> > > > > > > >
>> > > > > > > > *That's why I thought: could we also integrate this library with Flink?*
>> > > > > > > >
>> > > > > > > > 1) Personally I think that supporting the deployment of *Deeplearning (DL) algorithms/models in Flink* is a promising and attractive feature, because:
>> > > > > > > >
>> > > > > > > > a) during the last two years DL has proved its efficiency, and these algorithms are used in many applications. For example, *Spotify* uses DL based algorithms for music content extraction for their music recommendations: Recommending music on Spotify with deep learning, AUGUST 05, 2014 <http://benanne.github.io/2014/08/05/spotify-cnns.html>. Developers need to scale up DL manually, which causes a lot of work, so that’s why platforms like Flink should support deploying these models.
>> > > > > > > >
>> > > > > > > > b) The scope of Deeplearning use cases is presented here <https://deeplearning4j.org/use_cases>; many of these scenarios are related to scenarios that could be supported on Flink.
>> > > > > > > >
>> > > > > > > > 2) But DL raises questions such as:
>> > > > > > > >
>> > > > > > > > a) scaling up calculations over machines
>> > > > > > > >
>> > > > > > > > b) performing these calculations over both CPU and GPU. A GPU is required to train big DL models, otherwise the learning process could converge very slowly.
>> > > > > > > >
>> > > > > > > > 3) I have checked the DL4J library, which already has rich support for many attractive DL models, such as Recurrent Networks and LSTMs, Convolutional Networks (CNN), Restricted Boltzmann Machines (RBM) and others. So we won’t need to implement them independently, but only provide the ability to execute these models on a Flink cluster, in much the same way as it was integrated with Apache Spark.
>> > > > > > > >
>> > > > > > > > Because of all of this I propose:
>> > > > > > > >
>> > > > > > > > 1) To create a new ticket in Flink’s JIRA for the integration of Flink with DL4J, and decide on which side this integration should be implemented.
>> > > > > > > >
>> > > > > > > > 2) To support GPU resources natively in Flink and allow calculations over them, as described in this publication: https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus
>> > > > > > > >
>> > > > > > > > *Regarding the original issue, Implement Word2Vec in Flink <https://issues.apache.org/jira/browse/FLINK-2094>,* I have investigated its implementation in DL4J and the implementation of the DL4J integration with Apache Spark, and have several points:
>> > > > > > > >
>> > > > > > > > It seems that building our own implementation of word2vec in Flink is not such a bad solution, because DL4J was forced to reimplement its original word2Vec over Spark. I have checked the integration of DL4J with Spark and found that it is too strongly coupled with the Spark API, so it is impossible to just take some DL4J API and reuse it; instead we would need to implement an independent integration for Flink.
>> > > > > > > >
>> > > > > > > > *That’s why we should simply finish the implementation of the current PR **independently** of the integration with DL4J.*
>> > > > > > > >
>> > > > > > > > Could you please give your opinion on my questions and points? What do you think about them?
>> > > > > > > >
>> > > > > > > > On Mon, Feb 6, 2017 at 12:51, Katherin Eri <katherinm...@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > Sorry, guys, I need to finish this letter first.
>> > > > > > > > > The full version of it will come shortly.
>> > > > > > > > >
>> > > > > > > > > On Mon, Feb 6, 2017 at 12:49, Katherin Eri <katherinm...@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > Hello, guys.
>> > > > > > > > > Theodore, last week I started the review of the PR: https://github.com/apache/flink/pull/2735 related to *word2Vec for Flink*.
>> > > > > > > > >
>> > > > > > > > > During this review I asked myself: why do we need to implement such a popular algorithm as *word2vec one more time*, when there is already an implementation available in Java, provided by the deeplearning4j.org <https://deeplearning4j.org/word2vec> library (DL4J -> Apache 2 licence)? This library promotes itself actively, there is hype around it in the ML sphere, and it has been integrated with Apache Spark to provide scalable deep learning calculations.
>> > > > > > > > > That's why I thought: could we also integrate this library with Flink?
>> > > > > > > > > 1) Personally I think that supporting the deployment of Deeplearning algorithms/models in Flink is a promising and attractive feature, because:
>> > > > > > > > > a) during the last two years deep learning has proved its efficiency, and these algorithms are used in many applications.
>> > > > > > > > > For example, *Spotify* uses DL based algorithms for music content extraction for their music recommendations: Recommending music on Spotify with deep learning, AUGUST 05, 2014 <http://benanne.github.io/2014/08/05/spotify-cnns.html>. Doing this in a natively scalable way is very attractive.
>> > > > > > > > >
>> > > > > > > > > I have investigated the implementation of the DL4J integration with Apache Spark, and have several points:
>> > > > > > > > >
>> > > > > > > > > 1) It seems that building our own implementation of word2vec is not such a bad solution, because the integration of DL4J with Spark is too strongly coupled with the Spark API, and it would take time on the DL4J side to adapt this integration to Flink. I had also expected that we would be able to just call some API, but that is not the case.
>> > > > > > > > > 2)
>> > > > > > > > >
>> > > > > > > > > https://deeplearning4j.org/use_cases
>> > > > > > > > > https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/
>> > > > > > > > >
>> > > > > > > > > On Thu, Jan 19, 2017 at 13:29, Till Rohrmann <trohrm...@apache.org> wrote:
>> > > > > > > > >
>> > > > > > > > > Hi Katherin,
>> > > > > > > > >
>> > > > > > > > > welcome to the Flink community.
>> > > > > > > > > Always great to see new people joining the community :-)
>> > > > > > > > >
>> > > > > > > > > Cheers,
>> > > > > > > > > Till
>> > > > > > > > >
>> > > > > > > > > On Tue, Jan 17, 2017 at 1:02 PM, Katherin Sotenko <katherinm...@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > ok, I've got it.
>> > > > > > > > > > I will take a look at https://github.com/apache/flink/pull/2735.
>> > > > > > > > > >
>> > > > > > > > > > On Tue, Jan 17, 2017 at 14:36, Theodore Vasiloudis <theodoros.vasilou...@gmail.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hello Katherin,
>> > > > > > > > > > >
>> > > > > > > > > > > Welcome to the Flink community!
>> > > > > > > > > > >
>> > > > > > > > > > > The ML component definitely needs a lot of work you are correct, we are facing similar problems to CEP, which we'll hopefully resolve with the restructuring Stephan has mentioned in that thread.
>> > > > > > > > > > >
>> > > > > > > > > > > If you'd like to help out with PRs we have many open, one I have started reviewing but got side-tracked is the Word2Vec one [1].
>> > > > > > > > > > >
>> > > > > > > > > > > Best,
>> > > > > > > > > > > Theodore
>> > > > > > > > > > >
>> > > > > > > > > > > [1] https://github.com/apache/flink/pull/2735
>> > > > > > > > > > >
>> > > > > > > > > > > On Tue, Jan 17, 2017 at 12:17 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi Katherin,
>> > > > > > > > > > > >
>> > > > > > > > > > > > welcome to the Flink community!
>> > > > > > > > > > > > Help with reviewing PRs is always very welcome and a great way to contribute.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Best, Fabian
>> > > > > > > > > > > >
>> > > > > > > > > > > > 2017-01-17 11:17 GMT+01:00 Katherin Sotenko <katherinm...@gmail.com>:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Thank you, Timo.
>> > > > > > > > > > > > > I have started the analysis of the topic.
>> > > > > > > > > > > > > And if it is necessary, I will try to review other pull requests :)
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Tue, Jan 17, 2017 at 13:09, Timo Walther <twal...@apache.org> wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hi Katherin,
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > great to hear that you would like to contribute! Welcome!
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > I gave you contributor permissions. You can now assign issues to yourself. I assigned FLINK-1750 to you.
>> > > > > > > > > > > > > > Right now there are many open ML pull requests, you are very welcome to review the code of others, too.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Timo
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On 17/01/17 at 10:39, Katherin Sotenko wrote:
>> > > > > > > > > > > > > > > Hello, All!
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > I'm Kate Eri, a Java developer with 6 years of enterprise experience; I also have some experience with Scala (half a year).
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > For the last 2 years I have participated in several BigData projects that were related to Machine Learning (Time series analysis, Recommender systems, Social networking) and ETL. I have experience with Hadoop, Apache Spark and Hive.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > I’m fond of the ML topic, and I see that the Flink project requires some work in this area, so that’s why I would like to join Flink and ask you to assign the ticket https://issues.apache.org/jira/browse/FLINK-1750 to me.
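
A note on the fallback behaviour described in Trevor's Feb 10 message (GPU first, then native CPU solvers, finally the plain JVM) and on Till's point that the implementation has to work with no GPUs available anyway: the selection logic could be sketched roughly as below. This is only an illustration under stated assumptions. All names (MatVecBackend, UnavailableGpuBackend, JvmBackend, BackendSelector) are hypothetical and invented here; this is not Mahout's or Flink's actual API, and the "GPU" backend is a stub that always reports itself as unavailable.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical backend interface; NOT an actual Mahout or Flink API. */
interface MatVecBackend {
    String name();
    boolean isAvailable();
    double[] multiply(double[][] matrix, double[] vector);
}

/** Stand-in for a GPU solver. It always reports "unavailable" so the fallback path is exercised. */
final class UnavailableGpuBackend implements MatVecBackend {
    public String name() { return "gpu"; }
    public boolean isAvailable() { return false; }
    public double[] multiply(double[][] matrix, double[] vector) {
        throw new UnsupportedOperationException("no GPU present");
    }
}

/** Plain JVM matrix-vector product; always available, so it serves as the last resort. */
final class JvmBackend implements MatVecBackend {
    public String name() { return "jvm"; }
    public boolean isAvailable() { return true; }
    public double[] multiply(double[][] matrix, double[] vector) {
        double[] result = new double[matrix.length];
        for (int i = 0; i < matrix.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < vector.length; j++) {
                sum += matrix[i][j] * vector[j];
            }
            result[i] = sum;
        }
        return result;
    }
}

final class BackendSelector {
    /** Returns the first available backend, scanning in preference order. */
    static MatVecBackend select(List<MatVecBackend> preferenceOrder) {
        for (MatVecBackend backend : preferenceOrder) {
            if (backend.isAvailable()) {
                return backend;
            }
        }
        throw new IllegalStateException("no backend available");
    }
}

class FallbackDemo {
    public static void main(String[] args) {
        // Preference order: GPU first, JVM last.
        MatVecBackend backend = BackendSelector.select(
                Arrays.asList(new UnavailableGpuBackend(), new JvmBackend()));
        double[][] matrix = {{1.0, 2.0}, {3.0, 4.0}};
        double[] vector = {1.0, 1.0};
        double[] result = backend.multiply(matrix, vector);
        System.out.println(backend.name() + " -> " + Arrays.toString(result));
        // prints "jvm -> [3.0, 7.0]"
    }
}
```

The point of the sketch is only that callers never ask which hardware is present: they hand over a preference list and get back whatever works, which is also why GPU support and resource management can be treated as separate concerns.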