Re: New Flink team member - Kate Eri.

Trevor Grant Fri, 10 Feb 2017 06:45:08 -0800

Also RE: DL4J integration.

Suneel did some work on this a while back and ran into issues.  You
might want to chat with him about the pitfalls and 'gotchas' there.




Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Fri, Feb 10, 2017 at 7:37 AM, Trevor Grant <trevor.d.gr...@gmail.com>
wrote:

> Sorry for chiming in late.
>
> GPUs on Flink.  Till raised a good point: you need to be able to fall back
> to non-GPU resources if GPUs aren't available.
>
> Fun fact: this has already been developed for Flink vis-a-vis the Apache
> Mahout project.
>
> In short- Mahout exposes a number of tensor functions (vector %*% matrix,
> matrix %*% matrix, etc).  If compiled for GPU support, those operations are
> completed via GPU- and if no GPUs are in fact available, Mahout math falls
> back to CPUs (and finally back to the JVM).
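
A minimal sketch of the fallback order described above (GPU, then native
CPU, then plain JVM), written in Java for illustration. The class and
method names are hypothetical and are not Mahout's API; the availability
checks are stand-ins for whatever probing a real solver would do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical names throughout: this only illustrates
// "try GPU, then native CPU, then plain JVM".
final class SolverFallback {

    /** A compute backend that may or may not be usable on this node. */
    static final class Backend {
        final String name;
        final BooleanSupplier available;

        Backend(String name, BooleanSupplier available) {
            this.name = name;
            this.available = available;
        }
    }

    /** Returns the first usable backend in the given preference order. */
    static Backend pick(List<Backend> preferenceOrder) {
        for (Backend b : preferenceOrder) {
            if (b.available.getAsBoolean()) {
                return b;
            }
        }
        throw new IllegalStateException("no backend available");
    }

    public static void main(String[] args) {
        // Pretend neither a GPU nor a native library was found on this node.
        List<Backend> order = Arrays.asList(
            new Backend("gpu", () -> false),
            new Backend("native-cpu", () -> false),
            new Backend("jvm", () -> true));
        System.out.println(pick(order).name);
    }
}
```

Putting the JVM entry last and making it always available means the
selection cannot fail on a node without accelerators.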
>
> How this should work: Flink takes care of shipping data around the
> cluster, and when data arrives at the local node it is dumped out to the
> GPU for calculation, loaded back up, and shipped back around the cluster.
> In practice, the lack of a persist method for intermediate results makes
> this troublesome (not because of GPUs, but because for any sort of complex
> algorithm we expect to be able to cache intermediate results).
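
To illustrate why a missing persist step hurts iterative algorithms, here
is a small Java sketch. `persist` and `expensiveIntermediate` are
hypothetical helpers, not Flink or Mahout calls; the counter only shows how
often the intermediate result gets recomputed with and without caching.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names come from Flink or Mahout.
final class PersistSketch {

    static int evaluations = 0;

    /** Stands in for an expensive distributed computation. */
    static double[] expensiveIntermediate() {
        evaluations++;
        return new double[] {1.0, 2.0, 3.0};
    }

    /** Wraps a supplier so that its result is computed once and then reused. */
    static <T> Supplier<T> persist(final Supplier<T> source) {
        return new Supplier<T>() {
            private T cached;
            private boolean done;

            @Override
            public T get() {
                if (!done) {
                    cached = source.get();
                    done = true;
                }
                return cached;
            }
        };
    }

    /** An iterative algorithm that reads the same intermediate result every round. */
    static double iterate(Supplier<double[]> intermediate, int rounds) {
        double acc = 0.0;
        for (int r = 0; r < rounds; r++) {
            for (double v : intermediate.get()) {
                acc += v;
            }
        }
        return acc;
    }
}
```

Without `persist`, five rounds trigger five recomputations; with it, only
one. On a cluster each recomputation would replay the whole upstream
pipeline.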
>
> +1 to FLINK-1730
>
> Everything in Mahout is modular- distributed engine
> (Flink/Spark/Write-your-own), Native Solvers (OpenMP / ViennaCL / CUDA /
> Write-your-own), algorithms, etc.
>
> So to sum up, you're noting the redundancy between ML packages in terms of
> algorithms- I would recommend checking out Mahout before rolling your own
> GPU integration (else risk redundantly integrating GPUs). If nothing else-
> it should give you some valuable insight regarding design considerations.
> Also FYI the goal of the Apache Mahout project is to address that problem
> precisely- implement an algorithm once in a mathematically expressive DSL,
> which is abstracted above the engine so the same code easily ports between
> engines / native solvers (i.e. CPU/GPU).
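
As a rough picture of "implement once, run on any backend", the Java sketch
below writes a Gram matrix computation against a tiny interface.
`MatrixOps` and `gram` are made-up names for this example; Mahout's actual
DSL is Scala and looks different.

```java
// Hypothetical names; this is not Mahout's API.
final class EngineNeutral {

    /** The only operation the algorithm below needs from a backend. */
    interface MatrixOps {
        double[][] times(double[][] a, double[][] b);
    }

    /** Reference backend: plain nested loops on the JVM. */
    static final MatrixOps JVM = (a, b) -> {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                for (int p = 0; p < k; p++) {
                    c[i][j] += a[i][p] * b[p][j];
                }
            }
        }
        return c;
    };

    /** Returns the transpose of a rectangular matrix. */
    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[0].length; j++) {
                t[j][i] = a[i][j];
            }
        }
        return t;
    }

    /** The "algorithm": Gram matrix A^T A, written once against MatrixOps. */
    static double[][] gram(MatrixOps ops, double[][] a) {
        return ops.times(transpose(a), a);
    }
}
```

Swapping in a GPU-backed `MatrixOps` would leave `gram` untouched, which is
the portability argument made above.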
>
> https://github.com/apache/mahout/tree/master/viennacl-omp
> https://github.com/apache/mahout/tree/master/viennacl
>
> Best,
> tg
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Fri, Feb 10, 2017 at 7:01 AM, Katherin Eri <katherinm...@gmail.com>
> wrote:
>
>> Thank you, Felix, for the information.
>>
>> I am currently analyzing the provided integration of Flink with SystemML.
>>
>> I am also gathering information for the ticket FLINK-1730
>> <https://issues.apache.org/jira/browse/FLINK-1730>; maybe we will take it
>> on, to unblock the SystemML/Flink integration.
>>
>>
>>
>> On Thu, Feb 9, 2017 at 0:17, Felix Neutatz
>> <neut...@googlemail.com.invalid> wrote:
>>
>> > Hi Kate,
>> >
>> > 1) - Broadcast:
>> >      https://cwiki.apache.org/confluence/display/FLINK/FLIP-5%3A+Only+send+data+to+each+taskmanager+once+for+broadcasts
>> >    - Caching: https://issues.apache.org/jira/browse/FLINK-1730
>> >
>> > 2) I have no idea about the GPU implementation. The SystemML mailing
>> > list will probably help you out there.
>> >
>> > Best regards,
>> > Felix
>> >
>> > 2017-02-08 14:33 GMT+01:00 Katherin Eri <katherinm...@gmail.com>:
>> >
>> > > Thank you, Felix, for your point; it is quite interesting.
>> > >
>> > > I will take a look at the code of the provided Flink integration.
>> > >
>> > > 1)    You mentioned these problems with Flink: "we realized that the
>> > > lack of a caching operator and a broadcast issue highly affect the
>> > > performance". Have you already asked the community about this? If so,
>> > > please provide a reference to the ticket or the mail thread.
>> > >
>> > > 2)    You said that SystemML provides GPU support. I have looked at
>> > > SystemML’s source code and would like to ask: why did you decide to
>> > > implement your own integration with CUDA? Did you consider ND4J, or
>> > > do you maintain your own implementation because ND4J is younger?
>> > >
>> > > On Tue, Feb 7, 2017 at 18:35, Felix Neutatz <neut...@googlemail.com> wrote:
>> > >
>> > > > Hi Katherin,
>> > > >
>> > > > we are also working in a similar direction. We implemented a
>> > > > prototype to integrate with SystemML:
>> > > > https://github.com/apache/incubator-systemml/pull/119
>> > > > SystemML provides many different matrix formats, operations, GPU
>> > > > support and a couple of DL algorithms. Unfortunately, we realized
>> > > > that the lack of a caching operator and a broadcast issue highly
>> > > > affect the performance (e.g. compared to Spark). At the moment I am
>> > > > trying to tackle the broadcast issue. But caching is still a problem
>> > > > for us.
>> > > >
>> > > > Best regards,
>> > > > Felix
>> > > >
>> > > > 2017-02-07 16:22 GMT+01:00 Katherin Eri <katherinm...@gmail.com>:
>> > > >
>> > > > > Thank you, Till.
>> > > > >
>> > > > > 1)      Regarding ND4J, I didn’t know about such an unfortunate
>> > > > > and critical restriction: the lack of sparsity optimizations. You
>> > > > > are right, this issue is still open for them. I saw that Flink
>> > > > > uses Breeze, but I thought its usage was due to historical
>> > > > > reasons.
>> > > > >
>> > > > > 2)      Regarding integration with DL4J, I have read the source
>> > > > > code of the DL4J/Spark integration; that’s why I have dropped my
>> > > > > idea of reusing their word2vec implementation for now, for
>> > > > > example. I can investigate this topic more deeply if required.
>> > > > >
>> > > > >
>> > > > >
>> > > > > So I feel that we have the following picture:
>> > > > >
>> > > > > 1)      The DL integration investigation could be part of Apache
>> > > > > Bahir. I can investigate this topic further, but I think we need
>> > > > > a separate ticket to track this activity.
>> > > > >
>> > > > > 2)      GPU support, required for DL, is interesting, but requires
>> > > > > ND4J, for example.
>> > > > >
>> > > > > 3)      ND4J couldn’t be incorporated because it doesn’t support
>> > > > > sparsity <https://deeplearning4j.org/roadmap.html> [1].
>> > > > >
>> > > > > Regarding ND4J: is this the single blocker for incorporating it,
>> > > > > or are there other known ones?
>> > > > >
>> > > > >
>> > > > > [1] https://deeplearning4j.org/roadmap.html
>> > > > >
>> > > > > On Tue, Feb 7, 2017 at 16:26, Till Rohrmann <trohrm...@apache.org>
>> > > > > wrote:
>> > > > >
>> > > > > Thanks for initiating this discussion Katherin. I think you're
>> > > > > right that in general it does not make sense to reinvent the wheel
>> > > > > over and over again. Especially if you only have limited resources
>> > > > > at hand. So if we could integrate Flink with some existing library
>> > > > > that would be great.
>> > > > >
>> > > > > In the past, however, we couldn't find a good library which
>> > > > > provided enough freedom to integrate it with Flink. Especially if
>> > > > > you want to have distributed and somewhat high-performance
>> > > > > implementations of ML algorithms you would have to take Flink's
>> > > > > execution model (capabilities as well as limitations) into
>> > > > > account. That is mainly the reason why we started implementing
>> > > > > some of the algorithms "natively" on Flink.
>> > > > >
>> > > > > If I remember correctly, then the problem with ND4J was and still
>> > > > > is that it does not support sparse matrices which was a
>> > > > > requirement from our side. As far as I know, it is quite common
>> > > > > that you have sparse data structures when dealing with large scale
>> > > > > problems. That's why we built our own abstraction which can have
>> > > > > different implementations. Currently, the default implementation
>> > > > > uses Breeze.
>> > > > >
>> > > > > I think the support for GPU based operations and the actual
>> > > > > resource management are two orthogonal things. The implementation
>> > > > > would have to work with no GPUs available anyway. If the system
>> > > > > detects that GPUs are available, then ideally it would exploit
>> > > > > them. Thus, we could add this feature later and maybe integrate it
>> > > > > with FLINK-5131 [1].
>> > > > >
>> > > > > Concerning the integration with DL4J I think that Theo's proposal
>> > > > > to do it in a separate repository (maybe as part of Apache Bahir)
>> > > > > is a good idea. We're currently thinking about outsourcing some of
>> > > > > Flink's libraries into sub projects. This could also be an option
>> > > > > for the DL4J integration then. In general I think it should be
>> > > > > feasible to run DL4J on Flink given that it also runs on Spark.
>> > > > > Have you already looked at it closer?
>> > > > >
>> > > > > [1] https://issues.apache.org/jira/browse/FLINK-5131
>> > > > >
>> > > > > Cheers,
>> > > > > Till
>> > > > >
>> > > > > On Tue, Feb 7, 2017 at 11:47 AM, Katherin Eri
>> > > > > <katherinm...@gmail.com> wrote:
>> > > > >
>> > > > > > Thank you, Theodore, for your reply.
>> > > > > >
>> > > > > > 1)    Regarding GPU, your point is clear and I agree with it;
>> > > > > > ND4J looks appropriate. But my current understanding is that we
>> > > > > > also need to cover some resource management questions: when we
>> > > > > > provide GPU support, we also need to manage the GPU as a
>> > > > > > resource. For example, Mesos already supports GPUs as a resource
>> > > > > > item: Initial support for GPU resources.
>> > > > > > <https://issues.apache.org/jira/browse/MESOS-4424?jql=text%20~%20GPU>
>> > > > > > Flink can use Mesos as its cluster manager, which means that
>> > > > > > this Mesos feature could be reused. Memory management questions
>> > > > > > in Flink regarding GPUs should also be clarified.
>> > > > > >
>> > > > > > 2)    Regarding integration with DL4J: what stops us from
>> > > > > > opening a ticket and starting the discussion around this topic?
>> > > > > > Do we need a user story, or is the community not sure that DL is
>> > > > > > really helpful? Why did the discussion with Adam Gibson end with
>> > > > > > no implementation of any idea? What concerns do we have?
>> > > > > >
>> > > > > > On Mon, Feb 6, 2017 at 15:01, Theodore Vasiloudis <
>> > > > > > theodoros.vasilou...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Hello all,
>> > > > > > >
>> > > > > > > This is a point that has come up in the past: given the
>> > > > > > > multitude of ML libraries out there, should we have native
>> > > > > > > implementations in FlinkML or try to integrate other libraries
>> > > > > > > instead?
>> > > > > > >
>> > > > > > > We haven't managed to reach a consensus on this before. My
>> > > > > > > opinion is that there is definitely value in having ML
>> > > > > > > algorithms written natively in Flink, both for performance
>> > > > > > > optimization and, more importantly, for engineering
>> > > > > > > simplicity: we don't want to force users to use yet another
>> > > > > > > piece of software to run their ML algos (at least for a basic
>> > > > > > > set of algorithms).
>> > > > > > >
>> > > > > > > We have in the past discussed integrations with DL4J
>> > > > > > > (particularly ND4J) with Adam Gibson, the core developer of
>> > > > > > > the library, but we never got around to implementing anything.
>> > > > > > >
>> > > > > > > Whether it makes sense to have an integration with DL4J as
>> > > > > > > part of the Flink distribution would be up for discussion. I
>> > > > > > > would suggest making it an independent repo to allow for
>> > > > > > > faster dev/release cycles, and because it wouldn't be directly
>> > > > > > > related to the core of Flink, so it would add extra reviewing
>> > > > > > > burden to an already overloaded group of committers.
>> > > > > > >
>> > > > > > > Natively supporting GPU calculations in Flink would be much
>> > > > > > > better achieved through a library like ND4J; the engineering
>> > > > > > > burden would be too much otherwise.
>> > > > > > >
>> > > > > > > Regards,
>> > > > > > > Theodore
>> > > > > > >
>> > > > > > > On Mon, Feb 6, 2017 at 11:26 AM, Katherin Eri
>> > > > > > > <katherinm...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hello, guys.
>> > > > > > > >
>> > > > > > > > Theodore, last week I started the review of the PR:
>> > > > > > > > https://github.com/apache/flink/pull/2735 related to
>> > > > > > > > *word2Vec for Flink*.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > During this review I asked myself: why do we need to
>> > > > > > > > implement such a popular algorithm as *word2vec one more
>> > > > > > > > time*, when there is already an implementation in Java
>> > > > > > > > provided by the deeplearning4j.org
>> > > > > > > > <https://deeplearning4j.org/word2vec> library (DL4J ->
>> > > > > > > > Apache 2 licence)? This library is actively promoted, there
>> > > > > > > > is hype around it in the ML sphere, and it has been
>> > > > > > > > integrated with Apache Spark to provide scalable deep
>> > > > > > > > learning calculations.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > *That's why I thought: could we also integrate Flink with
>> > > > > > > > this library?*
>> > > > > > > >
>> > > > > > > > 1) Personally, I think that supporting the deployment of
>> > > > > > > > *Deep learning (DL) algorithms/models in Flink* is a
>> > > > > > > > promising and attractive feature, because:
>> > > > > > > >
>> > > > > > > >     a) during the last two years DL has proved its
>> > > > > > > > efficiency, and these algorithms are used in many
>> > > > > > > > applications. For example, *Spotify* uses DL-based
>> > > > > > > > algorithms for music content extraction for its music
>> > > > > > > > recommendations: Recommending music on Spotify with deep
>> > > > > > > > learning, AUGUST 05, 2014
>> > > > > > > > <http://benanne.github.io/2014/08/05/spotify-cnns.html>.
>> > > > > > > > Developers need to scale up DL manually, which causes a lot
>> > > > > > > > of work; that’s why platforms like Flink should support
>> > > > > > > > deploying these models.
>> > > > > > > >
>> > > > > > > >     b) Here is an overview of deep learning use cases
>> > > > > > > > <https://deeplearning4j.org/use_cases>; many of these
>> > > > > > > > scenarios are related to scenarios that could be supported
>> > > > > > > > on Flink.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > 2) But DL raises questions such as:
>> > > > > > > >
>> > > > > > > >     a) scaling up calculations over machines
>> > > > > > > >
>> > > > > > > >     b) performing these calculations over both CPU and GPU.
>> > > > > > > > A GPU is required to train big DL models; otherwise the
>> > > > > > > > learning process could converge very slowly.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > 3) I have checked the DL4J library, which already has rich
>> > > > > > > > support for many attractive DL models, such as Recurrent
>> > > > > > > > Networks and LSTMs, Convolutional Networks (CNN), Restricted
>> > > > > > > > Boltzmann Machines (RBM) and others. So we won’t need to
>> > > > > > > > implement them independently, but only provide the ability
>> > > > > > > > to execute these models on a Flink cluster, much like the
>> > > > > > > > way it was integrated with Apache Spark.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Because of all of this I propose:
>> > > > > > > >
>> > > > > > > > 1)    To create a new ticket in Flink’s JIRA for the
>> > > > > > > > integration of Flink with DL4J, and to decide on which side
>> > > > > > > > this integration should be implemented.
>> > > > > > > >
>> > > > > > > > 2)    To support GPU resources natively in Flink and allow
>> > > > > > > > calculations over them, as described in this publication:
>> > > > > > > > https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > *Regarding the original issue, Implement Word2Vec
>> > > > > > > > <https://issues.apache.org/jira/browse/FLINK-2094> in
>> > > > > > > > Flink,* I have investigated its implementation in DL4J and
>> > > > > > > > the implementation of the DL4J integration with Apache
>> > > > > > > > Spark, and have several points:
>> > > > > > > >
>> > > > > > > > It seems that building our own implementation of word2vec
>> > > > > > > > in Flink is not such a bad solution, because DL4J itself was
>> > > > > > > > forced to reimplement its original word2Vec on top of Spark.
>> > > > > > > > I have checked the integration of DL4J with Spark and found
>> > > > > > > > that it is too strongly coupled with the Spark API, so it is
>> > > > > > > > impossible to just take some DL4J API and reuse it; instead
>> > > > > > > > we would need to implement an independent integration for
>> > > > > > > > Flink.
>> > > > > > > >
>> > > > > > > > *That’s why we should simply finish the implementation of
>> > > > > > > > the current PR **independently** of the integration with
>> > > > > > > > DL4J.*
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Could you please give your opinion on my questions and
>> > > > > > > > points?
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Mon, Feb 6, 2017 at 12:51, Katherin Eri <
>> > > > > > > > katherinm...@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > Sorry, guys I need to finish this letter first.
>> > > > > > > > >   Full version of it will come shortly.
>> > > > > > > > >
>> > > > > > > > > On Mon, Feb 6, 2017 at 12:49, Katherin Eri <
>> > > > > > > > > katherinm...@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > Hello, guys.
>> > > > > > > > > Theodore, last week I started the review of the PR:
>> > > > > > > > > https://github.com/apache/flink/pull/2735 related to
>> > > > > > > > > *word2Vec for Flink*.
>> > > > > > > > >
>> > > > > > > > > During this review I asked myself: why do we need to
>> > > > > > > > > implement such a popular algorithm as *word2vec one more
>> > > > > > > > > time*, when there is already an implementation in Java
>> > > > > > > > > provided by the deeplearning4j.org
>> > > > > > > > > <https://deeplearning4j.org/word2vec> library (DL4J ->
>> > > > > > > > > Apache 2 licence)? This library is actively promoted,
>> > > > > > > > > there is hype around it in the ML sphere, and it has been
>> > > > > > > > > integrated with Apache Spark to provide scalable deep
>> > > > > > > > > learning calculations.
>> > > > > > > > > That's why I thought: could we also integrate Flink with
>> > > > > > > > > this library?
>> > > > > > > > > 1) Personally, I think that supporting the deployment of
>> > > > > > > > > deep learning algorithms/models in Flink is a promising
>> > > > > > > > > and attractive feature, because:
>> > > > > > > > >     a) during the last two years deep learning has proved
>> > > > > > > > > its efficiency, and these algorithms are used in many
>> > > > > > > > > applications. For example, *Spotify* uses DL-based
>> > > > > > > > > algorithms for music content extraction for its music
>> > > > > > > > > recommendations: Recommending music on Spotify with deep
>> > > > > > > > > learning, AUGUST 05, 2014
>> > > > > > > > > <http://benanne.github.io/2014/08/05/spotify-cnns.html>.
>> > > > > > > > > Doing this in a natively scalable way is very attractive.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > I have investigated the implementation of the DL4J
>> > > > > > > > > integration with Apache Spark, and have several points:
>> > > > > > > > >
>> > > > > > > > > 1) It seems that building our own implementation of
>> > > > > > > > > word2vec is not such a bad solution, because the
>> > > > > > > > > integration of DL4J with Spark is too strongly coupled
>> > > > > > > > > with the Spark API, and it would take time on the DL4J
>> > > > > > > > > side to adapt this integration to Flink. I had also
>> > > > > > > > > expected that we would be able to just call some API, but
>> > > > > > > > > that is not the case.
>> > > > > > > > > 2)
>> > > > > > > > >
>> > > > > > > > > https://deeplearning4j.org/use_cases
>> > > > > > > > > https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Thu, Jan 19, 2017 at 13:29, Till Rohrmann <
>> > > > > > > > > trohrm...@apache.org> wrote:
>> > > > > > > > >
>> > > > > > > > > Hi Katherin,
>> > > > > > > > >
>> > > > > > > > > welcome to the Flink community. Always great to see new
>> > > > > > > > > people joining the community :-)
>> > > > > > > > >
>> > > > > > > > > Cheers,
>> > > > > > > > > Till
>> > > > > > > > >
>> > > > > > > > > On Tue, Jan 17, 2017 at 1:02 PM, Katherin Sotenko
>> > > > > > > > > <katherinm...@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > OK, I've got it.
>> > > > > > > > > > I will take a look at
>> > > > > > > > > > https://github.com/apache/flink/pull/2735.
>> > > > > > > > > >
>> > > > > > > > > > On Tue, Jan 17, 2017 at 14:36, Theodore Vasiloudis <
>> > > > > > > > > > theodoros.vasilou...@gmail.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hello Katherin,
>> > > > > > > > > > >
>> > > > > > > > > > > Welcome to the Flink community!
>> > > > > > > > > > >
>> > > > > > > > > > > The ML component definitely needs a lot of work,
>> > > > > > > > > > > you are correct; we are facing similar problems to
>> > > > > > > > > > > CEP, which we'll hopefully resolve with the
>> > > > > > > > > > > restructuring Stephan has mentioned in that thread.
>> > > > > > > > > > >
>> > > > > > > > > > > If you'd like to help out with PRs, we have many
>> > > > > > > > > > > open; one I have started reviewing but got
>> > > > > > > > > > > side-tracked on is the Word2Vec one [1].
>> > > > > > > > > > >
>> > > > > > > > > > > Best,
>> > > > > > > > > > > Theodore
>> > > > > > > > > > >
>> > > > > > > > > > > [1] https://github.com/apache/flink/pull/2735
>> > > > > > > > > > >
>> > > > > > > > > > > On Tue, Jan 17, 2017 at 12:17 PM, Fabian Hueske
>> > > > > > > > > > > <fhue...@gmail.com> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi Katherin,
>> > > > > > > > > > > >
>> > > > > > > > > > > > welcome to the Flink community!
>> > > > > > > > > > > > Help with reviewing PRs is always very welcome
>> > > > > > > > > > > > and a great way to contribute.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Best, Fabian
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > 2017-01-17 11:17 GMT+01:00 Katherin Sotenko
>> > > > > > > > > > > > <katherinm...@gmail.com>:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Thank you, Timo.
>> > > > > > > > > > > > > I have started the analysis of the topic.
>> > > > > > > > > > > > > And if necessary, I will try to review other
>> > > > > > > > > > > > > pull requests :)
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Tue, Jan 17, 2017 at 13:09, Timo Walther <
>> > > > > > > > > > > > > twal...@apache.org> wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hi Katherin,
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > great to hear that you would like to
>> > > > > > > > > > > > > > contribute! Welcome!
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > I gave you contributor permissions. You can
>> > > > > > > > > > > > > > now assign issues to yourself. I assigned
>> > > > > > > > > > > > > > FLINK-1750 to you.
>> > > > > > > > > > > > > > Right now there are many open ML pull
>> > > > > > > > > > > > > > requests; you are very welcome to review the
>> > > > > > > > > > > > > > code of others, too.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Timo
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On 17/01/17 at 10:39, Katherin Sotenko wrote:
>> > > > > > > > > > > > > > > Hello, All!
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > I'm Kate Eri, a Java developer with 6 years
>> > > > > > > > > > > > > > > of enterprise experience; I also have some
>> > > > > > > > > > > > > > > experience with Scala (half a year).
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > For the last 2 years I have participated in
>> > > > > > > > > > > > > > > several Big Data projects related to Machine
>> > > > > > > > > > > > > > > Learning (time series analysis, recommender
>> > > > > > > > > > > > > > > systems, social networking) and ETL. I have
>> > > > > > > > > > > > > > > experience with Hadoop, Apache Spark and
>> > > > > > > > > > > > > > > Hive.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > I’m fond of ML, and I see that the Flink
>> > > > > > > > > > > > > > > project requires some work in this area, so
>> > > > > > > > > > > > > > > I would like to join Flink and ask to be
>> > > > > > > > > > > > > > > assigned the ticket
>> > > > > > > > > > > > > > > https://issues.apache.org/jira/browse/FLINK-1750.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
