Ok, I think you are right. Although it would be a valuable experience, I will have to leave it. Thanks for your feedback. I understand that this is not the best use of map reduce, so I'm not going to propose this project. This issue can now be closed.
On Wed, Mar 19, 2014 at 11:01 PM, Ted Dunning <[email protected]> wrote:

> I really think that a true downpour architecture is actually easier than
> what you suggest and much better for the purpose.

On Wed, Mar 19, 2014 at 1:28 PM, Maciej Mazur <[email protected]> wrote:

> Any comments? I think it will work. If I do one long-lasting job, hack the
> file system from the mapper in order to repeatedly update weights, perform
> mini-batch GD, and store updates in some folder, then in the background I
> could call small jobs for gathering gradients and updating weights.

On Tue, Mar 18, 2014 at 10:11 PM, Maciej Mazur <[email protected]> wrote:

> I'll say what I think about it.
>
> I know that Mahout is currently heading in a different direction. You are
> working on refactoring, improving the existing API and migrating to Spark.
> I know that there is a great deal of work to do there. I would also like
> to help with that.
>
> I am impressed by the results achieved with neural networks. Generally
> speaking, I think that NNs give a significant advantage over other methods
> in a wide range of problems; they beat other state-of-the-art algorithms
> in various areas, and I think that in the future they will play an even
> greater role. That's why I came up with the idea to implement neural
> networks.
>
> When it comes to functionality: pretraining (RBM), training (SGD/mini-batch
> gradient descent + backpropagation + momentum) and classification.
>
> Unfortunately MapReduce is ill-suited for NNs. The biggest problem is how
> to reduce the number of iterations. It is possible to divide the data and
> use momentum applied to edges - it helps a little, but doesn't solve the
> problem.
>
> I have an idea for a not-exactly-MapReduce implementation, but I am not
> sure whether it is possible on this infrastructure; for sure it is not
> plain map reduce. Other distributed NN implementations use asynchronous
> operations. Is it possible to take advantage of asynchrony? First I would
> partition the data, some subset on every node. On each node I would use a
> number of files (directories) for storing weights. Each machine would use
> these files to compute the cost function and the gradient. In the
> background, multiple reduce jobs would average the gradients for some
> subset of the weights (one file) and then asynchronously update that
> subset of weights (from one file). In a way this idea is similar to
> Downpour SGD from
> http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/large_deep_networks_nips2012.pdf
>
> There are a couple of problems here. Is it a feasible solution?
>
> A parallel implementation is very complex. It's hard to design something
> that uses MapReduce but is not a MapReduce algorithm. You are definitely
> more experienced than me and I'll need a lot of help; I may not be aware
> of some limitations.
>
> From my perspective it would be a great experience, even if I ended up
> doing something other than NNs. Frankly speaking, I think I'll stay here
> regardless of whether my proposal is accepted. It'll be a great
> opportunity to learn.
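For reference, the training step listed above (mini-batch gradient descent with
backpropagation and momentum) reduces to a very small update rule. A minimal
sketch in plain Java, where MomentumUpdate and its double[] arguments are
illustrative only and not tied to any existing Mahout class:

    // Minimal sketch (not Mahout API) of one mini-batch update with momentum.
    // 'weights', 'gradient' and 'velocity' are plain arrays purely for illustration.
    public final class MomentumUpdate {

      /** Applies one mini-batch gradient step; 'velocity' carries momentum between calls. */
      public static void step(double[] weights, double[] gradient, double[] velocity,
                              double learningRate, double momentum) {
        for (int i = 0; i < weights.length; i++) {
          // decaying sum of past gradients, then move the weights along it
          velocity[i] = momentum * velocity[i] - learningRate * gradient[i];
          weights[i] += velocity[i];
        }
      }
    }

Each worker would call step() once per mini-batch with the gradient produced by
backpropagation; the velocity array carries the momentum term between calls.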
On Mon, Mar 17, 2014 at 5:27 AM, Suneel Marthi <[email protected]> wrote:

> I would suggest looking at deeplearning4j.org (they went public very
> recently) and seeing how they utilized Iterative Reduce for implementing
> neural nets.
>
> Given the present state of flux on the project, I'm not sure we should
> even be considering adding any new algorithms. The existing ones can be
> refactored to be more API-driven (for both clustering and classification);
> that's no trivial effort and could definitely use a lot of help.
>
> How is what you are proposing going to be any better than the similar
> implementations Mahout already has, in terms of functionality,
> performance and scaling? Are there users who would prefer what you are
> proposing over what already exists in Mahout?
>
> We did purge a lot of the unmaintained and non-functional code for the
> 0.9 release and are down to where we are today. There's still room for
> improvement in what presently exists and the project could definitely use
> some help there.
>
> With the emphasis now on supporting Spark ASAP, any new implementations
> would not make the task any easier. There's still stuff in Mahout Math
> that can be redone to be more flexible, like the present NamedVector (see
> MAHOUT-1236). That's a very high priority for the next release and is
> going to impact existing implementations once finalized. The present
> codebase is very heavily dependent on M/R; decoupling the relevant pieces
> from the MR API and being able to offer a potential Mahout user the choice
> of different execution engines (Spark or MR) is no trivial task.
>
> IMO, the emphasis should now be more on stabilizing, refactoring and
> cleaning up the existing implementations (technical debt that's building
> up) and porting stuff to Spark.

On Sunday, March 16, 2014 4:39 PM, Ted Dunning <[email protected]> wrote:

> OK. I am confused now as well.
>
> Even so, I would recommend that you propose a non-map-reduce but still
> parallel version.
>
> Some of the confusion may stem from the fact that you can design
> non-map-reduce programs to run in such a way that a map-reduce execution
> framework like Hadoop thinks that they are doing map-reduce. Instead,
> these programs are doing whatever they feel like and just pretending to
> be map-reduce programs in order to get a bunch of processes launched.

On Sun, Mar 16, 2014 at 1:27 PM, Maciej Mazur <[email protected]> wrote:

> I have one final question.
>
> I have mixed feelings about this discussion. You are saying that there is
> no point in doing a MapReduce implementation of neural networks (with
> pretraining). Then you say that a non-map-reduce version would be of
> substantial interest. On the other hand, you say that it would be easy and
> that it defeats the purpose of doing it in Mahout (because it would not be
> an MR version). Finally, you say that building something simple and
> working is a good thing.
>
> I do not really know what to think about it. Could you give me some advice
> on whether I should write a proposal or not? (And if I should: should I
> propose a MapReduce or a non-MapReduce version? There is already an NN
> algorithm, but without pretraining.)
>
> Thanks,
> Maciej Mazur

On Fri, Feb 28, 2014 at 5:44 AM, peng <[email protected]> wrote:

> Oh, thanks a lot, I missed that one :)
> +1 on the easiest one being implemented first. I hadn't thought about the
> difficulty issue; I need to read more about the YARN extension.
>
> Yours Peng

On Thu 27 Feb 2014 08:06:27 PM EST, Yexi Jiang wrote:

> Hi, Peng,
>
> Do you mean the MultilayerPerceptron? There are three 'train' methods, and
> only one (the one without the trackingKey and groupKey parameters) is
> implemented. In the current implementation the others are not used.
>
> Regards,
> Yexi

2014-02-27 19:31 GMT-05:00 Ted Dunning <[email protected]>:

> Generally for training models like this, there is an assumption that
> fault tolerance is not particularly necessary, because the low risk of
> failure trades against algorithmic speed. For a reasonably small chance
> of failure, simply re-running the training is just fine. If there is a
> high risk of failure, simply checkpointing the parameter server is
> sufficient to allow restarts without redundancy.
>
> Sharding the parameters is quite possible and is reasonable when the
> parameter vector exceeds tens or hundreds of millions of parameters, but
> isn't likely to be necessary below that.
>
> The asymmetry is similarly not a big deal. The traffic to and from the
> parameter server isn't enormous.
>
> Building something simple and working first is a good thing.
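To make the downpour-style architecture discussed in this thread concrete,
here is a hedged sketch. ParameterServer, GradientSource and DownpourWorker
are hypothetical names invented for illustration, not Mahout or DistBelief
APIs: each worker pulls possibly stale weights for a shard, computes a
gradient on its local mini-batch, and pushes the delta back with no global
barrier; checkpoint() is the periodic snapshot that makes restarts possible
without redundancy.

    // Hypothetical sketch of a downpour-style training loop (not Mahout code).
    interface ParameterServer {
      double[] fetch(int shardId);               // current weights of one shard (may be slightly stale)
      void push(int shardId, double[] gradient); // asynchronous accumulate, no global barrier
      void checkpoint();                         // periodic snapshot, enough to restart after a failure
    }

    interface GradientSource {
      // stands in for backpropagation over the node's local mini-batch
      double[] nextGradient(double[] weights);
    }

    final class DownpourWorker implements Runnable {
      private final ParameterServer server;
      private final GradientSource data;
      private final int shardId;

      DownpourWorker(ParameterServer server, GradientSource data, int shardId) {
        this.server = server;
        this.data = data;
        this.shardId = shardId;
      }

      @Override
      public void run() {
        while (!Thread.currentThread().isInterrupted()) {
          double[] weights = server.fetch(shardId);        // pull
          double[] gradient = data.nextGradient(weights);  // compute locally
          server.push(shardId, gradient);                  // push and keep going
        }
      }
    }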
On Thu, Feb 27, 2014 at 3:56 PM, peng <[email protected]> wrote:

> With pleasure! The original downpour paper proposes a parameter server
> from which subnodes download shards of the old model and to which they
> upload gradients. So if the parameter server is down, the process has to
> be delayed. It also requires that all model parameters be stored and
> atomically updated on (and fetched from) a single machine, imposing an
> asymmetric HDD and bandwidth requirement. This design is necessary only
> because each -=delta operation has to be atomic, which cannot be ensured
> across the network (e.g. on HDFS).
>
> But it doesn't mean that the operation cannot be decentralized: parameters
> can be sharded across multiple nodes, and multiple accumulator instances
> can handle parts of the vector subtraction - e.g. we can simply use a
> producer/consumer pattern for all gradients. This should be easy if you
> create a buffer for the stream of gradients and allocate proper numbers of
> producers and consumers on each machine to make sure it doesn't overflow.
> Obviously this is far from the MR framework, but at least it can be made
> homogeneous and slightly faster (because sparse data can be distributed in
> a way that minimizes overlap, so gradients don't have to go across the
> network that frequently).
>
> If we instead use a centralized architecture, then there must be >=1
> backup parameter server for mission-critical training.
>
> Yours Peng
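Peng's buffered producer/consumer idea could look roughly like the following
sketch (ShardAccumulator is a hypothetical class written for this thread, not
Mahout code): producers enqueue gradient deltas for the shard a node owns into
a bounded queue, and a single consumer thread drains it and applies the
-=delta step serially, so no cross-network atomicity is required.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Sketch only: one accumulator per parameter shard, fed by gradient producers.
    final class ShardAccumulator implements Runnable {
      private final double[] shard;  // the slice of the parameter vector this node owns
      private final BlockingQueue<double[]> buffer = new ArrayBlockingQueue<>(1024);

      ShardAccumulator(int shardSize) {
        this.shard = new double[shardSize];
      }

      /** Called by gradient producers; blocks when the buffer is full, so it cannot overflow. */
      void offer(double[] delta) throws InterruptedException {
        buffer.put(delta);
      }

      @Override
      public void run() {
        try {
          while (true) {
            double[] delta = buffer.take();   // wait for the next gradient
            for (int i = 0; i < shard.length; i++) {
              shard[i] -= delta[i];           // -=delta applied by the single owning thread
            }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // stop cleanly when asked
        }
      }
    }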
On Thu 27 Feb 2014 05:09:52 PM EST, Yexi Jiang wrote:

> Peng,
>
> Can you provide more details about your thought?
>
> Regards,

2014-02-27 16:00 GMT-05:00 peng <[email protected]>:

> That should be easy. But that defeats the purpose of using Mahout, as
> there are already enough implementations of single-node backpropagation
> (in which case a GPU is much faster).
>
> Yexi:
>
> Regarding downpour SGD and sandblaster, may I suggest that the
> implementation had better have no parameter server? It's obviously a
> single point of failure and, in terms of bandwidth, a bottleneck. I heard
> that MLlib on top of Spark has a functional implementation (I have never
> read or tested it), and it's possible to build the workflow on top of
> YARN. None of those frameworks has a heterogeneous topology.
>
> Yours Peng

On Thu 27 Feb 2014 09:43:19 AM EST, Maciej Mazur (JIRA) wrote:

> [ https://issues.apache.org/jira/browse/MAHOUT-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913488#comment-13913488 ]
>
> Maciej Mazur edited comment on MAHOUT-1426 at 2/27/14 2:41 PM:
> ---------------------------------------------------------------
>
> I've read the papers. I didn't think about a distributed network. I had in
> mind a network that will fit into memory, but will require a significant
> amount of computation.
>
> I understand that there are better options for neural networks than map
> reduce. How about a non-map-reduce version? I see that you think it is
> something that would make sense. ("Doing a non-map-reduce neural network
> in Mahout would be of substantial interest.") Do you think it would be a
> valuable contribution? Is there a need for this type of algorithm? I am
> thinking about multi-threaded batch gradient descent with pretraining
> (RBM and/or Autoencoders).
>
> I have looked into these old JIRAs. The RBM patch was withdrawn: "I would
> rather like to withdraw that patch, because by the time I implemented it I
> didn't know that the learning algorithm is not suited for MR, so I think
> there is no point including the patch."
>
> > GSOC 2013 Neural network algorithms
> > -----------------------------------
> >
> >                 Key: MAHOUT-1426
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1426
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: Classification
> >            Reporter: Maciej Mazur
> >
> > I would like to ask about the possibilities of implementing neural
> > network algorithms in Mahout during GSOC.
> > There is a classifier.mlp package with a neural network, but I can't see
> > either RBM or Autoencoder in these classes; there is only one mention of
> > Autoencoders in the NeuralNetwork class. As far as I know, Mahout doesn't
> > support convolutional networks.
> > Is it a good idea to implement one of these algorithms? Is it a
> > reasonable amount of work? How hard is it to get a GSOC slot in Mahout?
> > Did anyone succeed last year?
>
> --
> This message was sent by Atlassian JIRA
> (v6.1.5#6160)
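For completeness, the "multi-threaded batch gradient descent" mentioned in the
JIRA comment above could be sketched with a plain thread pool (illustrative
only, not Mahout code): the batch is split into slices, each slice's gradient
is computed by a separate task, and the partial gradients are summed before a
single weight update.

    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Sketch only: each Callable stands in for backpropagation over one slice
    // of the batch; the partial gradients are summed into a full-batch gradient.
    final class ParallelBatchGradient {

      static double[] compute(List<Callable<double[]>> sliceTasks, int dim, int threads)
          throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
          double[] total = new double[dim];
          for (Future<double[]> partial : pool.invokeAll(sliceTasks)) {
            double[] g = partial.get();       // gradient of one slice of the batch
            for (int i = 0; i < dim; i++) {
              total[i] += g[i];               // accumulate into the full-batch gradient
            }
          }
          return total;
        } finally {
          pool.shutdown();
        }
      }
    }

The returned gradient would then feed a single update such as the momentum
step sketched earlier in this thread.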
