Re: [jira] [Commented] (MAHOUT-1788) spark-itemsimilarity integration test script cleanup

Suneel Marthi Tue, 19 Apr 2016 08:21:05 -0700

On Tue, Apr 19, 2016 at 11:08 AM, Khurrum Nasim <[email protected]>
wrote:


> Thank you Dmitriy.
>
> So is there an architectural blueprint for Mahout? What I mean is, how
> can I get the 1000-foot overview, or the bird's-eye view, of the project?
> I do see Mahout is very modularized - however I’m still trying to make
> heads or tails of it :)
>
> @Dmitriy -
> "my investigation points that  there are architectural problems in spark
> that are hard to overcome at this point for high IO algorithms." - Can you
> share some more details about this - I’m just curious.
>

Long story short - "Distributed != Scalable"

>
>
> > On Apr 18, 2016, at 8:18 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >
> > Khurrum,
> >
> > Mahout is very much a library at this point.
> >
> > If you mean whether it can be used to build networks with 2d inputs, yes, I
> > did some of that. Multi-epoch SGD-based systems should be easy enough to
> > build, and will probably have reasonable performance -- although I think
> > dedicated CNN systems like Caffe would still run faster at this point.
> > Full batch trainers are somewhat slow for larger problems though; my
> > investigation points that there are architectural problems in spark that
> > are hard to overcome at this point for high IO algorithms.
> >
> > On Mon, Apr 18, 2016 at 11:49 AM, Khurrum Nasim <[email protected]> wrote:
> >
> >> Hi Guys,
> >>
> >> Can Mahout be used for things like face detection? Also, which unit
> >> tests or integration tests do you recommend I run just to get a
> >> better feel for the execution flow?
> >>
> >> I’m still slowly acclimating to the project, but hopefully I should come
> >> up to speed soon.
> >>
> >>
> >> Many Thanks,
> >>
> >> Khurrum
> >>
> >>
> >>
> >>
> >>> On Mar 30, 2016, at 3:10 PM, Suneel Marthi <[email protected]> wrote:
> >>>
> >>> Thanks Khurrum for stepping up.
> >>>
> >>> You just need basic programming skills - Java/Scala to be able to
> >>> contribute. We can help you with the algorithms and linear algebra
> >>> stuff.
> >>>
> >>>
> >>> Welcome aboard !!
> >>>
> >>>
> >>> On Wed, Mar 30, 2016 at 3:05 PM, Khurrum Nasim <[email protected]> wrote:
> >>>
> >>>> Thanks for the advice, Dmitriy. I’m already signed up on ASF Jira. My
> >>>> handle is “nasimk”.
> >>>>
> >>>> Do I need to be a linear algebra expert and/or a math PhD to contribute?
> >>>> I have 10-plus years of computer programming experience. My background
> >>>> is comp sci.
> >>>>
> >>>> Khurrum
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On Mar 30, 2016, at 2:57 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >>>>>
> >>>>> PS You may also want to sign up with ASF Jira so we can assign issues
> >>>>> to you.
> >>>>>
> >>>>> On Wed, Mar 30, 2016 at 11:52 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Mar 30, 2016 at 11:43 AM, Khurrum Nasim <[email protected]> wrote:
> >>>>>>
> >>>>>>> Thanks Dmitriy.
> >>>>>>>
> >>>>>>> I'll take a look and see where I can start pitching in. Do I need
> >>>>>>> contributor access? How would I create a feature branch for my work?
> >>>>>>>
> >>>>>>
> >>>>>> Khurrum,
> >>>>>>
> >>>>>> You only need a GitHub account. What you need is to create a fork of
> >>>>>> mahout's master in your GitHub space and keep it in sync with master,
> >>>>>> as far as possible, as you go (by doing regular pulls). That way you
> >>>>>> have the best chance of having the fewest conflicts possible.
> >>>>>>
> >>>>>> At any point in time (I recommend when you feel you are about
> >>>>>> 50 to 70% done, or when you just need code advice), you can create a
> >>>>>> GitHub pull request to the apache/mahout master. Make sure to include
> >>>>>> the MAHOUT-XXX issue in the head of the pull request; that way ASF will
> >>>>>> automatically propagate code comments to Jira, and so all discussion
> >>>>>> can be done entirely on GitHub.
> >>>>>>
> >>>>>> Again, if you take on a significant contribution (such as a new
> >>>>>> numerical method contribution), I recommend discussing the proposal on
> >>>>>> the @dev list.
> >>>>>>
> >>>>>> thanks.
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Khurrum
> >>>>>>>
> >>>>>>>> On Mar 30, 2016, at 1:12 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Oh but of course! please do!
> >>>>>>>>
> >>>>>>>> You may work on any issue, this or any other of your choice, or even
> >>>>>>>> on any new issue you can think of (for sizeable contributions it is
> >>>>>>>> recommended to start a discussion on the @dev list first though, to
> >>>>>>>> make sure you benefit from the experience of others. Please file any
> >>>>>>>> new issue in Jira first).
> >>>>>>>>
> >>>>>>>> On Wed, Mar 30, 2016 at 9:05 AM, shashi bushan dongur (JIRA) <
> >>>>>>>> [email protected]> wrote:
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> [ https://issues.apache.org/jira/browse/MAHOUT-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218216#comment-15218216 ]
> >>>>>>>>>
> >>>>>>>>> shashi bushan dongur commented on MAHOUT-1788:
> >>>>>>>>> ----------------------------------------------
> >>>>>>>>>
> >>>>>>>>> Hello. I would like to start contributing to mahout. Can I work on
> >>>>>>>>> this issue?
> >>>>>>>>>
> >>>>>>>>>> spark-itemsimilarity integration test script cleanup
> >>>>>>>>>> ----------------------------------------------------
> >>>>>>>>>>
> >>>>>>>>>>             Key: MAHOUT-1788
> >>>>>>>>>>             URL: https://issues.apache.org/jira/browse/MAHOUT-1788
> >>>>>>>>>>         Project: Mahout
> >>>>>>>>>>      Issue Type: Improvement
> >>>>>>>>>>      Components: cooccurrence
> >>>>>>>>>> Affects Versions: 0.11.0
> >>>>>>>>>>        Reporter: Pat Ferrel
> >>>>>>>>>>        Assignee: Pat Ferrel
> >>>>>>>>>>        Priority: Trivial
> >>>>>>>>>>         Fix For: 1.0.0
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> The binary release does not contain data for the itemsimilarity tests;
> >>>>>>>>>> neither the binary nor the source version will run on a cluster unless
> >>>>>>>>>> data is hand-copied to HDFS.
> >>>>>>>>>> Clean this up so it copies data if needed and the data is in both
> >>>>>>>>>> versions.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> This message was sent by Atlassian JIRA
> >>>>>>>>> (v6.3.4#6332)
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>
