Re: RE: A proposal for Spark 2.0

Guoqiang Li Thu, 12 Nov 2015 19:55:12 -0800

Yes, I agree with Nan Zhu. I recommend these projects:
https://github.com/dmlc/ps-lite (Apache License 2)
https://github.com/Microsoft/multiverso (MIT License)



Alexander, you may also be interested in this demo (graph on parameter server):

https://github.com/witgo/zen/tree/ps_graphx/graphx/src/main/scala/com/github/cloudml/zen/graphx







------------------ Original ------------------
From:  "Ulanov, Alexander";<[email protected]>;
Date:  Fri, Nov 13, 2015 01:44 AM
To:  "Nan Zhu"<[email protected]>; "Guoqiang Li"<[email protected]>; 
Cc:  "[email protected]"<[email protected]>; "Reynold 
Xin"<[email protected]>; 
Subject:  RE: A proposal for Spark 2.0



  
Parameter Server is a new feature and thus does not match the goal of 2.0, which
is "to fix things that are broken in the current API and remove certain
deprecated APIs". At the same time, I would be happy to have that feature.
 
 
 
With regard to machine learning, it would be great to move useful features 
from MLlib to ML and deprecate the former. The current structure of two separate 
machine learning packages seems somewhat confusing.
 
With regard to GraphX, it would be great to deprecate the use of RDD in GraphX 
and switch to DataFrame. This would allow GraphX to evolve with Tungsten.
 
 
 
Best regards, Alexander
 
 
 
From: Nan Zhu [mailto:[email protected]] 
 Sent: Thursday, November 12, 2015 7:28 AM
 To: [email protected]
 Cc: [email protected]
 Subject: Re: A proposal for Spark 2.0
 
 
  
Regarding Parameter Server specifically, I think the current agreement is that PS 
shall exist as a third-party library instead of a component of the core code 
base, isn't it?
 
  
 
 
  
Best,

-- 
Nan Zhu
http://codingcat.me
 
  
 
 
 
 
On Thursday, November 12, 2015 at 9:49 AM,  [email protected] wrote:
     
Does anyone have ideas about machine learning? Spark is missing some features 
for machine learning, for example, a parameter server.
 
  
 
 
  
 
 
    
On Nov 12, 2015, at 05:32, Matei Zaharia <[email protected]> wrote:
 
  
 
 
  
I like the idea of popping out Tachyon to an optional component too to reduce 
the number of dependencies. In the future, it might even be useful to do this 
for Hadoop, but it requires too many API changes to be worth doing now.
 
  
 
 
  
Regarding Scala 2.12, we should definitely support it eventually, but I don't 
think we need to block 2.0 on that because it can be added later too. Has 
anyone investigated what it would take to run on there? I imagine we don't need 
many code changes, just maybe some REPL stuff.
 
  
 
 
  
Needless to say, but I'm all for the idea of making "major" releases as 
undisruptive as possible in the model Reynold proposed. Keeping everyone 
working with the same set of releases is super important.
 
  
 
 
  
Matei
 
  
 
 
    
On Nov 11, 2015, at 4:58 AM, Sean Owen <[email protected]> wrote:
 
  
 
 
  
On Wed, Nov 11, 2015 at 12:10 AM, Reynold Xin <[email protected]> wrote:
 
    
to the Spark community. A major release should not be very different from a
minor release and should not be gated based on new features. The main
purpose of a major release is an opportunity to fix things that are broken
in the current API and remove certain deprecated APIs (examples follow).
 
 
   
 
 
  
Agree with this stance. Generally, a major release might also be a
time to replace some big old API or implementation with a new one, but
I don't see obvious candidates.
 
  
 
 
  
I wouldn't mind turning attention to 2.x sooner than later, unless
there's a fairly good reason to continue adding features in 1.x to a
1.7 release. The scope as of 1.6 is already pretty darned big.
 
  
 
 
  
 
 
    
1. Scala 2.11 as the default build. We should still support Scala 2.10, but
it has been end-of-life.
 
 
   
 
 
  
By the time 2.x rolls around, 2.12 will be the main version, 2.11 will
be quite stable, and 2.10 will have been EOL for a while. I'd propose
dropping 2.10. Otherwise it's supported for 2 more years.
 
  
 
 
  
 
 
   
2. Remove Hadoop 1 support.
 
   
 
 
  
I'd go further to drop support for <2.2 for sure (2.0 and 2.1 were
sort of 'alpha' and 'beta' releases) and even <2.6.
 
  
 
 
  
I'm sure we'll think of a number of other small things -- shading a
bunch of stuff? reviewing and updating dependencies in light of
simpler, more recent dependencies to support from Hadoop etc?
 
  
 
 
  
Farming out Tachyon to a module? (I felt like someone proposed this?)
Pop out any Docker stuff to another repo?
Continue that same effort for EC2?
Farming out some of the "external" integrations to another repo (? controversial)
 
  
 
 
  
See also anything marked version "2+" in JIRA.
 
  
 
 
  
---------------------------------------------------------------------
 
  
To unsubscribe, e-mail:  [email protected]
 
  
For additional commands, e-mail:  [email protected]
 
 
   
 
 
  
 
 
  
