How about the Hive dependency? We use the ThriftServer, the serdes, and even the
parser/execution logic from Hive. What direction will we take for this part?
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/A-proposal-for-Spark-2-0-tp15122p15793.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
I think this will be hard to maintain; we already have JIRA as the de
facto central place to store discussions and prioritize work, and the
2.x stuff is already a JIRA. The wiki doesn't really hurt, just
probably will never be looked at again. Let's point people in all
cases to JIRA.
Yeah, I'd also favor maintaining docs with strictly temporary relevance on
JIRA when possible. The wiki is like this weird backwater I only rarely
visit.
Don't we typically do this kind of stuff with an umbrella issue on JIRA?
Tom, wouldn't that work well for you?
Nick
I started a wiki page:
https://cwiki.apache.org/confluence/display/SPARK/Development+Discussions
On Tue, Dec 22, 2015 at 6:27 AM, Tom Graves wrote:
> Do we have a summary of all the discussions and what is planned for 2.0
> then? Perhaps we should put on the wiki for
Do we have a summary of all the discussions and what is planned for 2.0 then?
Perhaps we should put on the wiki for reference.
Tom
On Tuesday, December 22, 2015 12:12 AM, Reynold Xin wrote:
FYI I updated the master branch's Spark version to 2.0.0-SNAPSHOT.
FYI I updated the master branch's Spark version to 2.0.0-SNAPSHOT.
On Tue, Nov 10, 2015 at 3:10 PM, Reynold Xin wrote:
> I’m starting a new thread since the other one got intermixed with feature
> requests. Please refrain from making feature request in this thread. Not
>
I'm not sure if we need special API support for GPUs. You can already use
GPUs on individual executor nodes to build your own applications. If we
want to leverage GPUs out of the box, I don't think the solution is to
provide GPU specific APIs. Rather, we should just switch the underlying
execution
Thanks for your quick response. OK, I will start a new thread with my thoughts.
Thanks,
Allen
At 2015-12-22 15:19:49, "Reynold Xin" wrote:
I'm not sure if we need special API support for GPUs. You can already use GPUs
on individual executor nodes to build your own
plus dev
On 2015-12-22 15:15:59, "Allen Zhang" wrote:
Hi Reynold,
Is there any new API support for GPU computing in our new 2.0 version?
-Allen
On 2015-12-22 14:12:50, "Reynold Xin" wrote:
FYI I updated the master branch's Spark version to
Hi Kostas
With regard to your *second* point: I believe that requiring user apps to
explicitly declare their dependencies is the clearest API approach when it
comes to classpath and classloading.
However, what about the following API: *SparkContext.addJar(String
pathToJar)*. *Is this
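For reference, a minimal sketch of how an app can hand a jar to Spark at runtime today (the jar path is a placeholder, and this assumes a live SparkContext):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hedged sketch: SparkContext.addJar ships a jar to executors at runtime.
// The path below is illustrative, not a real artifact.
val conf = new SparkConf().setAppName("addjar-example")
val sc = new SparkContext(conf)

// Makes the classes in this jar available to tasks running on executors.
// Note: it does NOT add the jar to the driver's own classpath.
sc.addJar("/path/to/your-dependency.jar")
```

The subtlety here is that addJar affects executor classloading but not the driver, which is one reason declaring dependencies up front (e.g. via --jars or spark.jars) is arguably the clearer API.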
I'd also like to make it a requirement that Spark 2.0 have a stable
dataframe and dataset API - we should not leave these APIs experimental in
the 2.0 release. We already know of at least one breaking change we need to
make to dataframes, now's the time to make any other changes we need to
To be clear-er, I don't think it's clear yet whether a 1.7 release
should exist or not. I could see both making sense. It's also not
really necessary to decide now, well before a 1.6 is even out in the
field. Deleting the version lost information, and I would not have
done that given my reply.
>>>>>>>> <m...@clearstorydata.com> wrote:
> >>>>>>>>
> >>>>>>>> Why does stabilization of those two features require a 1.7 release
> >>>>>>>> instead of 1.6.1?
> >>>>>>>>
off the topic of Spark 2.0 a little bit here - yes we
>>>>>>>>> can talk about RDD vs. DS/DF more but lets refocus on Spark 2.0. I'd
>>>>>>>>> like to
>>>>>>>>> propose we have one more 1.x release after Spark 1.6. This will allow
Pardon for tacking on one more message to this thread, but I'm
reminded of one more issue when building the RC today: Scala 2.10 does
not in general try to work with Java 8, and indeed I can never fully
compile it with Java 8 on Ubuntu or OS X, due to scalac assertion
errors. 2.11 is the first
>> >>>>>>> ...use the new Dataset APIs but can't move to Spark 2.0 because
>> >>>>>>> of the backwards incompatible changes, like removal of deprecated
>> >>>>>>> APIs, Scala 2.11 etc.
> > > >>>>>>> ...changing the APIs before we declare it stable. This is why
> > > >>>>>>> it is important to first stabilize the Dataset API with a
> > > >>>>>>> Spark 1.7 release before
...Spark 1.6. This will allow us to
>>>>>>>> stabilize a few of the new features that were added in 1.6:
>>>>>>>>
>>>>>>>> 1) the experimental Datasets API
>>>>>>>> 2) the new unified memory manager.
> On 25 Nov 2015, at 08:54, Sandy Ryza wrote:
>
> I see. My concern is / was that cluster operators will be reluctant to
> upgrade to 2.0, meaning that developers using those clusters need to stay on
> 1.x, and, if they want to move to DataFrames, essentially need to
>>>>>>> I understand our goal for Spark 2.0 is to offer an easy transition
>>>>>>> but there will be users that won't be able to seamlessly upgrade given
>>>>>>> what we have discussed as in scope for 2.0. For these users, having a
>>>>>>> 1.x release with these new features/APIs stabilized will be very
>>>>>>> beneficial. This might make Spark 1.7 a lighter release but that is
>>>>>>> not necessarily a bad thing.
>>>>>>>
>>>>>>> Any thoughts on this timeline?
>>>>>>>
>>>>>>> Kostas Sakellis
>>>>>>>
>>>>>>> On Thu, Nov 12, 2015 at 8:39 PM, Cheng, Hao <hao.c
like, the ShuffledRDD etc..
> But PairRDDFunctions probably not in this category, as we can do the same
> thing easily with DF/DS, even better performance.
>
> From: Mark Hamstra [mailto:m...@clearstorydata.com]
> Sent:
...RDDs came well
>>>>> before
>>>>> DataFrames and DataSets, so programming guides, introductory how-to
>>>>> articles and the like have, to this point, also tended to emphasize RDDs
>>>>> --
>>>>> or at least to deal with them early. Wh
out what kind of RDD APIs we have to provide
>>>> to developers, maybe the fundamental API is enough, like the ShuffledRDD
>>>> etc.. But PairRDDFunctions probably not in this category, as we can do the
>>>> same thing easily with DF/DS, even better performa
I am +1 on the proposal for Spark 2.0.
Thanks,
Prashant Sharma
On Thu, Nov 12, 2015 at 3:02 AM, Matei Zaharia <matei.zaha...@gmail.com>
wrote:
> I like the idea of popping out Tachyon to an optional component too to
> reduce the number of dependencies. In the future, it migh
To: Nicholas Chammas
> *Cc:* Ulanov, Alexander; Nan Zhu; wi...@qq.com; dev@spark.apache.org;
> Reynold Xin
>
> *Subject:* Re: A proposal for Spark 2.0
>
>
>
> I know we want to keep breaking changes to a minimum but I'm hoping that
> with Spark 2.0 we can also look at bet
...evolve with Tungsten.
>>
>> Best regards, Alexander
with DataFrame or DataSet.
Hao
From: Kostas Sakellis [mailto:kos...@cloudera.com]
Sent: Friday, November 13, 2015 5:27 AM
To: Nicholas Chammas
Cc: Ulanov, Alexander; Nan Zhu; wi...@qq.com; dev@spark.apache.org; Reynold Xin
Subject: Re: A proposal for Spark 2.0
I know we want to keep breaking changes
From: Mark Hamstra [mailto:m...@clearstorydata.com]
Sent: Friday, November 13, 2015 11:23 AM
To: Stephen Boesch
Cc: dev@spark.apache.org
Subject: Re: A proposal for Spark 2.0
Hmmm... to me, that seems like precisely the kind of thing that argues
Who has ideas about machine learning? Spark is missing some features for
machine learning, for example a parameter server.
> On Nov 12, 2015, at 05:32, Matei Zaharia wrote:
>
> I like the idea of popping out Tachyon to an optional component too to reduce
> the number of
...deprecate the RDD API (or
>>> internal API only?)? As lots of its functionality overlapping with
>>> DataFrame or DataSet.
>>>
>>>
>>>
>>> Hao
...@qq.com>;
Cc: "dev@spark.apache.org"<dev@spark.apache.org>; "Reynold
Xin"<r...@databricks.com>;
Subject: RE: A proposal for Spark 2.0
Parameter Server is a new feature and thus does not match the goal of 2.0,
which is "to fix things that are broken in the current API a
Being specific to Parameter Server, I think the current agreement is that PS
shall exist as a third-party library instead of a component of the core code
base, isn't it?
Best,
--
Nan Zhu
http://codingcat.me
On Thursday, November 12, 2015 at 9:49 AM, wi...@qq.com wrote:
> Who has the idea
Best regards, Alexander
From: Nan Zhu [mailto:zhunanmcg...@gmail.com]
Sent: Thursday, November 12, 2015 7:28 AM
To: wi...@qq.com
Cc: dev@spark.apache.org
Subject: Re: A proposal for Spark 2.0
Being specific to Parameter Server, I think the current agreement is that PS
shall exist as a third-party library
On Wed, Nov 11, 2015 at 12:10 AM, Reynold Xin wrote:
> to the Spark community. A major release should not be very different from a
> minor release and should not be gated based on new features. The main
> purpose of a major release is an opportunity to fix things that are
If Scala 2.12 will require Java 8 and we want to enable cross-compiling
Spark against Scala 2.11 and 2.12, couldn't we just make Java 8 a
requirement if you want to use Scala 2.12?
On Wed, Nov 11, 2015 at 9:29 AM, Koert Kuipers wrote:
> i would drop scala 2.10, but definitely
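As a rough illustration of the cross-build idea, here is what compiling one project against both Scala versions could look like in sbt. This is a hedged sketch, not a tested Spark configuration: the version numbers are placeholders, and it assumes (as discussed above) that only the 2.12 variant mandates Java 8.

```scala
// build.sbt (sketch): cross-compile against Scala 2.11 and 2.12.
// Version numbers are illustrative placeholders.
scalaVersion := "2.11.8"
crossScalaVersions := Seq("2.11.8", "2.12.0")

// Require Java 8 bytecode only when building for 2.12;
// the 2.11 build stays compatible with Java 7.
javacOptions ++= {
  if (scalaVersion.value.startsWith("2.12"))
    Seq("-source", "1.8", "-target", "1.8")
  else
    Seq("-source", "1.7", "-target", "1.7")
}
```

`sbt +compile` would then build both variants, and only the Scala 2.12 build would need a Java 8 toolchain.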
Hi,
Reconsidering the execution model behind Streaming would be a good
candidate here, as Spark will not be able to provide the low latency and
sophisticated windowing semantics that more and more use-cases will
require. Maybe relaxing the strict batch model would help a lot. (Mainly
this would
i would drop scala 2.10, but definitely keep java 7
cross build for scala 2.12 is great, but i dont know how that works with
java 8 requirement. dont want to make java 8 mandatory.
and probably stating the obvious, but a lot of apis got polluted due to
binary compatibility requirement. cleaning
good point about dropping <2.2 for hadoop. you dont want to deal with
protobuf 2.4 for example
On Wed, Nov 11, 2015 at 4:58 AM, Sean Owen wrote:
> On Wed, Nov 11, 2015 at 12:10 AM, Reynold Xin wrote:
> > to the Spark community. A major release should
Considering Spark 2.x will run for 2 years, would moving up to Scala 2.12 (
pencilled in for Jan 2016 ) make any sense ? - although that would then
pre-req Java 8.
It looks like Chill is willing to upgrade their Kryo to 3.x if Spark and Hive
will. As it is now, Spark, Chill, and Hive all have a Kryo jar, but it really
can't be used because Kryo 2 can't serialize/deserialize some classes. Since
Spark 2.0 is a major release, it really would be nice if we can resolve the
Kryo issue.
Resending my earlier message because it wasn't accepted.
I would like to add a proposal to upgrade jars when they do not break APIs and
fix a bug.
To be more specific, I would like to see Kryo upgraded from 2.21 to
3.x. Kryo 2.x has a bug (e.g. SPARK-7708) that is blocking its usage in
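For context, a hedged sketch of how an application opts into Kryo serialization today; the registered class is a hypothetical application type, and whichever Kryo version ends up bundled, these user-facing knobs should look roughly the same:

```scala
import org.apache.spark.SparkConf

// MyRecord is a hypothetical application class, not a Spark type.
case class MyRecord(id: Long, payload: Array[Byte])

// Sketch: switch the task-data serializer to Kryo and register app classes.
val conf = new SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[MyRecord]))
```

The point of the proposal is that upgrading the bundled Kryo from 2.21 to 3.x would change the library behind these hooks without changing the configuration surface itself.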
I like the idea of popping out Tachyon to an optional component too to reduce
the number of dependencies. In the future, it might even be useful to do this
for Hadoop, but it requires too many API changes to be worth doing now.
Regarding Scala 2.12, we should definitely support it eventually,
On Tue, Nov 10, 2015 at 3:35 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:
>
> > 3. Assembly-free distribution of Spark: don’t require building an
> enormous assembly jar in order to run Spark.
>
> Could you elaborate a bit on this? I'm not sure what an assembly-free
> distribution
> For this reason, I would *not* propose doing major releases to break
substantial API's or perform large re-architecting that prevent users from
upgrading. Spark has always had a culture of evolving architecture
incrementally and making changes - and I don't think we want to change this
model.
+1
On a related note I think making it lightweight will ensure that we
stay on the current release schedule and don't unnecessarily delay 2.0
to wait for new features / big architectural changes.
In terms of fixes to 1.x, I think our current policy of back-porting
fixes to older releases would
There's a proposal / discussion of the assembly-less distributions at
https://github.com/vanzin/spark/pull/2/files /
https://issues.apache.org/jira/browse/SPARK-11157.
On Tue, Nov 10, 2015 at 3:53 PM, Reynold Xin wrote:
>
> On Tue, Nov 10, 2015 at 3:35 PM, Nicholas Chammas
+1 on a lightweight 2.0
What is the thinking around the 1.x line after Spark 2.0 is released? If
not terminated, how will we determine what goes into each major version
line? Will 1.x only be for stability fixes?
Thanks,
Kostas
On Tue, Nov 10, 2015 at 3:41 PM, Patrick Wendell
Would be also good to fix api breakages introduced as part of 1.0
(where there is missing functionality now), overhaul & remove all
deprecated config/features/combinations, api changes that we need to
make to public api which has been deferred for minor releases.
Regards,
Mridul
On Tue, Nov 10,
Echoing Shivaram here. I don't think it makes a lot of sense to add more
features to the 1.x line. We should still do critical bug fixes though.
On Tue, Nov 10, 2015 at 4:23 PM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:
> +1
>
> On a related note I think making it lightweight
Another +1 to Reynold's proposal.
Maybe this is obvious, but I'd like to advocate against a blanket removal
of deprecated / developer APIs. Many APIs can likely be removed without
material impact (e.g. the SparkContext constructor that takes preferred
node location data), while others likely see
Really, Sandy? "Extra consideration" even for already-deprecated API? If
we're not going to remove these with a major version change, then just when
will we remove them?
On Tue, Nov 10, 2015 at 4:53 PM, Sandy Ryza wrote:
> Another +1 to Reynold's proposal.
>
> Maybe
Agree. If it is deprecated, get rid of it in 2.0
If the deprecation was a mistake, let's fix that.
Suds
Sent from my iPhone
On Nov 10, 2015, at 5:04 PM, Reynold Xin wrote:
Maybe a better idea is to un-deprecate an API if it is too important to not
be removed.
I don't
Maybe a better idea is to un-deprecate an API if it is too important to not
be removed.
I don't think we can drop Java 7 support. It's way too soon.
On Tue, Nov 10, 2015 at 4:59 PM, Mark Hamstra
wrote:
> Really, Sandy? "Extra consideration" even for
Mark,
I think we are in agreement, although I wouldn't go to the extreme and say
"a release with no new features might even be best."
Can you elaborate "anticipatory changes"? A concrete example or so would be
helpful.
On Tue, Nov 10, 2015 at 5:19 PM, Mark Hamstra
Oh and another question - should Spark 2.0 support Java 7?
On Tue, Nov 10, 2015 at 4:53 PM, Sandy Ryza wrote:
> Another +1 to Reynold's proposal.
>
> Maybe this is obvious, but I'd like to advocate against a blanket removal
> of deprecated / developer APIs. Many APIs
I'm liking the way this is shaping up, and I'd summarize it this way (let
me know if I'm misunderstanding or misrepresenting anything):
- New features are not at all the focus of Spark 2.0 -- in fact, a
release with no new features might even be best.
- Remove deprecated API that we
I also feel the same as Reynold. I agree we should minimize API breaks and
focus on fixing things around the edge that were mistakes (e.g. exposing
Guava and Akka) rather than any overhaul that could fragment the community.
Ideally a major release is a lightweight process we can do every couple of
Hi,
I fully agree with that. Actually, I'm working on a PR to add "client" and
"exploded" profiles to the Maven build.
The client profile creates a spark-client-assembly jar, much more
lightweight than the spark-assembly. In our case, we construct jobs that
don't require all of the Spark server side.
On Tue, Nov 10, 2015 at 6:51 PM, Reynold Xin wrote:
> I think we are in agreement, although I wouldn't go to the extreme and say
> "a release with no new features might even be best."
>
> Can you elaborate "anticipatory changes"? A concrete example or so would be
> helpful.
Heh... ok, I was intentionally pushing those bullet points to be extreme to
find where people would start pushing back, and I'll agree that we do
probably want some new features in 2.0 -- but I think we've got good
agreement that new features aren't really the main point of doing a 2.0
release.
To take a stab at an example of something concrete and anticipatory I can
go back to something I mentioned previously. It's not really a good
example because I don't mean to imply that I believe that its premises are
true, but try to go with it. If we were to decide that real-time,
event-based
Agree, it makes sense.
Regards
JB
On 11/11/2015 01:28 AM, Reynold Xin wrote:
Echoing Shivaram here. I don't think it makes a lot of sense to add more
features to the 1.x line. We should still do critical bug fixes though.
On Tue, Nov 10, 2015 at 4:23 PM, Shivaram Venkataraman