Re: Kill Spark Streaming JOB from Spark UI or Yarn

2017-08-27 Thread Matei Zaharia
The batches should all have the same application ID, so use that one. You can also find the application in the YARN UI to terminate it from there. Matei > On Aug 27, 2017, at 10:27 AM, KhajaAsmath Mohammed > wrote: > > Hi, > > I am new to spark streaming and not

Re: SPIP: Spark on Kubernetes

2017-08-17 Thread Matei Zaharia
+1 from me as well. Matei > On Aug 17, 2017, at 10:55 AM, Reynold Xin wrote: > > +1 on adding Kubernetes support in Spark (as a separate module similar to how > YARN is done) > > I talk with a lot of developers and teams that operate cloud services, and > k8s in the

Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-07 Thread Matei Zaharia
Hi everyone, The Spark PMC recently voted to add Hyukjin Kwon and Sameer Agarwal as committers. Join me in congratulating both of them and thanking them for their contributions to the project! Matei - To unsubscribe e-mail:

Re: real world spark code

2017-07-25 Thread Matei Zaharia
You can also find a lot of GitHub repos for external packages here: http://spark.apache.org/third-party-projects.html Matei > On Jul 25, 2017, at 5:30 PM, Frank Austin Nothaft > wrote: > > There’s a number of real-world open source Spark applications in the sciences: >

Fwd: Testing Apache Spark with JDK 9 Early Access builds

2017-07-14 Thread Matei Zaharia
FYI, the JDK group at Oracle is reaching out to see whether anyone wants to test with JDK 9 and give them feedback. Just contact them directly if you'd like to. -- Forwarded message -- From: dalibor topic Date: Wed, Jul 12, 2017 at 3:16 AM Subject:

Re: Are release docs part of a release?

2017-06-08 Thread Matei Zaharia
I agree that it seems completely fine to update the web version of the docs after a release. What would not be fine is updating the downloadable package for it without another vote (and another release number). When people voted on a release, they voted that we should put up that package as

Re: Uploading PySpark 2.1.1 to PyPi

2017-05-29 Thread Matei Zaharia
Didn't we want to upload 2.1.1 too? What is the local version string problem? Matei > On May 26, 2017, at 10:11 AM, Xiao Li wrote: > > Hi, Holden, > > That sounds good to me! > > Thanks, > > Xiao > > 2017-05-23 16:32 GMT-07:00 Holden Karau

Re: Why did spark switch from AKKA to net / ...

2017-05-07 Thread Matei Zaharia
More specifically, many user applications that link to Spark also linked to Akka as a library (e.g. say you want to write a service that receives requests from Akka and runs them on Spark). In that case, you'd have two conflicting versions of the Akka library in the same JVM. Matei > On May

Re: SPIP docs are live

2017-03-16 Thread Matei Zaharia
Yup, thanks everyone and Cody in particular for putting this together. I think it will help a lot. Matei > On Mar 16, 2017, at 1:57 PM, Joseph Bradley wrote: > > Awesome! Thanks for pushing this through, Cody. > Joseph > > On Sun, Mar 12, 2017 at 1:18 AM, Sean Owen

Re: Handling questions in the mailing lists

2016-11-06 Thread Matei Zaharia
Even for the mailing list, I'd love to have a short set of instructions on how to submit your questions (maybe on http://spark.apache.org/community.html or maybe in the welcome email when you subscribe). It would be great if someone added that. After all, we have such instructions for

Re: Structured Streaming with Kafka Source, does it work??

2016-11-06 Thread Matei Zaharia
The Kafka source will only appear in 2.0.2 -- see this thread for the current release candidate: https://lists.apache.org/thread.html/597d630135e9eb3ede54bb0cc0b61a2b57b189588f269a64b58c9243@%3Cdev.spark.apache.org%3E . You can try that right now if you want from the staging Maven repo shown

Re: Anyone seeing a lot of Spark emails go to Gmail spam?

2016-11-02 Thread Matei Zaharia
It might be useful to ask Apache Infra whether they have any information on these (e.g. what do their own spam metrics say, do they get any feedback from Google, etc). Unfortunately mailing lists seem to be less and less well supported by most email providers. Matei > On Nov 2, 2016, at 6:48

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Matei Zaharia
Just to comment on this, I'm generally against removing these types of things unless they create a substantial burden on project contributors. It doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might, but then of course we need to wait for 2.12 to be out and stable. In

Re: StructuredStreaming status

2016-10-19 Thread Matei Zaharia
> > -Abhishek- > >> On Oct 19, 2016, at 5:36 PM, Matei Zaharia <matei.zaha...@gmail.com >> <mailto:matei.zaha...@gmail.com>> wrote: >> >> I'm also curious whether there are concerns other than latency with the way >> stuff executes in Structure

Re: StructuredStreaming status

2016-10-19 Thread Matei Zaharia
een that > clear direction, and this is by no means a recent issue. > > > On Oct 19, 2016 7:36 PM, "Matei Zaharia" <matei.zaha...@gmail.com > <mailto:matei.zaha...@gmail.com>> wrote: > I'm also curious whether there are concerns other than latency with the way >

Re: StructuredStreaming status

2016-10-19 Thread Matei Zaharia
I'm also curious whether there are concerns other than latency with the way stuff executes in Structured Streaming (now that the time steps don't have to act as triggers), as well as what latency people want for various apps. The stateful operator designs for streaming systems aren't inherently

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Matei Zaharia
Is there any way to tie wiki accounts with JIRA accounts? I found it weird that they're not tied at the ASF. Otherwise, moving this into the docs might make sense. Matei > On Oct 18, 2016, at 6:19 AM, Cody Koeninger wrote: > > +1 to putting docs in one clear place. > >

Re: Spark Improvement Proposals

2016-10-09 Thread Matei Zaharia
trategies, given that committers are the only ones I'm saying should > formally submit SPARKLIs or SIPs, if they put junk in a required section then > slap them down for it and tell them to fix it. > > > On Oct 9, 2016 4:36 PM, "Matei Zaharia" <matei.zaha...@gmail.com &

Re: Spark Improvement Proposals

2016-10-09 Thread Matei Zaharia
rategy section "This is not a full > design document." Is this unclear? Design docs can be worked on > obviously, but that's not what I'm concerned with here. > > > > > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote: >>

Re: Spark Improvement Proposals

2016-10-09 Thread Matei Zaharia
nym, SIP, conflicts with Scala SIPs > <http://docs.scala-lang.org/sips/index.html>. Since the Scala and Spark > communities have a lot of overlap, we don’t want, for example, names like > “SIP-10” to have an ambiguous meaning. > > Nick > > > On Sun, Oct 9, 2016 at

Re: Spark Improvement Proposals

2016-10-08 Thread Matei Zaharia
ng a SIP label on major JIRAs and then link to them >> prominently on the Spark website makes a lot of sense. >> >> >> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <matei.zaha...@gmail.com> >> wrote: >>> >>> For the improvement proposals, I t

Re: Improving governance / committers (split from Spark Improvement Proposals thread)

2016-10-08 Thread Matei Zaharia
This makes a lot of sense; just to comment on a few things: > - More committers > Just looking at the ratio of committers to open tickets, or committers > to contributors, I don't think you have enough human power. > I realize this is a touchy issue. I don't have a dog in this fight, > because I'm

Re: Improving volunteer management / JIRAs (split from Spark Improvement Proposals thread)

2016-10-08 Thread Matei Zaharia
I like this idea of asking them. BTW, one other thing we can do *provided the JIRAs are eventually under control* is to create a filter for old JIRAs that have not received a response in X amount of time and have the system automatically email the dev list with this report every month. Then
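As a rough illustration of the automated report suggested in this reply, the filter can be expressed as a small function. This is a sketch under stated assumptions: the `Issue` record, its field names, and the 30-day default are hypothetical stand-ins rather than the JIRA API, and "no response" is taken to mean "no comments at all".

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class Issue:
    """Hypothetical issue record; not the JIRA REST API schema."""
    key: str
    created: datetime
    comments: List[datetime] = field(default_factory=list)  # comment timestamps


def stale_issues(issues: List[Issue], now: datetime, max_age: timedelta) -> List[str]:
    """Return keys of issues older than max_age that have received no comments."""
    return [i.key for i in issues if not i.comments and now - i.created > max_age]


def monthly_report(issues: List[Issue], now: datetime,
                   max_age: timedelta = timedelta(days=30)) -> str:
    """Format the stale-issue list as a plain-text report for the dev list."""
    keys = stale_issues(issues, now, max_age)
    lines = [f"{len(keys)} issue(s) with no response in {max_age.days} days:"]
    lines += [f"  {k}" for k in keys]
    return "\n".join(lines)
```

In practice the list of issues would come from a saved JIRA filter, and the report would be sent by a scheduled job; both parts are outside this sketch.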

Re: Spark Improvement Proposals

2016-10-07 Thread Matei Zaharia
are working on a specific issue, > whether they will be working on a specific issue, and whether an issue or pr > or jira should be rejected. Most people I know in this community are nice and > don't enjoy telling other people no, but it is often more annoying to a > contributor to n

Re: Spark Improvement Proposals

2016-10-07 Thread Matei Zaharia
if I opened a JIRA on a project and nobody looked at it and that happened to me, I'd actively feel ignored. If you do that, you'll see people on stage saying "I reported a bug for Spark and some bot just closed it after 3 months", which is not ideal. Matei > > > O

Re: Spark Improvement Proposals

2016-10-06 Thread Matei Zaharia
Hey Cody, Thanks for bringing these things up. You're talking about quite a few different things here, but let me get to them each in turn. 1) About technical / design discussion -- I fully agree that everything big should go through a lot of review, and I like the idea of a more formal way to

Re: [VOTE] Release Apache Spark 2.0.1 (RC4)

2016-09-29 Thread Matei Zaharia
+1 Matei > On Sep 29, 2016, at 10:59 AM, Herman van Hövell tot Westerflier > wrote: > > +1 (non binding) > > On Thu, Sep 29, 2016 at 10:59 AM, Weiqing Yang > wrote: > +1 (non binding) > > RC4 is

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-25 Thread Matei Zaharia
+1 Matei > On Sep 25, 2016, at 1:25 PM, Josh Rosen wrote: > > +1 > > On Sun, Sep 25, 2016 at 1:16 PM Yin Huai > wrote: > +1 > > On Sun, Sep 25, 2016 at 11:40 AM, Dongjoon Hyun

[jira] [Commented] (SPARK-17445) Reference an ASF page as the main place to find third-party packages

2016-09-12 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484732#comment-15484732 ] Matei Zaharia commented on SPARK-17445: --- Sounds good to me. > Reference an ASF page as the m

[jira] [Commented] (SPARK-17445) Reference an ASF page as the main place to find third-party packages

2016-09-10 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15480419#comment-15480419 ] Matei Zaharia commented on SPARK-17445: --- Sounds good, but IMO just keep the current supplemental

[jira] [Commented] (SPARK-17445) Reference an ASF page as the main place to find third-party packages

2016-09-09 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15479121#comment-15479121 ] Matei Zaharia commented on SPARK-17445: --- The powered by wiki page is a bit of a mess IMO, so I'd

Re: FileStreamSource source checks path eagerly?

2016-09-08 Thread Matei Zaharia
This source is meant to be used for a shared file system such as HDFS or NFS, where both the driver and the workers can see the same folders. There's no support in Spark for just working with local files on different workers. Matei > On Sep 8, 2016, at 2:23 AM, Jacek Laskowski

[jira] [Commented] (SPARK-17445) Reference an ASF page as the main place to find third-party packages

2016-09-08 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15474543#comment-15474543 ] Matei Zaharia commented on SPARK-17445: --- I think one part you're missing, Josh, is that spark

[jira] [Created] (SPARK-17445) Reference an ASF page as the main place to find third-party packages

2016-09-07 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-17445: - Summary: Reference an ASF page as the main place to find third-party packages Key: SPARK-17445 URL: https://issues.apache.org/jira/browse/SPARK-17445 Project

Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Matei Zaharia
The question is just whether the metadata and instructions involving these Maven packages counts as sufficient to tell the user that they have different licensing terms. For example, our Ganglia package was called spark-ganglia-lgpl (so you'd notice it's a different license even from its name),

Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Matei Zaharia
I think you should ask legal about how to have some Maven artifacts for these. Both Ganglia and Kinesis are very widely used, so it's weird to ask users to build them from source. Maybe the Maven artifacts can be marked as being under a different license? In the initial discussion for

Re: Is "spark streaming" streaming or mini-batch?

2016-08-23 Thread Matei Zaharia
I think people explained this pretty well, but in practice, this distinction is also somewhat of a marketing term, because every system will perform some kind of batching. For example, every time you use TCP, the OS and network stack may buffer multiple messages together and send them at once;
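A toy sketch of the batching point in this reply: grouping timestamped records into fixed-length intervals, roughly what a micro-batch engine does with its batch interval. The `micro_batches` function and the `(timestamp, payload)` record shape are hypothetical, and this is not how Spark Streaming is implemented.

```python
from typing import Dict, List, Tuple


def micro_batches(records: List[Tuple[float, str]], interval: float) -> Dict[int, List[str]]:
    """Group (timestamp_seconds, payload) records into fixed-length batches.

    Batch n covers the time range [n * interval, (n + 1) * interval).
    Records are processed in timestamp order.
    """
    batches: Dict[int, List[str]] = {}
    for ts, payload in sorted(records):
        batch_id = int(ts // interval)
        batches.setdefault(batch_id, []).append(payload)
    return batches
```

Shrinking `interval` lowers the time a record waits before its batch is processed, at the cost of more, smaller batches; a "record-at-a-time" system sits at the small end of the same spectrum.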

Re: unsubscribe

2016-08-10 Thread Matei Zaharia
To unsubscribe, please send an email to user-unsubscr...@spark.apache.org from the address you're subscribed from. Matei > On Aug 10, 2016, at 12:48 PM, Sohil Jain wrote: > > - To unsubscribe

Welcoming Felix Cheung as a committer

2016-08-08 Thread Matei Zaharia
Hi all, The PMC recently voted to add Felix Cheung as a committer. Felix has been a major contributor to SparkR and we're excited to have him join officially. Congrats and welcome, Felix! Matei

Re: Dropping late date in Structured Streaming

2016-08-06 Thread Matei Zaharia
Yes, a built-in mechanism is planned in future releases. You can also drop it using a filter for now but the stateful operators will still keep state for old windows. Matei > On Aug 6, 2016, at 9:40 AM, Amit Sela wrote: > > I've noticed that when using Structured
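The filter-based workaround described in this reply can be sketched in plain Python. The class and function names are hypothetical, and lateness is measured against the maximum event time seen so far (an assumption, not Spark's exact semantics). The `counts` dictionary stands in for operator state: it still keeps entries for old windows, which is the limitation the reply points out. Later Spark releases added `withWatermark` on Datasets to address this.

```python
from collections import defaultdict
from typing import Dict, Iterable


class LateDataFilter:
    """Drops records whose event time lags the max seen event time by more than allowed_lateness."""

    def __init__(self, allowed_lateness: int) -> None:
        self.allowed_lateness = allowed_lateness
        self.max_event_time = float("-inf")

    def accept(self, event_time: int) -> bool:
        # Advance the high-water mark first, then compare against it.
        self.max_event_time = max(self.max_event_time, event_time)
        return event_time >= self.max_event_time - self.allowed_lateness


def windowed_counts(event_times: Iterable[int], window: int,
                    allowed_lateness: int) -> Dict[int, int]:
    """Count events per tumbling window, skipping late events.

    The filter only drops input; state for windows that can no longer
    receive data is never evicted here.
    """
    f = LateDataFilter(allowed_lateness)
    counts: Dict[int, int] = defaultdict(int)  # window start -> count
    for t in event_times:
        if f.accept(t):
            counts[int(t // window) * window] += 1
    return dict(counts)
```

For example, with `window=10` and `allowed_lateness=5`, the events `[1, 2, 12, 3, 13, 30]` drop the late `3` but still leave an entry for window 0 after event `30` arrives.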

Re: renaming "minor release" to "feature release"

2016-07-28 Thread Matei Zaharia
I also agree with this given the way we develop stuff. We don't really want to move to possibly-API-breaking major releases super often, but we do have lots of large features that come out all the time, and our current name doesn't convey that. Matei > On Jul 28, 2016, at 4:15 PM, Reynold Xin

Re: The Future Of DStream

2016-07-27 Thread Matei Zaharia
Yup, they will definitely coexist. Structured Streaming is currently alpha and will probably be complete in the next few releases, but Spark Streaming will continue to exist, because it gives the user more low-level control. It's similar to DataFrames vs RDDs (RDDs are the lower-level API for

Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Matei Zaharia
+1 Tested on Mac. Matei > On Jul 22, 2016, at 11:18 AM, Joseph Bradley wrote: > > +1 > > Mainly tested ML/Graph/R. Perf tests from Tim Hunter showed minor speedups > from 1.6 for common ML algorithms. > > On Thu, Jul 21, 2016 at 9:41 AM, Ricardo Almeida >

Re: How to explain SchedulerBackend.reviveOffers()?

2016-06-20 Thread Matei Zaharia
Hi Jacek, This applies to all schedulers actually -- it just tells Spark to re-check the available nodes and possibly launch tasks on them, because a new stage was submitted. Then when any node is available, the scheduler will call the TaskSetManager with an "offer" for the node. Matei > On
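The offer model described here can be sketched with a toy scheduler: submitting a stage triggers a revive, and each node with free slots is offered to the pending task sets. In this toy model, a finishing task frees a slot and triggers another round. All class and method names below are hypothetical and only loosely modeled on Spark's `SchedulerBackend` and `TaskSetManager`; locality, delay scheduling, and resource sizes are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyTaskSetManager:
    """Holds the pending tasks of one stage and accepts or declines offers."""
    stage_id: int
    pending: List[str]

    def resource_offer(self, node: str) -> Optional[str]:
        # Launch the next pending task on the offered node, if any remain.
        return self.pending.pop(0) if self.pending else None


@dataclass
class ToyScheduler:
    free_slots: Dict[str, int]  # node -> free task slots
    task_sets: List[ToyTaskSetManager] = field(default_factory=list)
    launched: List[Tuple[str, str]] = field(default_factory=list)  # (node, task)

    def submit(self, tsm: ToyTaskSetManager) -> None:
        self.task_sets.append(tsm)
        self.revive_offers()  # a new stage arrived: re-check available nodes

    def revive_offers(self) -> None:
        for node in self.free_slots:
            while self.free_slots[node] > 0:
                task = self._offer(node)
                if task is None:
                    break  # no task set wants this node right now
                self.free_slots[node] -= 1
                self.launched.append((node, task))

    def _offer(self, node: str) -> Optional[str]:
        # Offer the node to each task set in submission order.
        for tsm in self.task_sets:
            task = tsm.resource_offer(node)
            if task is not None:
                return task
        return None

    def task_finished(self, node: str) -> None:
        self.free_slots[node] += 1
        self.revive_offers()
```

With two single-slot nodes and a three-task stage, the first two tasks launch immediately and the third launches only once a slot frees up.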

[jira] [Commented] (SPARK-16031) Add debug-only socket source in Structured Streaming

2016-06-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337182#comment-15337182 ] Matei Zaharia commented on SPARK-16031: --- FYI I'll post a PR for this soon. > Add debug-only soc

[jira] [Created] (SPARK-16031) Add debug-only socket source in Structured Streaming

2016-06-17 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-16031: - Summary: Add debug-only socket source in Structured Streaming Key: SPARK-16031 URL: https://issues.apache.org/jira/browse/SPARK-16031 Project: Spark Issue

Updated Spark logo

2016-06-10 Thread Matei Zaharia
Hi all, FYI, we've recently updated the Spark logo at https://spark.apache.org/ to say "Apache Spark" instead of just "Spark". Many ASF projects have been doing this recently to make it clearer that they are associated with the ASF, and indeed the ASF's branding guidelines generally require

[jira] [Created] (SPARK-15879) Update logo in UI and docs to add "Apache"

2016-06-10 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-15879: - Summary: Update logo in UI and docs to add "Apache" Key: SPARK-15879 URL: https://issues.apache.org/jira/browse/SPARK-15879 Project: Spark

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-06 Thread Matei Zaharia
Is there any way to remove artifacts from Maven Central? Maybe that would help clean these things up long-term, though it would create problems for users who for some reason decide to rely on these previews. In any case, if people are *really* concerned about this, we should just put it there. My

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-04 Thread Matei Zaharia
Personally I'd just put them on the staging repo and link to that on the downloads page. It will create less confusion for people browsing Maven Central later and wondering which releases are safe to use. Matei > On Jun 3, 2016, at 8:22 AM, Mark Hamstra wrote: > >

Welcoming Yanbo Liang as a committer

2016-06-03 Thread Matei Zaharia
Hi all, The PMC recently voted to add Yanbo Liang as a committer. Yanbo has been a super active contributor in many areas of MLlib. Please join me in welcoming Yanbo! Matei

[RESULT][VOTE] Removing module maintainer process

2016-05-26 Thread Matei Zaharia
Thanks everyone for voting. With only +1 votes, the vote passes, so I'll update the contributor wiki appropriately. +1 votes: Matei Zaharia (binding) Mridul Muralidharan (binding) Andrew Or (binding) Sean Owen (binding) Nick Pentreath (binding) Tom Graves (binding) Imran Rashid (binding) Holden

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Matei Zaharia
Just wondering, what is the main use case for the Docker images -- to develop apps locally or to deploy a cluster? If the image is really just a script to download a certain package name from a mirror, it may be okay to create an official one, though it does seem tricky to make it properly use

Re: [VOTE] Removing module maintainer process

2016-05-22 Thread Matei Zaharia
Correction, let's run this for 72 hours, so until 9 PM EST May 25th. > On May 22, 2016, at 8:34 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote: > > It looks like the discussion thread on this has only had positive replies, so > I'm going to call a VOTE. The pro

[VOTE] Removing module maintainer process

2016-05-22 Thread Matei Zaharia
It looks like the discussion thread on this has only had positive replies, so I'm going to call a VOTE. The proposal is to remove the maintainer process in https://cwiki.apache.org/confluence/display/SPARK/Committers#Committers-ReviewProcessandMaintainers

[DISCUSS] Removing or changing maintainer process

2016-05-19 Thread Matei Zaharia
Hi folks, Around 1.5 years ago, Spark added a maintainer process for reviewing API and architectural changes (https://cwiki.apache.org/confluence/display/SPARK/Committers#Committers-ReviewProcessandMaintainers) to make sure these are seen by people who spent a lot of time on that component.

Re: Apache Spark Slack

2016-05-16 Thread Matei Zaharia
I don't think any of the developers use this as an official channel, but all the ASF IRC channels are indeed on FreeNode. If there's demand for it, we can document this on the website and say that it's mostly for users to find other users. Development discussions should happen on the dev

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Matei Zaharia
This sounds good to me as well. The one thing we should pay attention to is how we update the docs so that people know to start with the spark.ml classes. Right now the docs list spark.mllib first and also seem more comprehensive in that area than in spark.ml, so maybe people naturally move

[jira] [Assigned] (SPARK-14356) Update spark.sql.execution.debug to work on Datasets

2016-04-03 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-14356: - Assignee: Matei Zaharia > Update spark.sql.execution.debug to work on Datas

[jira] [Created] (SPARK-14356) Update spark.sql.execution.debug to work on Datasets

2016-04-03 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-14356: - Summary: Update spark.sql.execution.debug to work on Datasets Key: SPARK-14356 URL: https://issues.apache.org/jira/browse/SPARK-14356 Project: Spark Issue

Re: Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle

2016-03-30 Thread Matei Zaharia
I agree that putting it in 2.0 doesn't mean keeping Scala 2.10 for the entire 2.x line. My vote is to keep Scala 2.10 in Spark 2.0, because it's the default version we built with in 1.x. We want to make the transition from 1.x to 2.0 as easy as possible. In 2.0, we'll have the default downloads

Welcoming two new committers

2016-02-08 Thread Matei Zaharia
Hi all, The PMC has recently added two new Spark committers -- Herman van Hovell and Wenchen Fan. Both have been heavily involved in Spark SQL and Tungsten, adding new features, optimizations and APIs. Please join me in welcoming Herman and Wenchen. Matei

Re: simultaneous actions

2016-01-17 Thread Matei Zaharia
able to dispatch jobs from both actions simultaneously (or on a > when-workers-become-available basis)? > > On 15 January 2016 at 11:44, Koert Kuipers <ko...@tresata.com > <mailto:ko...@tresata.com>> wrote: > we run multiple actions on the same (cached) rdd a

Re: simultaneous actions

2016-01-15 Thread Matei Zaharia
RDDs actually are thread-safe, and quite a few applications use them this way, e.g. the JDBC server. Matei > On Jan 15, 2016, at 2:10 PM, Jakob Odersky wrote: > > I don't think RDDs are threadsafe. > More fundamentally however, why would you want to run RDD actions in >

Re: Compiling only MLlib?

2016-01-15 Thread Matei Zaharia
Have you tried just downloading a pre-built package, or linking to Spark through Maven? You don't need to build it unless you are changing code inside it. Check out http://spark.apache.org/docs/latest/quick-start.html#self-contained-applications for how to link to it. Matei > On Jan 15,

Re: Read from AWS s3 with out having to hard-code sensitive keys

2016-01-11 Thread Matei Zaharia
In production, I'd recommend using IAM roles to avoid having keys altogether. Take a look at http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html. Matei > On Jan 11, 2016, at 11:32 AM, Sabarish Sasidharan > wrote: > > If you are

[jira] [Commented] (SPARK-10854) MesosExecutorBackend: Received launchTask but executor was null

2015-12-03 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038058#comment-15038058 ] Matei Zaharia commented on SPARK-10854: --- Just a note, I saw a log where this happened

Re: A proposal for Spark 2.0

2015-11-24 Thread Matei Zaharia
ackages seems to be somewhat confusing. > > With regards to GraphX, it would be great to deprecate the use of RDD in > GraphX and switch to Dataframe. This will allow GraphX to evolve with Tungsten. > > > > Best regards, Alexander > > > > From: Nan Zhu [mailto:z

[jira] [Created] (SPARK-11733) Allow shuffle readers to request data from just one mapper

2015-11-13 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-11733: - Summary: Allow shuffle readers to request data from just one mapper Key: SPARK-11733 URL: https://issues.apache.org/jira/browse/SPARK-11733 Project: Spark

Re: [DISCUSS] Spark-Kernel Incubator Proposal

2015-11-13 Thread Matei Zaharia
One question about this from the Spark side: have you considered giving the project a different name so that it doesn't sound like a Spark component? Right now "Spark Kernel" may be confused with "Spark Core" and things like that. I don't see a lot of Apache TLPs with related names, though

Re: A proposal for Spark 2.0

2015-11-11 Thread Matei Zaharia
I like the idea of popping out Tachyon to an optional component too to reduce the number of dependencies. In the future, it might even be useful to do this for Hadoop, but it requires too many API changes to be worth doing now. Regarding Scala 2.12, we should definitely support it eventually,

[jira] [Commented] (SPARK-9999) RDD-like API on top of Catalyst/DataFrame

2015-10-16 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961567#comment-14961567 ] Matei Zaharia commented on SPARK-: -- Beyond tuples, you'll also want encoders for other generic

Re: How to compile Spark with customized Hadoop?

2015-10-09 Thread Matei Zaharia
You can publish your version of Hadoop to your Maven cache with mvn publish (just give it a different version number, e.g. 2.7.0a) and then pass that as the Hadoop version to Spark's build (see http://spark.apache.org/docs/latest/building-spark.html

[jira] [Commented] (SPARK-9850) Adaptive execution in Spark

2015-09-24 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907518#comment-14907518 ] Matei Zaharia commented on SPARK-9850: -- Hey Imran, this could make sense, but note that the problem

[jira] [Resolved] (SPARK-9852) Let reduce tasks fetch multiple map output partitions

2015-09-24 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-9852. -- Resolution: Fixed Fix Version/s: 1.6.0 > Let reduce tasks fetch multiple map out

[jira] [Updated] (SPARK-9852) Let reduce tasks fetch multiple map output partitions

2015-09-20 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-9852: - Summary: Let reduce tasks fetch multiple map output partitions (was: Let HashShuffleFetcher

[jira] [Resolved] (SPARK-9851) Support submitting map stages individually in DAGScheduler

2015-09-14 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-9851. -- Resolution: Fixed Fix Version/s: 1.6.0 > Support submitting map stages individua

Re: Ranger-like Security on Spark

2015-09-03 Thread Matei Zaharia
If you run on YARN, you can use Kerberos, be authenticated as the right user, etc in the same way as MapReduce jobs. Matei > On Sep 3, 2015, at 1:37 PM, Daniel Schulz > wrote: > > Hi, > > I really enjoy using Spark. An obstacle to sell it to our clients

Re: Ranger-like Security on Spark

2015-09-03 Thread Matei Zaharia
entitled to read/write? Will > it enforce HDFS ACLs and Ranger policies as well? > > Best regards, Daniel. > > > On 03 Sep 2015, at 21:16, Matei Zaharia <matei.zaha...@gmail.com > > <mailto:matei.zaha...@gmail.com>> wrote: > > > > If you ru

[jira] [Assigned] (SPARK-9853) Optimize shuffle fetch of contiguous partition IDs

2015-08-20 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-9853: Assignee: Matei Zaharia Optimize shuffle fetch of contiguous partition IDs

[jira] [Resolved] (SPARK-10008) Shuffle locality can take precedence over narrow dependencies for RDDs with both

2015-08-16 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-10008. --- Resolution: Fixed Fix Version/s: 1.5.0 Shuffle locality can take precedence over

[jira] [Assigned] (SPARK-10008) Shuffle locality can take precedence over narrow dependencies for RDDs with both

2015-08-14 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-10008: - Assignee: Matei Zaharia Shuffle locality can take precedence over narrow dependencies

[jira] [Created] (SPARK-10008) Shuffle locality can take precedence over narrow dependencies for RDDs with both

2015-08-14 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-10008: - Summary: Shuffle locality can take precedence over narrow dependencies for RDDs with both Key: SPARK-10008 URL: https://issues.apache.org/jira/browse/SPARK-10008

[jira] [Updated] (SPARK-9851) Support submitting map stages individually in DAGScheduler

2015-08-13 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-9851: - Summary: Support submitting map stages individually in DAGScheduler (was: Add support

[jira] [Updated] (SPARK-9923) ShuffleMapStage.numAvailableOutputs should be an Int instead of Long

2015-08-12 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-9923: - Labels: Starter (was: ) ShuffleMapStage.numAvailableOutputs should be an Int instead of Long

[jira] [Created] (SPARK-9923) ShuffleMapStage.numAvailableOutputs should be an Int instead of Long

2015-08-12 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-9923: Summary: ShuffleMapStage.numAvailableOutputs should be an Int instead of Long Key: SPARK-9923 URL: https://issues.apache.org/jira/browse/SPARK-9923 Project: Spark

[jira] [Updated] (SPARK-9850) Adaptive execution in Spark

2015-08-12 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-9850: - Issue Type: Epic (was: New Feature) Adaptive execution in Spark

[jira] [Updated] (SPARK-9850) Adaptive execution in Spark

2015-08-11 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-9850: - Assignee: Yin Huai Adaptive execution in Spark

[jira] [Assigned] (SPARK-9851) Add support for submitting map stages individually in DAGScheduler

2015-08-11 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-9851: Assignee: Matei Zaharia Add support for submitting map stages individually

[jira] [Created] (SPARK-9852) Let HashShuffleFetcher fetch multiple map output partitions

2015-08-11 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-9852: Summary: Let HashShuffleFetcher fetch multiple map output partitions Key: SPARK-9852 URL: https://issues.apache.org/jira/browse/SPARK-9852 Project: Spark

[jira] [Assigned] (SPARK-9852) Let HashShuffleFetcher fetch multiple map output partitions

2015-08-11 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-9852: Assignee: Matei Zaharia Let HashShuffleFetcher fetch multiple map output partitions

[jira] [Created] (SPARK-9851) Add support for submitting map stages individually in DAGScheduler

2015-08-11 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-9851: Summary: Add support for submitting map stages individually in DAGScheduler Key: SPARK-9851 URL: https://issues.apache.org/jira/browse/SPARK-9851 Project: Spark

[jira] [Updated] (SPARK-9850) Adaptive execution in Spark

2015-08-11 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-9850: - Attachment: AdaptiveExecutionInSpark.pdf Adaptive execution in Spark

[jira] [Created] (SPARK-9850) Adaptive execution in Spark

2015-08-11 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-9850: Summary: Adaptive execution in Spark Key: SPARK-9850 URL: https://issues.apache.org/jira/browse/SPARK-9850 Project: Spark Issue Type: New Feature

[jira] [Created] (SPARK-9853) Optimize shuffle fetch of contiguous partition IDs

2015-08-11 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-9853: Summary: Optimize shuffle fetch of contiguous partition IDs Key: SPARK-9853 URL: https://issues.apache.org/jira/browse/SPARK-9853 Project: Spark Issue Type

[jira] [Resolved] (SPARK-9244) Increase some default memory limits

2015-07-22 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-9244. -- Resolution: Fixed Fix Version/s: 1.5.0 Increase some default memory limits

[jira] [Created] (SPARK-9244) Increase some default memory limits

2015-07-21 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-9244: Summary: Increase some default memory limits Key: SPARK-9244 URL: https://issues.apache.org/jira/browse/SPARK-9244 Project: Spark Issue Type: Improvement