Re: ASF board report draft for February

2024-02-18 Thread Matei Zaharia
tps://en.everybodywiki.com/Mich_Talebzadeh > > > Disclaimer: The information provided is correct to the best of my knowledge > but of course cannot be guaranteed. It is essential to note that, as with > any advice, one verified and tested result holds more weight than a thousand

ASF board report draft for February

2024-02-17 Thread Matei Zaharia
Hi all, I missed some reminder emails about our board report this month, but here is my draft. I’ll submit it tomorrow if that’s ok. == Issues for the board: - None Project status: - We made two patch releases: Spark 3.3.4 (EOL release) on December 16, 2023, and Spark 3.4.2 on

Re: ASF board report draft for Nov 2023

2023-11-09 Thread Matei Zaharia
> - Apache Spark 3.3.4 EOL (December 16th) > > Dongjoon. > > On 2023/11/06 05:32:11 Matei Zaharia wrote: >> It’s time to send our project’s quarterly report to the ASF board on >> Wednesday November 8th. Here’s what I wrote as a draft; let me know any >> suggested ch

ASF board report draft for Nov 2023

2023-11-05 Thread Matei Zaharia
It’s time to send our project’s quarterly report to the ASF board on Wednesday November 8th. Here’s what I wrote as a draft; let me know any suggested changes. = Issues for the board: - None Project status: - We released Apache Spark 3.5 on September 15, a feature

Re: [DISCUSS] Updating documentation hosted for EOL and maintenance releases

2023-08-31 Thread Matei Zaharia
It would be great to do this IMO, because there are often usability and formatting fixes needed to docs over time, and people naturally search for docs from their *deployed* version of the project — not the latest version, hoping that it also applies to their release. For example, right now

Re: ASF board report draft for August 2023

2023-08-09 Thread Matei Zaharia
Thank you, Matei. >> >> It looks good to me. >> >> Dongjoon >> >> On Mon, Aug 7, 2023 at 22:54 Matei Zaharia > <mailto:matei.zaha...@gmail.com>> wrote: >>> It’s time to send our quarterly report to the A

ASF board report draft for August 2023

2023-08-07 Thread Matei Zaharia
It’s time to send our quarterly report to the ASF board on August 9th. Here’s what I wrote as a draft — feel free to suggest changes. = Issues for the board: - None Project status: - We cut the branch Spark 3.5.0 on July 17th 2023. The community is working on bug

Re: [VOTE][SPIP] Python Data Source API

2023-07-10 Thread Matei Zaharia
+1 > On Jul 10, 2023, at 10:19 AM, Takuya UESHIN wrote: > > +1 > > On Sun, Jul 9, 2023 at 10:05 PM Ruifeng Zheng > wrote: >> +1 >> >> On Mon, Jul 10, 2023 at 8:20 AM Jungtaek Lim > > wrote: >>> +1 >>> >>> On Sat, Jul 8, 2023

ASF board report draft for May 2023

2023-05-09 Thread Matei Zaharia
It’s time to send our ASF board report again on May 10th. I’ve put together this draft — let me know whether to add anything else. Issues for the board: - None Project status: - We released Apache Spark 3.4 on April 13th, a feature release with over 2600 patches.

Re: Slack for Spark Community: Merging various threads

2023-04-06 Thread Matei Zaharia
To me, the most important opportunity here is to create a better support environment for users, and I think it’s super important to allow users to join immediately on their own if we want this to succeed. A lot of users these days do prefer to join a live chat interface to get support for an

Re: [DISCUSS] Show Python code examples first in Spark documentation

2023-02-25 Thread Matei Zaharia
I’m +1 on switching to Python by default given what I see at the majority of users. I like the idea of investigating a way to save the language choice in a cookie and to switch all code examples on the page to a new language when you click one of the tabs. We used to have the switching behavior

Re: ASF board report draft for Feb 2023

2023-02-08 Thread Matei Zaharia
artifacts [SPARK-41893] in line with other ASF projects. https://cwiki.apache.org/confluence/display/COMDEV/SBOM Thanks, Dongjoon On Mon, Feb 6, 2023 at 6:13 PM Matei Zaharia <matei.zaha...@gmail.com> wrote: Hi all, It’s time to send our quarterly report to the ASF board this Wednesday (F

ASF board report draft for Feb 2023

2023-02-06 Thread Matei Zaharia
Hi all, It’s time to send our quarterly report to the ASF board this Wednesday (Feb 8th). Here is a draft; let me know if you have suggestions: === Issues for the board: - None Project status: - We cut the branch Spark 3.4.0 on Jan 24th 2023. The community is working on

Re: ASF board report draft for November

2022-11-10 Thread Matei Zaharia
Sounds good. > On Nov 7, 2022, at 12:02 PM, Dongjoon Hyun wrote: > > Shall we mention Spark 3.2.3 release preparation since Chao is currently > actively working on it? > > Dongjoon. > > On Mon, Nov 7, 2022 at 11:53 AM Matei Zaharia <mailto:matei.zaha...@gma

ASF board report draft for November

2022-11-07 Thread Matei Zaharia
It’s time to send our quarterly report to the ASF board on Wednesday. Here is a draft, let me know if you have suggestions: === Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python,

Re: ASF board report draft for August

2022-08-10 Thread Matei Zaharia
d love to get feedback and contributions on that. > On Aug 10, 2022, at 11:16 AM, Matei Zaharia wrote: > > It’s time to submit our quarterly report to the ASF board on Friday. Here is > a draft, lmk if you have suggestions: > > === > > Descriptio

ASF board report draft for August

2022-08-10 Thread Matei Zaharia
It’s time to submit our quarterly report to the ASF board on Friday. Here is a draft, lmk if you have suggestions: === Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R and SQL

Re: [VOTE][SPIP] Spark Connect

2022-06-13 Thread Matei Zaharia
+1, very excited about this direction. Matei > On Jun 13, 2022, at 11:07 AM, Herman van Hovell > wrote: > > Let me kick off the voting... > > +1 > > On Mon, Jun 13, 2022 at 2:02 PM Herman van Hovell > wrote: > Hi all, > > I’d like to start a vote for SPIP:

SIGMOD System Award for Apache Spark

2022-05-12 Thread Matei Zaharia
Hi all, We recently found out that Apache Spark received the SIGMOD System Award this year, given by SIGMOD (the ACM’s data management research organization) to impactful real-world and research systems. This puts Spark in good company with some very

ASF board report draft for May 2022

2022-05-10 Thread Matei Zaharia
Hi all, It’s time to submit our quarterly ASF board report again this Wednesday. I’ve put together the draft below. Let me know if you have any suggestions: === Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers

Re: ASF board report draft for February 2022

2022-02-09 Thread Matei Zaharia
other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > > On Tue, 8 Feb 2022 at 09:06, Matei Zahar

ASF board report draft for February 2022

2022-02-08 Thread Matei Zaharia
It’s time to send our quarterly report to the ASF board again this Wednesday. I’ve written the following draft for it — let me know if you want to add or change anything. == Description: Apache Spark is a fast and general purpose engine for large-scale data

Re: ASF board report draft for November

2021-11-10 Thread Matei Zaharia
ity for any loss, > damage or destruction of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. >

ASF board report draft for November

2021-11-09 Thread Matei Zaharia
Hi all, Our ASF board report needs to be submitted again this Wednesday (November 10). I wrote a draft with the major things that happened in the past three months — let me know if I missed something. === Description: Apache Spark is a fast and general engine for large-scale data

Re: ASF board report draft for August

2021-08-10 Thread Matei Zaharia
> Just one question, when you mention “We are working on Spark 3.2.0 as our > next release, with a release candidate likely to come soon. Spark 3.2 > includes a new Pandas API for Apache Spark based on the Koalas project” > > > Are you referring to what version of Koa

ASF board report draft for August

2021-08-09 Thread Matei Zaharia
It’s time for our quarterly report to the ASF board, which we need to send out this Wednesday. I wrote the draft below based on community activity — let me know if you’d like to add or change anything: == Description: Apache Spark is a fast and general

ASF board report draft for May

2021-05-10 Thread Matei Zaharia
It’s time for our quarterly report to the ASF board, which we need to submit on Wednesday. I’ve put together the following draft based on activity in the community — let me know if you’d like to add or change anything: == Description: Apache Spark is a fast

Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-28 Thread Matei Zaharia
+1 Matei > On Mar 28, 2021, at 1:45 AM, Gengliang Wang wrote: > > +1 (non-binding) > > On Sun, Mar 28, 2021 at 11:12 AM Mridul Muralidharan > wrote: > +1 > > Regards, > Mridul > > On Sat, Mar 27, 2021 at 6:09 PM Xiao Li > wrote: > +1

Welcoming six new Apache Spark committers

2021-03-26 Thread Matei Zaharia
Hi all, The Spark PMC recently voted to add several new committers. Please join me in welcoming them to their new role! Our new committers are: - Maciej Szymkiewicz (contributor to PySpark) - Max Gekk (contributor to Spark SQL) - Kent Yao (contributor to Spark SQL) - Attila Zsolt Piros

ASF board report for February 2021

2021-02-08 Thread Matei Zaharia
It’s time to prepare our quarterly ASF board report, which we need to submit on Feb 10th. The last one was in November. I’ve written a draft here, but let me know if you want to add any more content that I’ve missed. == Apache Spark is a fast and general engine for large-scale

Draft ASF board report for November

2020-11-10 Thread Matei Zaharia
Hi all, It’s time to send in our quarterly ASF board report on Nov 11, so I wanted to include anything notable going on that we want to appear in the board archive. Here is my draft; let me know if you have suggested changes. === Apache Spark is a

Re: [VOTE] Standardize Spark Exception Messages SPIP

2020-11-05 Thread Matei Zaharia
+1 Matei > On Nov 5, 2020, at 10:25 AM, EveLiao wrote: > > +1 > Thanks! > > > > -- > Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ > > - > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >

ASF board report draft for August

2020-08-10 Thread Matei Zaharia
Hi all, Our quarterly project board report needs to be submitted on August 12th, and I wanted to include anything notable going on that we want to appear in the board archive. Here is my draft below; let me know if you have suggested changes. ===

Welcoming some new Apache Spark committers

2020-07-14 Thread Matei Zaharia
Hi all, The Spark PMC recently voted to add several new committers. Please join me in welcoming them to their new roles! The new committers are: - Huaxin Gao - Jungtaek Lim - Dilip Biswal All three of them contributed to Spark 3.0 and we’re excited to have them join the project. Matei and

Re: Removing references to slave (and maybe in the future master)

2020-06-18 Thread Matei Zaharia
Yup, it would be great to do this. FWIW, I would propose using “worker” everywhere instead unless it already means something in that context, just to have a single word for this (instead of multiple words such as agent, replica, etc), but I haven’t looked into whether that would make anything

Re: [vote] Apache Spark 3.0 RC3

2020-06-09 Thread Matei Zaharia
> > +1: > Reynold Xin (binding) > Prashant Sharma (binding) > Gengliang Wang > Sean Owen (binding) > Mridul Muralidharan (binding) > Takeshi Yamamuro > Maxim Gekk > Matei Zaharia (binding) > Jungtaek Lim > Denny Lee > Russell Spitzer > Dongjoon Hyun (binding

Re: [vote] Apache Spark 3.0 RC3

2020-06-07 Thread Matei Zaharia
+1 Matei > On Jun 7, 2020, at 6:53 AM, Maxim Gekk wrote: > > +1 (non-binding) > > On Sun, Jun 7, 2020 at 2:34 PM Takeshi Yamamuro > wrote: > +1 (non-binding) > > I don't see any ongoing PR to fix critical bugs in my area. > Bests, > Takeshi > > On Sun, Jun 7,

ASF board report draft for May

2020-05-11 Thread Matei Zaharia
Hi all, Our quarterly project board report needs to be submitted on May 13th, and I wanted to include anything notable going on that we want to appear in the board archive. Here is my draft below — let me know if you have suggested changes. ===

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Matei Zaharia
+1 as well. Matei > On Mar 9, 2020, at 12:05 AM, Wenchen Fan wrote: > > +1 (binding), assuming that this is for public stable APIs, not APIs that are > marked as unstable, evolving, etc. > > On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía > wrote: > +1 (non-binding) >

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-02-27 Thread Matei Zaharia
+1 on this new rubric. It definitely captures the issues I’ve seen in Spark and in other projects. If we write down this rubric (or something like it), it will also be easier to refer to it during code reviews or in proposals of new APIs (we could ask “do you expect to have to change this API

ASF board report draft for February

2020-02-09 Thread Matei Zaharia
Hi all, Our project board report needs to be submitted on Feb 12th, and I wanted to include anything notable going on that we want to appear in the board archive. Here is my draft below — let me know if you have suggestions to add or change things.

Re: Why Apache Spark doesn't use Calcite?

2020-01-13 Thread Matei Zaharia
I’m pretty sure that Catalyst was built before Calcite, or at least in parallel. Calcite 1.0 was only released in 2015. From a technical standpoint, building Catalyst in Scala also made it more concise and easier to extend than an optimizer written in Java (you can find various presentations

Re: Spark 3.0 preview release 2?

2019-12-09 Thread Matei Zaharia
Yup, it would be great to release these more often. > On Dec 9, 2019, at 4:25 PM, Takeshi Yamamuro wrote: > > +1; Looks great if we can in terms of user's feedbacks. > > Bests, > Takeshi > > On Tue, Dec 10, 2019 at 3:14 AM Dongjoon Hyun > wrote: > Thank you,

Re: ASF board report for November 2019

2019-11-12 Thread Matei Zaharia
Oops, sorry about the typo there; I’ll correct that. > On Nov 12, 2019, at 12:43 AM, ruifengz wrote: > > nit: Ruifeng Zhang as committers in the past three months. <- Ruifeng Zheng > > ☺Thanks > > On 11/12/19 3:54 PM, Matei Zaharia wrote: >> Good catch, thanks.

Re: ASF board report for November 2019

2019-11-11 Thread Matei Zaharia
Good catch, thanks. > On Nov 11, 2019, at 6:46 PM, Jungtaek Lim > wrote: > > nit: - The latest committer was added on Sept 4th, 2019 (Dongjoon Hyun). <= > s/committer/PMC member > > Thanks, > Jungtaek Lim (HeartSaVioR) > > On Tue, Nov 12, 2019 at 11:38 AM

ASF board report for November 2019

2019-11-11 Thread Matei Zaharia
Hi all, It’s time to send our quarterly report to the ASF board. Here is my draft — please feel free to suggest any changes. Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a

Re: [DISCUSS] Spark 2.5 release

2019-09-21 Thread Matei Zaharia
If the goal is to get people to try the DSv2 API and build DSv2 data sources, can we recommend the 3.0-preview release for this? That would get people shifting to 3.0 faster, which is probably better overall compared to maintaining two major versions. There’s not that much else changing in 3.0

Welcoming some new committers and PMC members

2019-09-09 Thread Matei Zaharia
Hi all, The Spark PMC recently voted to add several new committers and one PMC member. Join me in welcoming them to their new roles! New PMC member: Dongjoon Hyun New committers: Ryan Blue, Liang-Chi Hsieh, Gengliang Wang, Yuming Wang, Weichen Xu, Ruifeng Zheng The new committers cover lots

Re: JDK11 Support in Apache Spark

2019-08-26 Thread Matei Zaharia
+1, it’s super messy without that. But great to see this running! > On Aug 26, 2019, at 10:53 AM, Reynold Xin wrote: > > Exactly - I think it's important to be able to create a single binary build. > Otherwise downstream users (the 99.99% won't be building their own Spark but > just pull it

ASF board report draft for August

2019-08-12 Thread Matei Zaharia
Hi all, It’s time to submit our quarterly report to the ASF board again this Wednesday. Here is my draft about what’s new — feel free to suggest changes. Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and

ASF board report for May

2019-05-06 Thread Matei Zaharia
It’s time to submit Spark's quarterly ASF board report on May 15th, so I wanted to run the report by everyone to make sure we’re not missing something. Let me know whether I missed anything: Apache Spark is a fast and general engine for large-scale data processing. It offers

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-23 Thread Matei Zaharia
te a 1.0 > release. Adding the Arrow dev list to CC. > > Bryan > > On Sat, Apr 20, 2019 at 5:25 PM Matei Zaharia wrote: > Okay, that makes sense, but is the Arrow data format stable? If not, we risk > breakage when Arrow changes in the future and some libraries using this

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-20 Thread Matei Zaharia
the JIRA description clearly enough when we adjusted it during the > discussion on the JIRA. As part of the columnar processing, we plan on > providing arrow formatted data, but that will be exposed through a Spark > owned API. > > On Sat, Apr 20, 2019 at 1:03 PM Matei Zaharia

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-20 Thread Matei Zaharia
FYI, I’d also be concerned about exposing the Arrow API or format as a public API if it’s not yet stable. Is stabilization of the API and format coming soon on the roadmap there? Maybe someone can work with the Arrow community to make that happen. We’ve been bitten lots of times by API changes

Re: [DISCUSS] Spark 3.0 and DataSourceV2

2019-02-26 Thread Matei Zaharia
e that for Spark > 4? > > > > > -Matt Cheah? > > > > From: Ryan Blue > Reply-To: "rb...@netflix.com" > Date: Tuesday, February 26, 2019 at 4:53 PM > To: Matt Cheah > Cc: Sean Owen , Wenchen Fan , Xiao Li > , Matei Zaha

Re: [DISCUSS] Spark 3.0 and DataSourceV2

2019-02-21 Thread Matei Zaharia
How large would the delay be? My 2 cents are that there’s nothing stopping us from making feature releases more often if we want to, so we shouldn’t see this as an “either delay 3.0 or release in >6 months” decision. If the work is likely to get in with a small delay and simplifies our work

ASF board report for February

2019-02-09 Thread Matei Zaharia
It’s time to submit Spark's quarterly ASF board report on February 13th, so I wanted to run the report by everyone to make sure we’re not missing something. Let me know whether I missed anything: Apache Spark is a fast and general engine for large-scale data processing. It

Re: Automated formatting

2018-11-22 Thread Matei Zaharia
Can we start by just recommending to contributors that they do this manually? Then if it seems to work fine, we can try to automate it. > On Nov 22, 2018, at 4:40 PM, Cody Koeninger wrote: > > I believe scalafmt only works on scala sources. There are a few > plugins for formatting java

ASF board report for November

2018-11-11 Thread Matei Zaharia
It’s time to submit Spark's quarterly ASF board report on November 14th, so I wanted to run the text by everyone to make sure we’re not missing something. Let me know whether I missed anything: Apache Spark is a fast and general engine for large-scale data processing. It

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-08 Thread Matei Zaharia
> > > > > > > > From: Sean Owen > > Sent: Tuesday, November 6, 2018 10:51 AM > > To: Shivaram Venkataraman > > Cc: Felix Cheung; Wenchen Fan; Matei Zaharia; dev > > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-06 Thread Matei Zaharia
Maybe it’s worth contacting the CRAN maintainers to ask for help? Perhaps we aren’t disabling it correctly, or perhaps they can ignore this specific failure. +Shivaram who might have some ideas. Matei > On Nov 5, 2018, at 9:09 PM, Felix Cheung wrote: > > I don't know what the cause is yet.

Re: Spark 2.4.0 artifact in Maven repository

2018-11-06 Thread Matei Zaharia
Hi Bartosz, This is because the vote on 2.4 has passed (you can see the vote thread on the dev mailing list) and we are just working to get the release into various channels (Maven, PyPI, etc), which can take some time. Expect to see an announcement soon once that’s done. Matei > On Nov 4,

Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Matei Zaharia
s prefer to get that notification sooner > rather than later? > > On Mon, Sep 17, 2018 at 12:58 PM Matei Zaharia > wrote: > I’d like to understand the maintenance burden of Python 2 before deprecating > it. Since it is not EOL yet, it might make sense to only deprecate it once > it’s

Re: Should python-2 be supported in Spark 3.0?

2018-09-17 Thread Matei Zaharia
I’d like to understand the maintenance burden of Python 2 before deprecating it. Since it is not EOL yet, it might make sense to only deprecate it once it’s EOL (which is still over a year from now). Supporting Python 2+3 seems less burdensome than supporting, say, multiple Scala versions in

Re: Is there any open source framework that converts Cypher to SparkSQL?

2018-09-16 Thread Matei Zaharia
GraphFrames (https://graphframes.github.io) offers a Cypher-like syntax that then executes on Spark SQL. > On Sep 14, 2018, at 2:42 AM, kant kodali wrote: > > Hi All, > > Is there any open source framework that converts Cypher to SparkSQL? > > Thanks!

Re: Python friendly API for Spark 3.0

2018-09-16 Thread Matei Zaharia
My 2 cents on this is that the biggest room for improvement in Python is similarity to Pandas. We already made the Python DataFrame API different from Scala/Java in some respects, but if there’s anything we can do to make it more obvious to Pandas users, that will help the most. The other issue

Re: time for Apache Spark 3.0?

2018-09-06 Thread Matei Zaharia
Yes, you can start with Unstable and move to Evolving and Stable when needed. We’ve definitely had experimental features that changed across maintenance releases when they were well-isolated. If your change risks breaking stuff in stable components of Spark though, then it probably won’t be

Re: Nightly Builds in the docs (in spark-nightly/spark-master-bin/latest? Can't seem to find it)

2018-08-31 Thread Matei Zaharia
If we actually build stuff nightly in Jenkins, it wouldn’t hurt to publish them IMO. It helps more people try master and test it. > On Aug 31, 2018, at 1:28 PM, Sean Owen wrote: > > There are some builds there, but they're not recent: > > https://people.apache.org/~pwendell/spark-nightly/ >

Re: [discuss] replacing SPIP template with Heilmeier's Catechism?

2018-08-31 Thread Matei Zaharia
I like this as well. Regarding “cost”, I think the equivalent concept for us is impact on the rest of the project (say maintenance cost down the line or whatever). This could be captured in the “risks” too, but it’s a slightly different concept. We should probably just clarify what we mean with

Re: [MLlib][Test] Smoke and Metamorphic Testing of MLlib

2018-08-23 Thread Matei Zaharia
"must" for a potentially productive use by projects. Relaxing > the assertions to only react if the differences are significant would be > another possible change. This could be a command line option to allow > different strictness of testing. > > Best, > Steffen > > > Am

Re: Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Matei Zaharia
There’s already a code style guide listed on http://spark.apache.org/contributing.html. Maybe it’s the same? We should decide which one we actually want and update this page if it’s wrong. Matei > On Aug 23, 2018, at 6:33 PM, Sean Owen wrote: > > Seems OK to me. The style is pretty standard

Re: [MLlib][Test] Smoke and Metamorphic Testing of MLlib

2018-08-22 Thread Matei Zaharia
Hi Steffen, Thanks for sharing your results about MLlib — this sounds like a useful tool. However, I wanted to point out that some of the results may be expected for certain machine learning algorithms, so it might be good to design those tests with that in mind. For example: > - The

Re: Am I crazy, or does the binary distro not have Kafka integration?

2018-08-04 Thread Matei Zaharia
I think that traditionally, the reason *not* to include these has been if they brought additional dependencies that users don’t really need, but that might clash with what the users have in their own app. Maybe this used to be the case for Kafka. We could analyze it and include it by default,

Re: Revisiting Online serving of Spark models?

2018-07-03 Thread Matei Zaharia
Just wondering, is there an update on this? I haven’t seen a summary of the offline discussion but maybe I’ve missed it. Matei > On Jun 11, 2018, at 8:51 PM, Holden Karau wrote: > > So I kicked off a thread on user@ to collect people's feedback there but I'll > summarize the offline results

Re: Beam's recent community development work

2018-07-02 Thread Matei Zaharia
I think telling people that they’re being considered as committers early on is a good idea, but AFAIK we’ve always had individual committers do that with contributors who were doing great work in various areas. We don’t have a centralized process for it though — it’s up to whoever wants to work

Re: how can I run spark job in my environment which is a single Ubuntu host with no hadoop installed

2018-06-17 Thread Matei Zaharia
Maybe your application is overriding the master variable when it creates its SparkContext. I see you are still passing “yarn-client” as an argument later to it in your command. > On Jun 17, 2018, at 11:53 AM, Raymond Xie wrote: > > Thank you Subhash. > > Here is the new command: >

Re: time for Apache Spark 3.0?

2018-04-05 Thread Matei Zaharia
data sources, ML algorithms, etc developed outside our source tree would have to do the same thing we do internally. > On Apr 5, 2018, at 10:30 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote: > > Sorry, but just to be clear here, this is the 2.12 API issue: > https://issues

Re: time for Apache Spark 3.0?

2018-04-05 Thread Matei Zaharia
Sorry, but just to be clear here, this is the 2.12 API issue: https://issues.apache.org/jira/browse/SPARK-14643, with more details in this doc: https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit. Basically, if we are allowed to change Spark’s API a little to

Re: time for Apache Spark 3.0?

2018-04-05 Thread Matei Zaharia
Java 9/10 support would be great to add as well. Regarding Scala 2.12, I thought that supporting it would become easier if we change the Spark API and ABI slightly. Basically, it is of course possible to create an alternate source tree today, but it might be possible to share the same source

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-02 Thread Matei Zaharia
Welcome, Zhenhua! Matei > On Apr 1, 2018, at 10:28 PM, Wenchen Fan wrote: > > Hi all, > > The Spark PMC recently added Zhenhua Wang as a committer on the project. > Zhenhua is the major contributor of the CBO project, and has been > contributing across several areas of

Welcoming some new committers

2018-03-02 Thread Matei Zaharia
Hi everyone, The Spark PMC has recently voted to add several new committers to the project, based on their contributions to Spark 2.3 and other past work: - Anirudh Ramanathan (contributor to Kubernetes support) - Bryan Cutler (contributor to PySpark and Arrow support) - Cody Koeninger

Re: Please keep s3://spark-related-packages/ alive

2018-02-27 Thread Matei Zaharia
For Flintrock, have you considered using a Requester Pays bucket? That way you’d get the availability of S3 without having to foot the bill for bandwidth yourself (which was the bulk of the cost for the old bucket). Matei > On Feb 27, 2018, at 4:35 PM, Nicholas Chammas

Re: Spark 3

2018-01-20 Thread Matei Zaharia
We should only make breaking changes when we have a strong reason to do so — otherwise, it’s fine to stay on 2.x for a while. For example, maybe there’s a way to support Hadoop 3.0 from Spark 2.x as well. So far, none of the JIRAs targeting 3.0 seem that compelling, though I could be missing

Re: Timeline for Spark 2.3

2017-11-09 Thread Matei Zaharia
I’m also +1 on extending this to get Kubernetes and other features in. Matei > On Nov 9, 2017, at 4:04 PM, Anirudh Ramanathan > wrote: > > This would help the community on the Kubernetes effort quite a bit - giving > us additional time for reviews and testing for

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-03 Thread Matei Zaharia
+1 from me too. Matei > On Nov 3, 2017, at 4:59 PM, Wenchen Fan wrote: > > +1. > > I think this architecture makes a lot of sense to let executors talk to > source/sink directly, and bring very low latency. > > On Thu, Nov 2, 2017 at 9:01 AM, Sean Owen

Re: Spark 1.x - End of life

2017-10-19 Thread Matei Zaharia
Hi Ismael, It depends on what you mean by “support”. In general, there won’t be new feature releases for 1.X (e.g. Spark 1.7) because all the new features are being added to the master branch. However, there is always room for bug fix releases if there is a catastrophic bug, and committers can

Welcoming Tejas Patil as a Spark committer

2017-09-29 Thread Matei Zaharia
Hi all, The Spark PMC recently added Tejas Patil as a committer on the project. Tejas has been contributing across several areas of Spark for a while, focusing especially on scalability issues and SQL. Please join me in welcoming Tejas! Matei

Re: [VOTE][SPIP] SPARK-21866 Image support in Apache Spark

2017-09-23 Thread Matei Zaharia
+1; we should consider something similar for multi-dimensional tensors too. Matei > On Sep 23, 2017, at 7:27 AM, Yanbo Liang wrote: > > +1 > > On Sat, Sep 23, 2017 at 7:08 PM, Noman Khan wrote: > +1 > > Regards > Noman > From: Denny Lee

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Matei Zaharia
+1 (binding) > On Sep 11, 2017, at 5:54 PM, Hyukjin Kwon wrote: > > +1 (non-binding) > > > 2017-09-12 9:52 GMT+09:00 Yin Huai : > +1 > > On Mon, Sep 11, 2017 at 5:47 PM, Sameer Agarwal wrote: > +1 (non-binding) > > On Thu,

[jira] [Commented] (SPARK-21866) SPIP: Image support in Spark

2017-09-04 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152915#comment-16152915 ] Matei Zaharia commented on SPARK-21866: --- Just to chime in on this, I've also seen feedback

Re: Moving Scala 2.12 forward one step

2017-09-01 Thread Matei Zaharia
profile, > maybe, or adding the 2.12 REPL). Anything that is breaking, we can work on in > a series of open PRs, or maybe a branch, yea. It's unusual but might be > worthwhile. > > On Fri, Sep 1, 2017 at 9:44 AM Matei Zaharia <matei.zaha...@gmail.com> wrote: > If the change

Re: Moving Scala 2.12 forward one step

2017-09-01 Thread Matei Zaharia
If the changes aren’t that hard, I think we should also consider building a Scala 2.12 version of Spark 2.3 in a separate branch. I’ve definitely seen concerns from some large Scala users that Spark isn’t supporting 2.12 soon enough. I thought SPARK-14220 was blocked mainly because the changes

[jira] [Updated] (SPARK-18278) SPIP: Support native submission of spark jobs to a kubernetes cluster

2017-08-29 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-18278: -- Labels: SPIP (was: ) > SPIP: Support native submission of spark jobs to a kubernetes clus

[jira] [Updated] (SPARK-21866) SPIP: Image support in Spark

2017-08-29 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-21866: -- Labels: SPIP (was: ) > SPIP: Image support in Sp

Welcoming Saisai (Jerry) Shao as a committer

2017-08-28 Thread Matei Zaharia
Hi everyone, The PMC recently voted to add Saisai (Jerry) Shao as a committer. Saisai has been contributing to many areas of the project for a long time, so it’s great to see him join. Join me in thanking and congratulating him! Matei

Re: Kill Spark Streaming JOB from Spark UI or Yarn

2017-08-27 Thread Matei Zaharia
The batches should all have the same application ID, so use that one. You can also find the application in the YARN UI to terminate it from there. Matei > On Aug 27, 2017, at 10:27 AM, KhajaAsmath Mohammed > wrote: > > Hi, > > I am new to spark streaming and not

Re: SPIP: Spark on Kubernetes

2017-08-17 Thread Matei Zaharia
+1 from me as well. Matei > On Aug 17, 2017, at 10:55 AM, Reynold Xin wrote: > > +1 on adding Kubernetes support in Spark (as a separate module similar to how > YARN is done) > > I talk with a lot of developers and teams that operate cloud services, and > k8s in the
