Re: Remove non-Tungsten mode in Spark 3?

2019-01-03 Thread Reynold Xin
The issue with the off-heap mode is that it is a pretty big behavior change and
requires additional setup (also, for users that run UDFs that allocate a lot
of heap memory, it might not be as good).

I can see us removing the legacy mode since it's been legacy for a long time 
and perhaps very few users need it. How much code does it remove though?

On Thu, Jan 03, 2019 at 2:55 PM, Sean Owen <sro...@apache.org> wrote:

> 
> Just wondering if there is a good reason to keep around the pre-Tungsten
> on-heap memory mode for Spark 3, and make spark.memory.offHeap.enabled
> always true? It would simplify the code somewhat, but I don't feel I'm so
> aware of the tradeoffs.
> 
> I know we didn't deprecate it, but it's been off by default for a long
> time. It could be deprecated, too.
> 
> Same question for spark.memory.useLegacyMode and all its various
> associated settings? Seems like these should go away at some point, and
> Spark 3 is a good point. Same issue about deprecation though.
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

Re: Apache Spark 2.2.3 ?

2019-01-03 Thread Dongjoon Hyun
Thank you, Sean!

Bests,
Dongjoon.


On Thu, Jan 3, 2019 at 2:50 PM Sean Owen  wrote:

> Yes, that one's not going to be back-ported to 2.3. I think it's fine to
> proceed with a 2.2 release with what's there now and call it done.
> Note that Spark 2.3 would be EOL around September of this year.
>
> On Thu, Jan 3, 2019 at 2:31 PM Dongjoon Hyun 
> wrote:
>
>> Thank you for additional support for 2.2.3, Felix and Takeshi!
>>
>>
>> The following is the update for Apache Spark 2.2.3 release.
>>
>> For correctness issues, two more patches landed on `branch-2.2`.
>>
>>   SPARK-22951 fix aggregation after dropDuplicates on empty dataframes
>>   SPARK-25591 Avoid overwriting deserialized accumulator
>>
>> Currently, if we use the following JIRA search query, there exists one
>> JIRA issue: SPARK-25206.
>>
>>   Query: project = SPARK AND fixVersion in (2.3.0, 2.3.1, 2.3.2,
>> 2.3.3, 2.4.0, 2.4.1, 3.0.0) AND fixVersion not in (2.2.0, 2.2.1, 2.2.2,
>> 2.2.3) AND affectedVersion in (2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1,
>> 2.2.2, 2.2.3) AND labels in (Correctness, correctness)
>>
>> SPARK-25206 ( https://issues.apache.org/jira/browse/SPARK-25206 ) has
>>
>>   Affected Version: 2.2.2, 2.3.1
>>   Target Versions: 2.3.2, 2.4.0
>>   Fixed Version: 2.4.0
>>
>> Although SPARK-25206 is labeled as a correctness issue, 2.3.2 already
>> missed it due to technical difficulties and risks; instead, it's marked
>> as a known issue. As we can see, it's not targeted for 2.3.3, either.
>>
>> I know the correctness-issue policy on new releases. However, for me,
>> Spark 2.2.3 is a bit of an exceptional release, since it's a farewell
>> release and branch-2.2 is already EOL and far from the active master
>> branch.
>>
>> So, I'd like to put SPARK-25206 out of scope for the farewell release
>> and recommend that users move to a later release instead; for example,
>> Spark 2.4.0 for SPARK-25206.
>>
>> What do you think about that?
>>
>> Bests,
>> Dongjoon.
>>
>>>
>>>


Remove non-Tungsten mode in Spark 3?

2019-01-03 Thread Sean Owen
Just wondering if there is a good reason to keep around the
pre-Tungsten on-heap memory mode for Spark 3, and make
spark.memory.offHeap.enabled always true? It would simplify the code
somewhat, but I don't feel I'm so aware of the tradeoffs.

I know we didn't deprecate it, but it's been off by default for a long
time. It could be deprecated, too.

Same question for spark.memory.useLegacyMode and all its various
associated settings? Seems like these should go away at some point,
and Spark 3 is a good point. Same issue about deprecation though.
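
For readers less familiar with the settings under discussion, a spark-defaults.conf sketch (values here are illustrative only, not recommendations):

```properties
# Selects the pre-Tungsten legacy memory manager when true;
# false (the default since Spark 1.6) uses the unified memory manager.
spark.memory.useLegacyMode      false

# Off by default; when enabled, execution memory can be allocated
# off-heap, and spark.memory.offHeap.size must be set to a positive value.
spark.memory.offHeap.enabled    true
spark.memory.offHeap.size       2g
```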

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Apache Spark 2.2.3 ?

2019-01-03 Thread Sean Owen
Yes, that one's not going to be back-ported to 2.3. I think it's fine to
proceed with a 2.2 release with what's there now and call it done.
Note that Spark 2.3 would be EOL around September of this year.

On Thu, Jan 3, 2019 at 2:31 PM Dongjoon Hyun 
wrote:

> Thank you for additional support for 2.2.3, Felix and Takeshi!
>
>
> The following is the update for Apache Spark 2.2.3 release.
>
> For correctness issues, two more patches landed on `branch-2.2`.
>
>   SPARK-22951 fix aggregation after dropDuplicates on empty dataframes
>   SPARK-25591 Avoid overwriting deserialized accumulator
>
> Currently, if we use the following JIRA search query, there exists one
> JIRA issue: SPARK-25206.
>
>   Query: project = SPARK AND fixVersion in (2.3.0, 2.3.1, 2.3.2,
> 2.3.3, 2.4.0, 2.4.1, 3.0.0) AND fixVersion not in (2.2.0, 2.2.1, 2.2.2,
> 2.2.3) AND affectedVersion in (2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1,
> 2.2.2, 2.2.3) AND labels in (Correctness, correctness)
>
> SPARK-25206 ( https://issues.apache.org/jira/browse/SPARK-25206 ) has
>
>   Affected Version: 2.2.2, 2.3.1
>   Target Versions: 2.3.2, 2.4.0
>   Fixed Version: 2.4.0
>
> Although SPARK-25206 is labeled as a correctness issue, 2.3.2 already
> missed it due to technical difficulties and risks; instead, it's marked
> as a known issue. As we can see, it's not targeted for 2.3.3, either.
>
> I know the correctness-issue policy on new releases. However, for me,
> Spark 2.2.3 is a bit of an exceptional release, since it's a farewell
> release and branch-2.2 is already EOL and far from the active master
> branch.
>
> So, I'd like to put SPARK-25206 out of scope for the farewell release
> and recommend that users move to a later release instead; for example,
> Spark 2.4.0 for SPARK-25206.
>
> What do you think about that?
>
> Bests,
> Dongjoon.
>
>>
>>


Re: Apache Spark 2.2.3 ?

2019-01-03 Thread Dongjoon Hyun
Thank you for additional support for 2.2.3, Felix and Takeshi!


The following is the update for Apache Spark 2.2.3 release.

For correctness issues, two more patches landed on `branch-2.2`.

  SPARK-22951 fix aggregation after dropDuplicates on empty dataframes
  SPARK-25591 Avoid overwriting deserialized accumulator

Currently, if we use the following JIRA search query, there exists one
JIRA issue: SPARK-25206.

  Query: project = SPARK AND fixVersion in (2.3.0, 2.3.1, 2.3.2, 2.3.3,
2.4.0, 2.4.1, 3.0.0) AND fixVersion not in (2.2.0, 2.2.1, 2.2.2, 2.2.3) AND
affectedVersion in (2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.2.3)
AND labels in (Correctness, correctness)

SPARK-25206 ( https://issues.apache.org/jira/browse/SPARK-25206 ) has

  Affected Version: 2.2.2, 2.3.1
  Target Versions: 2.3.2, 2.4.0
  Fixed Version: 2.4.0

Although SPARK-25206 is labeled as a correctness issue, 2.3.2 already
missed it due to technical difficulties and risks; instead, it's marked
as a known issue. As we can see, it's not targeted for 2.3.3, either.

I know the correctness-issue policy on new releases. However, for me, Spark
2.2.3 is a bit of an exceptional release, since it's a farewell release and
branch-2.2 is already EOL and far from the active master branch.

So, I'd like to put SPARK-25206 out of scope for the farewell release
and recommend that users move to a later release instead; for example,
Spark 2.4.0 for SPARK-25206.

What do you think about that?

Bests,
Dongjoon.


On Thu, Jan 3, 2019 at 12:02 AM Takeshi Yamamuro 
wrote:

> Hi, all, happy new year!
>
> +1 on the release of 2.2.3/2.3.3.
> I checked that there are no ongoing issues targeting 2.3.3, either.
>
> On Thu, Jan 3, 2019 at 8:50 AM Felix Cheung 
> wrote:
>
>> +1 on 2.2.3 of course
>>
>>
>> --
>> *From:* Dongjoon Hyun 
>> *Sent:* Wednesday, January 2, 2019 12:21 PM
>> *To:* Saisai Shao
>> *Cc:* Xiao Li; Felix Cheung; Sean Owen; dev
>> *Subject:* Re: Apache Spark 2.2.3 ?
>>
>> Thank you for the swift feedback, and Happy New Year. :)
>> For the 2.2.3 release next week, I see two positive opinions (including
>> mine)
>> and don't see any direct objections.
>>
>> Apache Spark has a mature, resourceful, and fast-growing community.
>> One of the important characteristics of a mature community is
>> predictable behavior that users can depend on.
>> For instance, we have a nice tradition of cutting the branch as a sign
>> of feature freeze.
>> The *final* release of a branch is not only good for the end users, but
>> also a good sign of the EOL of the branch for all.
>>
>> As a junior committer of the community, I want to help deliver
>> the final 2.2.3 release to the community and finalize `branch-2.2`.
>>
>> * For Apache Spark JIRA, I checked that there are no ongoing issues
>> targeting `2.2.3`.
>> * For commits, I reviewed the newly landed commits after the `2.2.2` tag
>> and updated a few missing JIRA issues accordingly.
>> * Apparently, we can release 2.2.3 next week.
>>
>> BTW, I'm +1 for the next 2.3/2.4 releases and have been expecting them
>> before Spark+AI Summit (April), as we have usually done.
>> Please send another email to the `dev` mailing list, since it's worth
>> receiving more attention and requests.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Tue, Jan 1, 2019 at 9:35 PM Saisai Shao 
>> wrote:
>>
>>> Agreed to have a new branch-2.3 release, as we already accumulated
>>> several fixes.
>>>
>>> Thanks
>>> Saisai
>>>
>>> Xiao Li  于2019年1月2日周三 下午1:32写道:
>>>
 Based on the commit history,
 https://gitbox.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.3
 contains more critical fixes. Maybe the priority is higher?

 On Tue, Jan 1, 2019 at 9:22 PM Felix Cheung 
 wrote:

> Speaking of, it’s been 3 months since 2.3.2... (Sept 2018)
>
> And 2 months since 2.4.0 (Nov 2018) - does the community feel 2.4
> branch is stabilizing?
>
>
> --
> *From:* Sean Owen 
> *Sent:* Tuesday, January 1, 2019 8:30 PM
> *To:* Dongjoon Hyun
> *Cc:* dev
> *Subject:* Re: Apache Spark 2.2.3 ?
>
> I agree with that logic, and if you're volunteering to do the legwork,
> I don't see a reason not to cut a final 2.2 release.
>
> On Tue, Jan 1, 2019 at 9:19 PM Dongjoon Hyun 
> wrote:
> >
> > Hi, All.
> >
> > The Apache Spark community has a policy of maintaining each feature
> branch for 18 months. I think it's time for the 2.2.3 release, since
> 2.2.0 was released in July 2017.
> >
> > http://spark.apache.org/versioning-policy.html
> >
> > After 2.2.2 (July 2018), `branch-2.2` has 40 patches (including
> security patches).
> >
> >
> https://gitbox.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.2
> >
> > If it's okay and there is no further plan on `branch-2.2`, I want to
> volunteer to 

Re:

2019-01-03 Thread northbright
Please unsubscribe me too.



On Thu, 3 Jan 2019 at 15:22, marco rocchi <
rocchi.1407...@studenti.uniroma1.it> wrote:

> Unsubscribe me, please.
>
> Thank you so much
>


[no subject]

2019-01-03 Thread marco rocchi
Unsubscribe me, please.

Thank you so much


Re: SPARK-25299: Updates As Of December 19, 2018

2019-01-03 Thread Peter Rudenko
Hi Matt, I'm a developer of the SparkRDMA shuffle manager:
https://github.com/Mellanox/SparkRDMA
Thanks for your effort on improving the Spark shuffle API. We are very
interested in participating in this. For now I have several comments:
1. I went through these 4 documents:

https://docs.google.com/document/d/1tglSkfblFhugcjFXZOxuKsCdxfrHBXfxgTs-sbbNB3c/edit#

https://docs.google.com/document/d/1TA-gDw3ophy-gSu2IAW_5IMbRK_8pWBeXJwngN9YB80/edit

https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40

https://docs.google.com/document/d/1kSpbBB-sDk41LeORm3-Hfr-up98Ozm5wskvB49tUhSs/edit#

As I understand it, there are two discussions: improving the shuffle manager
API itself (the Splash manager) and improving the external shuffle service.

2. We may consider revisiting the SPIP: RDMA Accelerated Shuffle Engine,
i.e. whether to support RDMA in the main codebase or at least as a
first-class shuffle plugin (not many other open-source shuffle plugins
exist). We actively develop it, adding new features. RDMA is now available
on Azure (
https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/),
Alibaba, and other cloud providers. For now we support only memory <->
memory transfer, but RDMA is extensible to NVM and GPU data transfer.
3. We have users that are interested in having this feature (
https://issues.apache.org/jira/browse/SPARK-12196) - we can consider adding
it to this new API.

Let me know if you need help with review / testing / benchmarking.
I'll look more at the documents and the PR,

Thanks,
Peter Rudenko
Software engineer at Mellanox Technologies.


ср, 19 груд. 2018 о 20:54 John Zhuge  пише:

> Matt, appreciate the update!
>
> On Wed, Dec 19, 2018 at 10:51 AM Matt Cheah  wrote:
>
>> Hi everyone,
>>
>>
>>
>> Earlier this year, we proposed SPARK-25299
>> , proposing the idea
>> of using other storage systems for persisting shuffle files. Since that
>> time, we have been continuing to work on prototypes for this project. In
>> the interest of increasing transparency into our work, we have created a 
>> progress
>> report document
>> 
>> where you may find a summary of the work we have been doing, as well as
>> links to our prototypes on Github. We would ask that anyone who is very
>> familiar with the inner workings of Spark’s shuffle could provide feedback
>> and comments on our work thus far. We welcome any further discussion in
>> this space. You may comment in this e-mail thread or by commenting on the
>> progress report document.
>>
>>
>>
>> Looking forward to hearing from you. Thanks,
>>
>>
>>
>> -Matt Cheah
>>
>
>
> --
> John
>


Re: Apache Spark 2.2.3 ?

2019-01-03 Thread Takeshi Yamamuro
Hi, all, happy new year!

+1 on the release of 2.2.3/2.3.3.
I checked that there are no ongoing issues targeting 2.3.3, either.

On Thu, Jan 3, 2019 at 8:50 AM Felix Cheung 
wrote:

> +1 on 2.2.3 of course
>
>
> --
> *From:* Dongjoon Hyun 
> *Sent:* Wednesday, January 2, 2019 12:21 PM
> *To:* Saisai Shao
> *Cc:* Xiao Li; Felix Cheung; Sean Owen; dev
> *Subject:* Re: Apache Spark 2.2.3 ?
>
> Thank you for the swift feedback, and Happy New Year. :)
> For the 2.2.3 release next week, I see two positive opinions (including
> mine)
> and don't see any direct objections.
>
> Apache Spark has a mature, resourceful, and fast-growing community.
> One of the important characteristics of a mature community is
> predictable behavior that users can depend on.
> For instance, we have a nice tradition of cutting the branch as a sign of
> feature freeze.
> The *final* release of a branch is not only good for the end users, but
> also a good sign of the EOL of the branch for all.
>
> As a junior committer of the community, I want to help deliver
> the final 2.2.3 release to the community and finalize `branch-2.2`.
>
> * For Apache Spark JIRA, I checked that there are no ongoing issues
> targeting `2.2.3`.
> * For commits, I reviewed the newly landed commits after the `2.2.2` tag
> and updated a few missing JIRA issues accordingly.
> * Apparently, we can release 2.2.3 next week.
>
> BTW, I'm +1 for the next 2.3/2.4 releases and have been expecting them
> before Spark+AI Summit (April), as we have usually done.
> Please send another email to the `dev` mailing list, since it's worth
> receiving more attention and requests.
>
> Bests,
> Dongjoon.
>
>
> On Tue, Jan 1, 2019 at 9:35 PM Saisai Shao  wrote:
>
>> Agreed to have a new branch-2.3 release, as we already accumulated
>> several fixes.
>>
>> Thanks
>> Saisai
>>
>> Xiao Li  于2019年1月2日周三 下午1:32写道:
>>
>>> Based on the commit history,
>>> https://gitbox.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.3
>>> contains more critical fixes. Maybe the priority is higher?
>>>
>>> On Tue, Jan 1, 2019 at 9:22 PM Felix Cheung 
>>> wrote:
>>>
 Speaking of, it’s been 3 months since 2.3.2... (Sept 2018)

 And 2 months since 2.4.0 (Nov 2018) - does the community feel 2.4
 branch is stabilizing?


 --
 *From:* Sean Owen 
 *Sent:* Tuesday, January 1, 2019 8:30 PM
 *To:* Dongjoon Hyun
 *Cc:* dev
 *Subject:* Re: Apache Spark 2.2.3 ?

 I agree with that logic, and if you're volunteering to do the legwork,
 I don't see a reason not to cut a final 2.2 release.

 On Tue, Jan 1, 2019 at 9:19 PM Dongjoon Hyun 
 wrote:
 >
 > Hi, All.
 >
 > The Apache Spark community has a policy of maintaining each feature
 branch for 18 months. I think it's time for the 2.2.3 release, since
 2.2.0 was released in July 2017.
 >
 > http://spark.apache.org/versioning-policy.html
 >
 > After 2.2.2 (July 2018), `branch-2.2` has 40 patches (including
 security patches).
 >
 >
 https://gitbox.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.2
 >
 > If it's okay and there is no further plan on `branch-2.2`, I want to
 volunteer to prepare the first RC (early next week?).
 >
 > Please let me know your opinions about this.
 >
 > Bests,
 > Dongjoon.

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


>>>
>>>
>>

-- 
---
Takeshi Yamamuro