Hi Kannan,
As far as I know, the shuffle ID in ShuffleDependency is increasing, so even
if you run the same job twice, the shuffle dependencies, and therefore the
shuffle IDs, are different. Since the shuffle file name is built from
(shuffleId + mapId + reduceId), the file names will differ, so there is no
name conflict even if the jobs are the same.
On Tue, Mar 24, 2015 at 7:35 PM, Saisai Shao sai.sai.s...@gmail.com
wrote:
Hi Kannan,
As far as I know, the shuffle ID in ShuffleDependency is increasing, so even
if you run the same job twice the shuffle IDs are different. On the other hand,
DiskBlockManager.getFile is used to create the shuffle index and data files.
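To make the naming concrete, here is a minimal sketch of that convention (the `shuffle_<shuffleId>_<mapId>_<reduceId>` pattern mirrors Spark's shuffle block ID names; the case class below is illustrative, not Spark's actual API):

```scala
// Illustrative only: mirrors the shuffle_<shuffleId>_<mapId>_<reduceId>
// naming of Spark's shuffle block IDs; not Spark's actual classes.
case class ShuffleFileName(shuffleId: Int, mapId: Int, reduceId: Int) {
  def data: String  = s"shuffle_${shuffleId}_${mapId}_${reduceId}.data"
  def index: String = s"shuffle_${shuffleId}_${mapId}_${reduceId}.index"
}

object ShuffleNameDemo extends App {
  // A re-run of the same job gets a fresh shuffleId, so the same
  // (mapId, reduceId) pair still maps to a different file on disk.
  println(ShuffleFileName(shuffleId = 0, mapId = 3, reduceId = 7).data) // shuffle_0_3_7.data
  println(ShuffleFileName(shuffleId = 1, mapId = 3, reduceId = 7).data) // shuffle_1_3_7.data
}
```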
On Tue, Mar 24, 2015 at 11:56 PM, Saisai Shao sai.sai.s...@gmail.com
wrote:
Yes, as Josh said: when an application is started, Spark will create a unique
application-wide folder
Hi Hrishikesh,
We now add a Kafka unit test for Python which relies on the Kafka assembly
jar, so you need to run `sbt assembly` or `mvn package` first to get the
assembly jar.
2015-04-22 1:15 GMT+08:00 Marcelo Vanzin van...@cloudera.com:
Spark does not support any state persistence across deployments, so this is
something we need to handle on our own.
Hope that helps. Let me know if not.
Thanks!
Amit
On Thu, Jun 11, 2015 at 10:02 PM, Saisai Shao sai.sai.s...@gmail.com
wrote:
Hi,
What do you mean by getting
Thanks, Jerry. That's what I suspected based on the code I looked at. Any
pointers on what is needed to build in this support would be great. This is
critical to the project we are currently working on.
Thanks!
On Thu, Jun 11, 2015 at 10:54 PM, Saisai Shao sai.sai.s
Hi,
What do you mean by getting the offsets from the RDD? From my
understanding, the offsetRange is a parameter you supplied to KafkaRDD, so
why do you want to read back the values you passed in?
Thanks
Jerry
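For reference, a minimal sketch of the pattern under discussion, using the spark-streaming-kafka direct stream API of that era (the broker address and topic name are placeholders): the offset ranges are fixed when each batch's KafkaRDD is created, and can be read back via HasOffsetRanges for external bookkeeping.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

object OffsetRangeDemo {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("offset-demo"), Seconds(5))
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "localhost:9092"), Set("events"))

    stream.foreachRDD { rdd =>
      // The ranges were decided when this batch's KafkaRDD was created;
      // HasOffsetRanges just exposes them, e.g. for committing to an external store.
      val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach(r =>
        println(s"${r.topic}-${r.partition}: ${r.fromOffset} .. ${r.untilOffset}"))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```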
2015-06-12 12:36 GMT+08:00 Amit Ramesh a...@yelp.com:
Congratulations on the
Kafka now has built-in support for managing metadata itself, besides ZK; it is
easy to use and to migrate from the current ZK implementation. I think the
problem here is whether we need to manage offsets at the Spark Streaming level
or leave this question to the user.
If you want to manage offsets at the user level, letting Spark
and this is when I have not enabled
> Dynamic Allocation. My cluster has other DNs available; the AM should request
> the killed executors from YARN and get them on some other DNs.
>
> Regards,
> Prakhar
>
>
> On Mon, Oct 19, 2015 at 2:47 PM, Saisai Shao <sai.sai.s...@gmail.co
SPARK-6470 only supports node label expressions for executors.
SPARK-7173 supports node label expressions for the AM (will be in 1.6).
If you want to schedule your whole application through label expressions,
you have to configure both the AM and executor label expressions. If you only
want to schedule
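For example, a minimal sketch configuring both (the two property names are Spark's YARN configs for this; the label values "core" and "compute" are made-up examples), e.g. in spark-shell or application code:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("labeled-app")
  // SPARK-7173 (1.6+): run the ApplicationMaster only on nodes labeled "core"
  .set("spark.yarn.am.nodeLabelExpression", "core")
  // SPARK-6470: run executors only on nodes labeled "compute"
  .set("spark.yarn.executor.nodeLabelExpression", "compute")
```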
zzq98...@alibaba-inc.com]
> *Sent:* December 16, 2015, 9:21
> *To:* 'Ted Yu'
> *Cc:* 'Saisai Shao'; 'dev'
> *Subject:* Re: spark with label nodes in yarn
>
>
>
> Oops...
>
>
>
> I do use Spark 1.5.0 and Apache Hadoop 2.6.0 (Spark 1.4.1 + Apache Hadoop
> 2.6.0 is
Might be related to this JIRA (
https://issues.apache.org/jira/browse/SPARK-11761); I'm not very sure about it.
On Fri, Nov 27, 2015 at 10:22 AM, Nan Zhu wrote:
> Hi, all
>
> Has anyone noticed that some of the tests just block at the test case “don't
> call ssc.stop in
+1.
Hadoop 2.6 would be a good choice, with many features added (like support for
long-running services and label-based scheduling). Currently there is a lot of
reflection code to support multiple versions of YARN, so upgrading to a
newer version will really ease the pain :).
Thanks
Saisai
On Fri, Nov
I think it is due to our recent changes to override the external resolvers
in the sbt build profile; I just created a JIRA (
https://issues.apache.org/jira/browse/SPARK-13109) to track this.
On Mon, Feb 1, 2016 at 3:01 PM, Mike Hynes <91m...@gmail.com> wrote:
> Hi devs,
>
> I used to be able to
Yes, we need to fix the documentation.
On Tue, Mar 8, 2016 at 9:07 AM, Mark Hamstra
wrote:
> Yes, it works in standalone mode.
>
> On Mon, Mar 7, 2016 at 4:25 PM, Eugene Morozov wrote:
>
>> Hi, the feature looks like the one I'd like to use,
Hi Michael, shuffle data (mapper output) has to be materialized to disk
eventually, no matter how much memory you have; that is the design of
Spark. In your scenario, since you have a lot of memory, shuffle spill should
not happen frequently; most of the disk IO you see is probably the final shuffle
eliminate this.
>
>
> On Fri, Apr 1, 2016, 7:25 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
I'm quite curious about the benefits of using HDFS as the shuffle service;
also, what is the problem with the current shuffle service?
Thanks
Saisai
On Wed, Apr 27, 2016 at 4:31 AM, Timothy Chen wrote:
> Are you suggesting to have shuffle service persist and fetch data with
> hdfs,
From my understanding, newAPIHadoopFile or hadoopFile is generic
enough for you to support any InputFormat you want. IMO it is not really
necessary to add a new API for this.
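To illustrate the point, here is a sketch using the existing generic API with CombineTextInputFormat, with no new textFile variant needed (the path and split size below are illustrative):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object CombineInputDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("combine-input"))
    // Cap combined splits at 256MB so many small files coalesce into few partitions.
    sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize",
      (256L * 1024 * 1024).toString)
    val lines = sc
      .newAPIHadoopFile[LongWritable, Text, CombineTextInputFormat]("hdfs:///data/small-files")
      .map { case (_, text) => text.toString }
    println(lines.count())
  }
}
```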
On Fri, May 20, 2016 at 12:59 AM, Alexander Pivovarov
wrote:
> Spark users might not know
0 partitions of 256MB than an RDD with
> 250,000+ partitions of different sizes from 100KB to 128MB
>
> So, I see only advantages if sc.textFile() starts using CombineTextInputFormat
> instead of TextInputFormat
>
> Alex
>
> On Thu, May 19, 2016 at 8:30 PM, Saisai Shao <s
I think it is by design that FileInputDStream doesn't support reporting input
info: FileInputDStream has no event/record concept (it is file
based), so it is hard to define how to report the input info correctly.
Currently, input info reporting can be supported for all receiver-based
InputDStreams
It would be better to have a specific technical reason why this PR should
be closed: either the implementation is not good, or the problem is not
valid, or something else. That will actually help the contributor shape
their code and reopen the PR. Otherwise, reasons like "feel free to
+1. HBaseTest in the Spark examples is quite old and obsolete; the HBase
connector in the HBase repo has evolved a lot, and it would be better to point
users to that rather than to the Spark example. So it is good to remove it.
Thanks
Saisai
On Wed, Apr 20, 2016 at 1:41 AM, Josh Rosen
comment is
>>> added at the end without responses from the author?
>>>
>>>
>>> IMHO, if the committers are not sure whether the patch would be useful,
>>> then I think they should leave some comments on why they are not sure, not
>>> just ignoring
Using the dominant resource calculator instead of the default resource
calculator will get you the expected vcores. Basically, by default YARN does
not honor CPU cores as a resource, so you will always see vcores = 1 no
matter how many cores you set in Spark.
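For reference, a sketch of the two pieces involved (the YARN property is the standard CapacityScheduler setting; the Spark cores settings below are illustrative), e.g. pasted into spark-shell:

```scala
// YARN side (capacity-scheduler.xml on the ResourceManager), so the
// scheduler accounts for CPU as well as memory:
//   yarn.scheduler.capacity.resource-calculator =
//     org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
//
// Spark side: with the calculator above, requested cores show up as
// container vcores instead of the constant 1.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("vcores-demo")
  .set("spark.executor.cores", "4")      // now reflected in YARN's vcore count
  .set("spark.executor.instances", "10")
```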
On Wed, Aug 3, 2016 at 12:11 PM,
This archive contains all the jars required by the Spark runtime. You could zip
all the jars under $SPARK_HOME/jars, upload this archive to HDFS, then
configure spark.yarn.archive with the path of this archive on HDFS.
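A minimal sketch of that setup, assuming the archive has already been created and uploaded (the HDFS path below is illustrative):

```scala
import org.apache.spark.SparkConf

// Assumes $SPARK_HOME/jars was zipped and uploaded to this (made-up) location;
// YARN then localizes the cached archive instead of uploading jars per submit.
val conf = new SparkConf()
  .setAppName("archive-demo")
  .set("spark.yarn.archive", "hdfs:///apps/spark/spark-libs.zip")
```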
On Sun, Aug 28, 2016 at 9:59 PM, Srikanth Sampath wrote:
>
As I remember, using a Spark 2.1 driver to communicate with Spark 2.2
executors will throw some RPC exceptions (I don't remember the details of
the exception).
On Thu, Aug 10, 2017 at 4:23 PM, Ted Yu wrote:
> Hi,
> Has anyone used Spark 2.1.x client with Spark 2.2.0 cluster ?
>
There is a JIRA about this (
https://issues.apache.org/jira/browse/SPARK-6521). Currently, Spark
shuffle fetch still goes through Netty even when two executors are on the same
node, but according to the tests on the JIRA, the performance is close
whether or not the network is bypassed. From my
Yes, currently if the log is aggregated then accessing it through the UI does
not work; you can create a JIRA to improve this if you would like to.
On Thu, Jun 8, 2017 at 1:43 PM, ckhari4u wrote:
> Hey Guys,
>
> I am hitting the below issue when trying to access the STDOUT/STDERR logs
Hi Sean,
Do we have a planned target version for Scala 2.12 support? Several other
projects, like Zeppelin and Livy, which rely on the Spark REPL, also require
changes to support Scala 2.12.
Thanks
Jerry
On Thu, Aug 31, 2017 at 5:55 PM, Sean Owen wrote:
> No, this doesn't let
Hi Sean,
Two questions about Scala 2.12 release artifacts.
Are we planning to ship 2.12 artifacts for the Spark 2.3 release? If not, will
we only ship 2.11 artifacts?
Thanks
Jerry
2017-11-28 21:51 GMT+08:00 Sean Owen :
> The Scala 2.12 profile mostly works, but not all
experimental and for
> people willing to make their own build. That's why I wanted it in good
> enough shape that the scala-2.12 profile produces something basically
> functional.
>
> On Tue, Nov 28, 2017 at 8:43 PM Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
>
+1, looking forward to more design details of this feature.
Thanks
Jerry
On Wed, Nov 8, 2017 at 6:40 AM, Shixiong(Ryan) Zhu
wrote:
> +1
>
> On Tue, Nov 7, 2017 at 1:34 PM, Joseph Bradley
> wrote:
>
>> +1
>>
>> On Mon, Nov 6, 2017 at 5:11 PM,
+1, checked the new py4j-related changes.
On Thu, May 17, 2018 at 5:41 AM, Marcelo Vanzin wrote:
> This is actually in 2.3, jira is just missing the version.
>
> https://github.com/apache/spark/pull/20765
>
> On Wed, May 16, 2018 at 2:14 PM, kant kodali wrote:
> > I am not
>>>> +1, I heard some Spark users have skipped v2.3.1 because of these bugs.
>>>>
>>>> On Thu, Jun 28, 2018 at 3:09 PM Xingbo Jiang
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Thu, Jun 28, 2018 at 2:06 PM, Wenchen Fan wrote:
FYI, we currently have one blocker issue (
https://issues.apache.org/jira/browse/SPARK-24535); I will start the release
after this is fixed.
Also, please let me know if there are other blockers or fixes that need to land
in the 2.3.2 release.
Thanks
Saisai
On Mon, Jul 2, 2018 at 1:16 PM, Saisai Shao wrote:
> I will st
83b2bc8e>
>> and SPARK-24677
>> <https://github.com/apache/spark/commit/7be70e29dd92de36dbb30ce39623d588f48e4cac>,
>> if anyone disagrees we could back those out but I think they would be good
>> to include.
>>
>> Tom
>>
>> On Thursday, July 19, 2018, 8:13:23 PM CDT, Saisa
Please vote on releasing the following candidate as Apache Spark version
2.3.2.
The vote is open until August 20 PST and passes if a majority +1 PMC votes
are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 2.3.2
[ ] -1 Do not release this package because ...
To
There's still another one, SPARK-25114.
I will wait several days in case other blockers jump in.
Thanks
Saisai
On Wed, Aug 15, 2018 at 10:19 AM, Wenchen Fan wrote:
> SPARK-25051 is resolved, can we start a new RC?
>
> SPARK-16406 is an improvement, generally we should not backport.
>
> On Wed, Aug 15,
Yes, there'll be an RC4; we are still waiting for the fix of one issue.
On Mon, Aug 6, 2018 at 6:10 PM, Yuval Itzchakov wrote:
> Are there any plans to create an RC4? There's an important Kafka Source
> leak
> fix I've merged back to the 2.3 branch.
>
>
>
> --
> Sent from:
One issue I can think of is that this "moving the driver log" at the
application end is quite time-consuming, which will significantly delay the
shutdown. We already suffered from such a "rename" problem for the event log on
object stores; moving the driver log will make the problem more severe.
For a vanilla
don't think my ticket should block this release. It's a big general
>> refactoring.
>>
>> Xiao do you have a ticket for the bug you found?
>>
>>
>> On Thu, Jul 19, 2018 at 5:24 PM Saisai Shao
>> wrote:
>>
>>> Hi Xiao,
>>>
>>> Are you referri
The PR is to get rid of AnalysisBarrier. That is better than multiple
> patches we added for AnalysisBarrier after 2.3.0 release. We can target it
> to 2.4.
>
> Thanks,
>
> Xiao
>
> 2018-07-19 17:48 GMT-07:00 Saisai Shao :
>
>> I see, thanks Reynold.
>>
>> Rey
Hi Xiao,
Are you referring to this JIRA (
https://issues.apache.org/jira/browse/SPARK-24865)?
On Fri, Jul 20, 2018 at 2:41 AM, Xiao Li wrote:
> dfWithUDF.cache()
> dfWithUDF.write.saveAsTable("t")
> dfWithUDF.write.saveAsTable("t1")
>
>
> Cached data is not being used. It causes a big performance regression.
Please vote on releasing the following candidate as Apache Spark version
2.3.2.
The vote is open until July 20 PST and passes if a majority +1 PMC votes
are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 2.3.2
[ ] -1 Do not release this package because ...
To
path.)
>
> [error] A full rebuild may help if 'MetricsSystem.class' was compiled
> against an incompatible version of org.eclipse
>
> On Sun, Jul 15, 2018 at 3:09 AM Saisai Shao
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>
> should not block release.
>
> On Sun, Jul 15, 2018 at 9:39 PM Saisai Shao
> wrote:
>
>> Hi Sean,
>>
>> I just did a clean build with mvn/sbt on 2.3.2; I didn't hit the errors
>> you pasted here. I'm not sure how that happens.
>>
>> S
Hi,
PMC members asked me to hold on a bit while they're dealing with some other
things. Please wait a bit.
Thanks
Saisai
On Thu, Sep 6, 2018 at 4:27 PM, zzc <441586...@qq.com> wrote:
> Hi Saisai:
> Spark 2.4 was cut; is there any new progress on 2.3.2?
>
>
>
> --
> Sent from:
>
>>
>> Otherwise nothing is open for 2.3.2, sigs and license look good, tests
>> pass as last time, etc.
>>
>> +1
>>
>> On Sun, Jul 8, 2018 at 3:30 AM Saisai Shao
>> wrote:
>>
>>> Please vote on releasing the following candidate as A
ache/spark/pull/21659, and fix the release doc too.
>
>
> On Mon, Jul 9, 2018 at 8:25 AM, Saisai Shao wrote:
>
>> Hi Sean,
>>
>> SPARK-24530 is not included in this RC1 release. Actually I'm not so familiar
>> with this issue, so I'm still using python2 to generate the docs.
>&g
Please vote on releasing the following candidate as Apache Spark version
2.3.2.
The vote is open until July 11th PST and passes if a majority +1 PMC votes
are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 2.3.2
[ ] -1 Do not release this package because ...
To
docs are usable or not in
> this RC. They looked reasonable to me but I don't know enough to know what
> the issue was. If the result is usable, then there's no problem here, even
> if something could be fixed/improved later.
>
> On Sun, Jul 8, 2018 at 7:25 PM Saisai Shao wrote:
>
&g
Congrats, Zhenhua!
2018-04-02 16:57 GMT+08:00 Takeshi Yamamuro :
> Congrats, Zhenhua!
>
> On Mon, Apr 2, 2018 at 4:13 PM, Ted Yu wrote:
>
>> Congratulations, Zhenhua
>>
>> Original message
>> From: 雨中漫步 <601450...@qq.com>
>> Date:
Yes, the main blocking issue is that the Hive version used in Spark
(1.2.1.spark) doesn't support running on Hadoop 3; Hive checks the Hadoop
version at runtime [1]. Besides this, I think some pom changes should be
enough to support Hadoop 3.
If we want to use Hadoop 3 shaded client jar, then the
Congrats to everyone!
Thanks
Jerry
2018-03-03 15:30 GMT+08:00 Liang-Chi Hsieh :
>
> Congrats to everyone!
>
>
> Kazuaki Ishizaki wrote
> > Congratulations to everyone!
> >
> > Kazuaki Ishizaki
> >
> >
> >
> > From: Takeshi Yamamuro
>
> > linguin.m.s@
>
> >
> > To:
+1, as mentioned by Marcelo, these issues seem quite severe.
I can work on the release if we are short of hands :).
Thanks
Jerry
On Thu, Jun 28, 2018 at 11:40 AM, Marcelo Vanzin wrote:
> +1. SPARK-24589 / SPARK-24552 are kinda nasty and we should get fixes
> for those out.
>
> (Those are what delayed 2.2.2 and
Congratulations to all!
On Sun, Oct 7, 2018 at 1:12 AM, Jacek Laskowski wrote:
> Wow! That's a nice bunch of contributors. Congrats to all new committers.
> I've had tough times to follow all the contributions, but with this crew
> it's gonna be nearly impossible.
>
> Pozdrawiam,
> Jacek Laskowski
>
>
Just my two cents from past experience. As the release manager of Spark
2.3.2, I saw significant delays during the release due to blocker issues. The
vote failed several times because of one or two "blocker issues". I think
during the RC period, each "blocker issue" should be carefully evaluated by the
related PMC members
Only "without-hadoop" profile has 2.12 binary, is it expected?
Thanks
Saisai
On Fri, Sep 28, 2018 at 11:08 AM, Wenchen Fan wrote:
> I'm adding my own +1, since all the problems mentioned in the RC1 voting
> email are all resolved. And there is no blocker issue for 2.4.0 AFAIK.
>
> On Fri, Sep 28, 2018 at
I like this proposal. Since Kafka already provides a delegation token
mechanism, we can also leverage Spark's delegation token framework to add
Kafka as a built-in support.
BTW, I think there's not much difference between supporting Structured
Streaming and DStream; maybe we can set both as goals.
Thanks
We are happy to announce the availability of Spark 2.3.2!
Apache Spark 2.3.2 is a maintenance release, based on the branch-2.3
maintenance branch of Spark. We strongly recommend all 2.3.x users to
upgrade to this stable release.
To download Spark 2.3.2, head over to the download page:
The vote passes. Thanks to all who helped with the release!
I'll follow up later with a release announcement once everything is
published.
+1 (* = binding):
Sean Owen*
Wenchen Fan*
Saisai Shao
Dongjoon Hyun
Takeshi Yamamuro
John Zhuge
Xiao Li*
Denny Lee
Ryan Blue
Michael Heuer
+0: None
-1
Agreed to have a new branch-2.3 release, as we have already accumulated several
fixes.
Thanks
Saisai
On Wed, Jan 2, 2019 at 1:32 PM, Xiao Li wrote:
> Based on the commit history,
> https://gitbox.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.3
> contains more critical fixes. Maybe the priority
ld from source with most profiles passed for me.
>> On Mon, Sep 17, 2018 at 8:17 AM Saisai Shao
>> wrote:
>> >
>> > Please vote on releasing the following candidate as Apache Spark
>> version 2.3.2.
>> >
>> > The vote is open until September 21 PST a
Hi Wenchen,
I think you need to set SPHINXPYTHON to python3 before building the docs,
to work around the doc issue (
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc1-docs/_site/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression
).
Here are the notes for the release page:
** FAILED ***
>>
>> Thank you, Saisai.
>>
>> Bests,
>> Dongjoon.
>>
>> On Mon, Sep 17, 2018 at 6:48 PM Saisai Shao
>> wrote:
>>
>>> +1 from my own side.
>>>
>>> Thanks
>>> Saisai
>>>
>>> W
Please vote on releasing the following candidate as Apache Spark version
2.3.2.
The vote is open until September 21 PST and passes if a majority +1 PMC
votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 2.3.2
[ ] -1 Do not release this package because ...
Do we have other blocker/critical issues for Spark 2.4.1, or are we waiting for
something to be fixed? I roughly searched JIRA; it seems there are no
blocker/critical issues marked for 2.4.1.
Thanks
Saisai
On Thu, Mar 7, 2019 at 4:57 AM, shane knapp wrote:
> i'll be popping in to the sig-big-data meeting on the 20th to talk
Hi DB,
I saw that we already have 6 RCs, but the latest vote I can find so far is for
RC2; were the others all canceled?
Thanks
Saisai
On Fri, Feb 22, 2019 at 4:51 AM, DB Tsai wrote:
> I am cutting a new rc4 with fix from Felix. Thanks.
>
> Sincerely,
>
> DB Tsai
>
+1 (binding)
Thanks
Saisai
On Sat, Jun 15, 2019 at 3:46 AM, Imran Rashid wrote:
> +1 (binding)
>
> I think this is a really important feature for spark.
>
> First, there is already a lot of interest in alternative shuffle storage
> in the community. There is already a lot of interest in alternative
> shuffle
I'm currently working with MemVerge on the Splash project (one
implementation of remote shuffle storage) and have followed this ticket for a
while. I would like to be the shepherd if no one else has volunteered.
Best regards,
Saisai
On Thu, Jun 6, 2019 at 8:33 AM, Matt Cheah wrote:
> Hi everyone,
>
>
>
> I wanted
I think maybe we could start a vote on this SPIP.
This has been discussed for a while, and the current doc is pretty complete
as of now. Also, we have seen a lot of demand in the community for building
their own shuffle storage.
Thanks
Saisai
On Tue, Jun 11, 2019 at 3:27 AM, Imran Rashid wrote:
> I would be
+1
On Mon, Aug 19, 2019 at 10:28 AM, Wenchen Fan wrote:
> +1
>
> On Sat, Aug 17, 2019 at 3:37 PM Hyukjin Kwon wrote:
>
>> +1 too
>>
>> On Sat, Aug 17, 2019 at 3:06 PM, Dilip Biswal wrote:
>>
>>> +1
>>>
>>> Regards,
>>> Dilip Biswal
>>> Tel: 408-463-4980
>>> dbis...@us.ibm.com
>>>
>>>
>>>
>>> - Original message
Congratulations!
On Mon, Sep 9, 2019 at 6:11 PM, Jungtaek Lim wrote:
> Congratulations! Well deserved!
>
> On Tue, Sep 10, 2019 at 9:51 AM John Zhuge wrote:
>
>> Congratulations!
>>
>> On Mon, Sep 9, 2019 at 5:45 PM Shane Knapp wrote:
>>
>>> congrats everyone! :)
>>>
>>> On Mon, Sep 9, 2019 at 5:32 PM Matei
Hi Ben and Felix, I'm also interested in this. Would you please add me to
the invite? Thanks a lot.
Best regards,
Saisai
On Mon, Dec 2, 2019 at 11:34 PM, Greg Lee wrote:
> Hi Felix & Ben,
>
> This is Li Hao from Baidu, on the same team as Linhong.
>
> As mentioned in Linhong’s email, independent disaggregated