Re: Enabling fully disaggregated shuffle on Spark

2019-11-21 Thread Peter Rudenko
TCP. We're open to integrating UCX into other big data components (Apache Arrow / Flight, HDFS, etc.) that could be reused in Spark to make whole Spark workloads more effective. We would be glad to see your use cases for optimizing Spark shuffle. Regards, Peter Rudenko Thu, 21 Nov 2019 at 08:12

Re: SPARK-25299: Updates As Of December 19, 2018

2019-01-03 Thread Peter Rudenko
ider adding it to this new API. Let me know if you need help with review / testing / benchmarking. I'll look more closely at the documents and the PR. Thanks, Peter Rudenko Software engineer at Mellanox Technologies. Wed, 19 Dec 2018 at 20:54 John Zhuge wrote: > Matt, appreciate the update! > > On Wed, Dec

Re: Spark In Memory Shuffle / 5403

2018-10-19 Thread Peter Rudenko
/ShortCircuitLocalReads.html) to use a Unix socket for local communication, or just directly read a part from another JVM's shuffle file. But yes, it's not available in Spark out of the box. Thanks, Peter Rudenko Fri, 19 Oct 2018 at 16:54 Peter Liu wrote: > Hi Peter, > > thank you for the reply and detailed informati

Re: Spark In Memory Shuffle / 5403

2018-10-19 Thread Peter Rudenko
to either non-present pages or mapping changes. So if you have an RDMA-capable NIC (or you can try one on Azure cloud https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/ ), give it a try. For network-intensive apps you should get better performance. Thanks, Peter

Re: [SQL] codegen on wide dataset throws StackOverflow

2015-06-26 Thread Peter Rudenko
I'm using spark-1.4.0. Sure will try to make steps to reproduce and file a JIRA ticket. Thanks, Peter Rudenko On 2015-06-26 11:14, Josh Rosen wrote: Which Spark version are you using? Can you file a JIRA for this issue? On Thu, Jun 25, 2015 at 6:35 AM, Peter Rudenko petro.rude...@gmail.com

[SQL] codegen on wide dataset throws StackOverflow

2015-06-25 Thread Peter Rudenko
. Thanks, Peter Rudenko

[Tungsten] NPE in UnsafeShuffleWriter.java

2015-06-19 Thread Peter Rudenko
) Any suggestions? Thanks, Peter Rudenko

[ml] Why all model classes are final?

2015-06-08 Thread Peter Rudenko
customize and combine for my need. Thanks, Peter Rudenko

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Peter Rudenko
) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:456) at org.apache.spark.sql.SQLContext$implicits$.intRddToDataFrameHolder(SQLContext.scala:345) | Thanks, Peter Rudenko On 2015-06-01 05:04, Guoqiang Li wrote: +1 (non-binding) -- Original -- *From: * Sandy Ryza;sandy.r...@cloudera.com

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Peter Rudenko
:= false, Thanks, Peter Rudenko On 2015-06-01 21:10, Yin Huai wrote: Hi Peter, Based on your error message, seems you were not using the RC3. For the error thrown at HiveContext's line 206, we have changed the message to this one https://github.com/apache/spark/blob/v1.4.0-rc3/sql/hive/src/main

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-29 Thread Peter Rudenko
) Thanks, Peter Rudenko On 2015-05-29 07:08, Yin Huai wrote: Justin, If you are creating multiple HiveContexts in tests, you need to assign a temporary metastore location for every HiveContext (like what we do at here https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-28 Thread Peter Rudenko
. | Also, is there a build for hadoop2.6? I don't see it here: http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/ Thanks, Peter Rudenko On 2015-05-22 22:56, Justin Uang wrote: I'm working on one

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-07 Thread Peter Rudenko
that jar to classpath on hadoop-2.6. Thanks, Peter Rudenko On 2015-05-07 19:41, Nicholas Chammas wrote: I can try that, but the issue is I understand this is supposed to work out of the box (like it does with all the other Spark/Hadoop pre-built packages). On Thu, May 7, 2015 at 12:35 PM Peter

Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

2015-05-07 Thread Peter Rudenko
Try to download this jar: http://search.maven.org/remotecontent?filepath=org/apache/hadoop/hadoop-aws/2.6.0/hadoop-aws-2.6.0.jar And add: export CLASSPATH=$CLASSPATH:hadoop-aws-2.6.0.jar And try to relaunch. Thanks, Peter Rudenko On 2015-05-07 19:30, Nicholas Chammas wrote: Hmm, I just

Re: [sql] Dataframe how to check null values

2015-04-20 Thread Peter Rudenko
Sounds very good. Is there a JIRA for this? It would be cool to have in 1.4, because currently I cannot use the dataframe.describe function with NaN values and need to filter all the columns manually. Thanks, Peter Rudenko On 2015-04-02 21:18, Reynold Xin wrote: Incidentally, we were discussing

[sql] How to uniquely identify Dataframe?

2015-03-30 Thread Peter Rudenko
SchemaRDD.id. Thanks, Peter Rudenko

Re: [MLlib] Performance problem in GeneralizedLinearAlgorithm

2015-02-17 Thread Peter Rudenko
It's fixed today: https://github.com/apache/spark/pull/4593 Thanks, Peter Rudenko On 2015-02-17 18:25, Evan R. Sparks wrote: Josh - thanks for the detailed write up - this seems a little funny to me. I agree that with the current code path there is extra work being done than needs to be (e.g

[ml] Lost persistence for fold in crossvalidation.

2015-02-11 Thread Peter Rudenko
being read and cached. Thanks, Peter Rudenko