Re: Spark 2.4.4 with Hadoop 3.2.0

2019-11-25 Thread nihed mbarek
Hi,
Spark 2.x is already part of Cloudera CDH 6, which is based on Hadoop 3.x,
so they officially support Spark 2 + Hadoop 3. There has certainly been
testing and development done on that side. On the other hand, I don't know
the status for Hadoop 3.2.

Regards,

On Tue, Nov 26, 2019 at 1:46 AM Alfredo Marquez 
wrote:

> Thank you Ismael! That's what I was looking for. I can take this to our
> platform team.
>
> Alfredo
>
> On Mon, Nov 25, 2019, 3:32 PM Ismaël Mejía  wrote:
>
>> Not officially. Apache Spark only announced support for Hadoop 3.x
>> starting with the upcoming Spark 3.
>> There is a preview release of Spark 3 with support for Hadoop 3.2 that
>> you can try now:
>>
>> https://archive.apache.org/dist/spark/spark-3.0.0-preview/spark-3.0.0-preview-bin-hadoop3.2.tgz
>>
>> Enjoy!
>>
>>
>>
>> On Tue, Nov 19, 2019 at 3:44 PM Alfredo Marquez <
>> alfredo.g.marq...@gmail.com> wrote:
>>
>>> I would also like to know the answer to this question.
>>>
>>> Thanks,
>>>
>>> Alfredo
>>>
>>> On Tue, Nov 19, 2019, 8:24 AM bsikander  wrote:
>>>
 Hi,
 Are Spark 2.4.4 and Hadoop 3.2.0 compatible?
 I tried to search the mailing list but couldn't find anything relevant.





 --
 Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

 -
 To unsubscribe e-mail: user-unsubscr...@spark.apache.org



-- 

M'BAREK Med Nihed,
Fedora Ambassador, TUNISIA, Northern Africa
http://www.nihed.com




Re: Spark 2.4.4 with Hadoop 3.2.0

2019-11-25 Thread Alfredo Marquez
Thank you Ismael! That's what I was looking for. I can take this to our
platform team.

Alfredo

On Mon, Nov 25, 2019, 3:32 PM Ismaël Mejía  wrote:

> Not officially. Apache Spark only announced support for Hadoop 3.x
> starting with the upcoming Spark 3.
> There is a preview release of Spark 3 with support for Hadoop 3.2 that you
> can try now:
>
> https://archive.apache.org/dist/spark/spark-3.0.0-preview/spark-3.0.0-preview-bin-hadoop3.2.tgz
>
> Enjoy!
>
>
>
> On Tue, Nov 19, 2019 at 3:44 PM Alfredo Marquez <
> alfredo.g.marq...@gmail.com> wrote:
>
>> I would also like to know the answer to this question.
>>
>> Thanks,
>>
>> Alfredo
>>
>> On Tue, Nov 19, 2019, 8:24 AM bsikander  wrote:
>>
>>> Hi,
>>> Are Spark 2.4.4 and Hadoop 3.2.0 compatible?
>>> I tried to search the mailing list but couldn't find anything relevant.
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>>
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>>


Re: Spark 2.4.4 with Hadoop 3.2.0

2019-11-25 Thread Ismaël Mejía
Not officially. Apache Spark only announced support for Hadoop 3.x starting
with the upcoming Spark 3.
There is a preview release of Spark 3 with support for Hadoop 3.2 that you
can try now:
https://archive.apache.org/dist/spark/spark-3.0.0-preview/spark-3.0.0-preview-bin-hadoop3.2.tgz
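
If you want to verify which Hadoop version a given Spark build actually
bundles, here is a minimal spark-shell sketch (Hadoop's VersionInfo is its
standard version API; nothing here is specific to the preview release):

import org.apache.hadoop.util.VersionInfo

// Hadoop client version compiled into this Spark distribution
println(s"Hadoop version: ${VersionInfo.getVersion}")
// Spark version, for reference
println(s"Spark version: ${spark.version}")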

Enjoy!



On Tue, Nov 19, 2019 at 3:44 PM Alfredo Marquez 
wrote:

> I would also like to know the answer to this question.
>
> Thanks,
>
> Alfredo
>
> On Tue, Nov 19, 2019, 8:24 AM bsikander  wrote:
>
>> Hi,
>> Are Spark 2.4.4 and Hadoop 3.2.0 compatible?
>> I tried to search the mailing list but couldn't find anything relevant.
>>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>


Re: GraphX performance feedback

2019-11-25 Thread mahzad kalantari
Thanks for your answer. My use case is friend recommendation for 200
million profiles.

On Mon, Nov 25, 2019 at 2:10 PM, Jörn Franke  wrote:

> I think it depends on what you want to do. Interactive big data graph
> analytics are probably better off in JanusGraph or similar.
> Batch processing (once-off) can still be fine in GraphX - you have to
> carefully design the process, though.
>
> On Nov 25, 2019, at 20:04, mahzad kalantari <
> mahzad.kalant...@gmail.com> wrote:
>
> 
> Hi all
>
> My question is about GraphX: I'm looking for user feedback on its
> performance.
>
> I read this paper written by the Facebook team, which says GraphX has very
> poor performance.
>
> https://engineering.fb.com/core-data/a-comparison-of-state-of-the-art-graph-processing-systems/
>
>
> Has anyone already encountered performance problems with GraphX, and is it
> a good choice if I want to do large-scale graph modelling?
>
>
> Thanks!
>
> Mahzad
>
>


Re: GraphX performance feedback

2019-11-25 Thread Jörn Franke
I think it depends on what you want to do. Interactive big data graph
analytics are probably better off in JanusGraph or similar.
Batch processing (once-off) can still be fine in GraphX - you have to
carefully design the process, though.
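
For the batch (once-off) case, here is a minimal GraphX sketch of what I
mean - it assumes an edge list of profile-ID pairs at a hypothetical HDFS
path, with connected components standing in for whatever recommendation
logic you end up designing:

import org.apache.spark.graphx.GraphLoader

// Load a graph from a whitespace-separated edge list ("srcId dstId" per
// line); the path is just a placeholder.
val graph = GraphLoader.edgeListFile(sc, "hdfs:///tmp/friend_edges.txt")

// Cheap first pass: label every profile with its connected component, so
// candidate friends are only searched within one component.
val components = graph.connectedComponents().vertices
components.take(10).foreach(println)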

> On Nov 25, 2019, at 20:04, mahzad kalantari  wrote:
> 
> 
> Hi all
> 
> My question is about GraphX: I'm looking for user feedback on its
> performance.
> 
> I read this paper written by the Facebook team, which says GraphX has very
> poor performance.
> https://engineering.fb.com/core-data/a-comparison-of-state-of-the-art-graph-processing-systems/
>   
> 
> Has anyone already encountered performance problems with GraphX, and is it
> a good choice if I want to do large-scale graph modelling?
> 
> 
> Thanks!
> 
> Mahzad 


GraphX performance feedback

2019-11-25 Thread mahzad kalantari
Hi all

My question is about GraphX: I'm looking for user feedback on its
performance.

I read this paper written by the Facebook team, which says GraphX has very
poor performance.
https://engineering.fb.com/core-data/a-comparison-of-state-of-the-art-graph-processing-systems/


Has anyone already encountered performance problems with GraphX, and is it
a good choice if I want to do large-scale graph modelling?


Thanks!

Mahzad


Status of Spark testing on ARM64

2019-11-25 Thread Tianhua huang
Hi all,
I will give you some information about the ARM CI of Spark:

Our team and the community are working on building and testing Spark master
on ARM64 servers. After finding and fixing some issues[1], we integrated two
ARM testing jobs[2] into the community CI (AMPLAB Jenkins). They run as
daily jobs, have been running stably for a few weeks, and generally succeed.
Thanks to Sean Owen, Shane Knapp, Dongjoon Hyun and the community for
helping us :)

If you are interested, please give it a try :)  Until
https://github.com/apache/spark/pull/26636 is merged, you have to download
org.openlabtesting.leveldbjni:leveldbjni-all:1.8 and install it into your
local Maven repository (under the org.fusesource.leveldbjni coordinates
that the Spark build expects) using these commands:

wget https://repo1.maven.org/maven2/org/openlabtesting/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar

mvn install:install-file -DgroupId=org.fusesource.leveldbjni \
  -DartifactId=leveldbjni-all -Dversion=1.8 -Dpackaging=jar \
  -Dfile=leveldbjni-all-1.8.jar

Then you can build and test Spark on an ARM64 server.

If you have any questions, please don't hesitate to contact me, thanks all!

[1]:
https://issues.apache.org/jira/browse/SPARK-28770 (https://github.com/apache/spark/pull/25673)
https://issues.apache.org/jira/browse/SPARK-28519 (https://github.com/apache/spark/pull/25279)
https://issues.apache.org/jira/browse/SPARK-28433 (https://github.com/apache/spark/pull/25186)
https://issues.apache.org/jira/browse/SPARK-28467 (https://github.com/apache/spark/pull/25864)
https://issues.apache.org/jira/browse/SPARK-29286 (https://github.com/apache/spark/pull/26021)
https://issues.apache.org/jira/browse/SPARK-29286 (https://github.com/apache/spark/pull/26636) --- this one is still in progress

[2]:
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/
The job spark-master-test-maven-arm is the ARM counterpart of the community
x86 job
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/
It runs all Java/Scala tests; the test count is about 21,112.
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/
The job spark-master-test-python-arm runs the PySpark tests with Python 3.6.


Re: how spark structured stream write to kudu

2019-11-25 Thread lk_spark
I found that _sqlContext is null. How can I resolve it?

2019-11-25 

lk_spark 



From: "lk_spark"
Sent: 2019-11-25 16:00
Subject: how spark structured stream write to kudu
To: "user.spark"
Cc:

Hi all,
   I'm using Spark 2.4.4 to read streaming data from Kafka and want to write
to Kudu 1.7.0. My code is like below:

val kuduContext = new KuduContext("master:7051", spark.sparkContext)

val console = cnew.select("*").as[CstoreNew]
  .writeStream
  .option("checkpointLocation", "/tmp/t3/")
  .trigger(Trigger.Once())
  .foreach(new ForeachWriter[CstoreNew] {
    override def open(partitionId: Long, version: Long): Boolean = {
      true
    }
    override def process(value: CstoreNew): Unit = {
      val spark = SparkSessionSingleton.getInstance(sparkConf)
      val valueDF = Seq(value).toDF()   // fails here (see error below)
      kuduContext.upsertRows(valueDF, "impala::test.cstore_bury_event_data")
    }
    override def close(errorOrNull: Throwable): Unit = {
    }
  })
val query = console.start()
query.awaitTermination()

When execution reaches val valueDF = Seq(value).toDF(), I get this error:
Caused by: java.lang.NullPointerException
 at 
org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:228)
 at 
com.gaojihealth.spark.kafkaconsumer.CstoreNew2KUDU$$anon$1.process(CstoreNew2KUDU.scala:122)
...

and SQLImplicits.scala:228 is:

227:  implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): DatasetHolder[T] = {
228:    DatasetHolder(_sqlContext.createDataset(s))
229:  }

Can anyone give me some help?
2019-11-25


lk_spark 
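
One possible way around this (a sketch under the assumption that the NPE
comes from calling toDF() inside ForeachWriter.process, which runs on an
executor where the implicits' captured _sqlContext is null): Spark 2.4's
foreachBatch hands each micro-batch to your code as a Dataset on the
driver, so no executor-side DataFrame creation is needed. Same cnew,
CstoreNew and kuduContext as in the original code:

import org.apache.spark.sql.Dataset

val query = cnew.select("*").as[CstoreNew]
  .writeStream
  .option("checkpointLocation", "/tmp/t3/")
  .trigger(Trigger.Once())
  .foreachBatch { (batch: Dataset[CstoreNew], batchId: Long) =>
    // batch lives on the driver, so toDF() sees a live SQLContext here;
    // upsertRows then distributes the actual writes to Kudu.
    kuduContext.upsertRows(batch.toDF(), "impala::test.cstore_bury_event_data")
  }
  .start()
query.awaitTermination()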

how spark structured stream write to kudu

2019-11-25 Thread lk_spark
Hi all,
   I'm using Spark 2.4.4 to read streaming data from Kafka and want to write
to Kudu 1.7.0. My code is like below:

val kuduContext = new KuduContext("master:7051", spark.sparkContext)

val console = cnew.select("*").as[CstoreNew]
  .writeStream
  .option("checkpointLocation", "/tmp/t3/")
  .trigger(Trigger.Once())
  .foreach(new ForeachWriter[CstoreNew] {
    override def open(partitionId: Long, version: Long): Boolean = {
      true
    }
    override def process(value: CstoreNew): Unit = {
      val spark = SparkSessionSingleton.getInstance(sparkConf)
      val valueDF = Seq(value).toDF()   // fails here (see error below)
      kuduContext.upsertRows(valueDF, "impala::test.cstore_bury_event_data")
    }
    override def close(errorOrNull: Throwable): Unit = {
    }
  })
val query = console.start()
query.awaitTermination()

When execution reaches val valueDF = Seq(value).toDF(), I get this error:
Caused by: java.lang.NullPointerException
 at 
org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:228)
 at 
com.gaojihealth.spark.kafkaconsumer.CstoreNew2KUDU$$anon$1.process(CstoreNew2KUDU.scala:122)
...

and SQLImplicits.scala:228 is:

227:  implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): DatasetHolder[T] = {
228:    DatasetHolder(_sqlContext.createDataset(s))
229:  }

Can anyone give me some help?
2019-11-25


lk_spark