data locality in spark

2015-04-27 Thread Grandl Robert
Hi guys,
I am running some SQL queries, but all my tasks are reported as either NODE_LOCAL or PROCESS_LOCAL.
In the Hadoop world, the reduce tasks are rack-local or non-local because they have to aggregate data from multiple hosts. However, in Spark even the aggregation stages are reported as NODE_LOCAL/PROCESS_LOCAL.
Am I missing something, or why are the reduce-like tasks still NODE_LOCAL/PROCESS_LOCAL?
Thanks,
Robert
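
For reference, one way to see exactly which locality level the scheduler assigned to each task (beyond what the web UI shows) is a small SparkListener. This is only a rough sketch against the Spark 1.2-era listener API, assuming a live SparkContext named sc (e.g. in spark-shell):

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sketch: print the locality level of every finished task, keyed by stage.
sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info = taskEnd.taskInfo
    println(s"stage=${taskEnd.stageId} task=${info.taskId} locality=${info.taskLocality}")
  }
})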



Re: counters in spark

2015-04-13 Thread Grandl Robert
Guys,
Do you have any thoughts on this?

Thanks,
Robert



counters in spark

2015-04-12 Thread Grandl Robert
Hi guys,
I was trying to find counters in Spark that report the amount of CPU or memory (in some metric) used by a task/stage/job, but I could not find any.
Is there any such counter available?
Thank you,
Robert
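
As far as I know, Spark of this era does not expose a direct CPU-utilization or memory-usage counter, but TaskMetrics does record per-task run time, GC time and spill bytes, which a SparkListener can collect. A rough sketch, assuming a live SparkContext named sc and the 1.2-era TaskMetrics field names:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sketch: dump a few TaskMetrics fields for every finished task.
sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // taskMetrics can be null for failed tasks, hence the Option guard
    Option(taskEnd.taskMetrics).foreach { m =>
      println(s"stage=${taskEnd.stageId} task=${taskEnd.taskInfo.taskId} " +
        s"runTimeMs=${m.executorRunTime} gcTimeMs=${m.jvmGCTime} " +
        s"memSpilledBytes=${m.memoryBytesSpilled} diskSpilledBytes=${m.diskBytesSpilled}")
    }
  }
})

Stage-level aggregates of most of the same metrics also appear on the stage detail page of the web UI.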





question regarding the dependency DAG in Spark

2015-03-16 Thread Grandl Robert
Hi guys,

I am trying to get a better understanding of the DAG generation for a job in 
Spark. 

Ideally, I want to run some SQL query and extract the DAG that Spark generates. By DAG I mean the stages, the dependencies among stages, and the number of tasks in every stage.

Could you point me to the code where that happens?

Thank you,
Robert
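
Not an authoritative pointer, but the stage construction itself lives in org.apache.spark.scheduler.DAGScheduler (core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala), which walks the final RDD's dependency graph and cuts a new stage at every shuffle dependency; Spark SQL's planning happens before that, when the logical plan is turned into a physical plan of RDD operations. From the shell you can get a quick look at both. A sketch against the 1.2-era API; the query and table name are placeholders:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)                  // sc is the shell's SparkContext
val q = sqlContext.sql("SELECT * FROM some_table")   // hypothetical table

// Logical and physical plans chosen by Spark SQL
println(q.queryExecution)

// RDD lineage of the query; each shuffle dependency in this tree becomes
// a stage boundary when the DAGScheduler submits the job
println(q.toDebugString)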



run spark standalone mode

2015-03-12 Thread Grandl Robert
Hi guys,
I have a basic question that I have not been able to figure out.
I deployed Spark 1.2.1 on a cluster of 30 nodes. Looking at master:8088 I can see all the workers I have created so far. (I start the cluster with sbin/start-all.sh.)
However, when running a Spark SQL query or even spark-shell, I cannot see any job executing in the master web UI, although the jobs do finish. I suspect they are executing locally on the master, but I don't understand why/how, and why not on the slave machines.

My conf/spark-env.sh is as follows:

export SPARK_MASTER_IP=ms0220
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/users/rgrandl/software/spark-1.2.1-bin-hadoop2.4/lib/snappy-java-1.0.4.1.jar

export SPARK_LOCAL_DIRS=/users/rgrandl/software/data/spark/local

export SPARK_WORKER_MEMORY=52000M
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_CORES=2

export SPARK_WORKER_DIR=/users/rgrandl/software/data/spark/worker
export SPARK_DAEMON_MEMORY=5200M
#export SPARK_DAEMON_JAVA_OPTS=4800M


conf/slaves is populated with the list of worker machines. I should mention that the spark-env.sh and slaves files are deployed on all machines.

Thank you,
Robert



Re: run spark standalone mode

2015-03-12 Thread Grandl Robert
Sorry guys for this. 

It seems that I need to start the Thrift server with the --master spark://ms0220:7077 option, and now I can see applications running in the web UI.
Thanks,
Robert
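
For completeness: the same applies to any driver program; in a standalone application the master can be set on the SparkConf instead of the command line. A minimal sketch (the app name is hypothetical; the master URL is the one from this thread):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: point a standalone application at the cluster master explicitly,
// so jobs run on the workers and show up in the master web UI.
val conf = new SparkConf()
  .setAppName("example-app")            // hypothetical application name
  .setMaster("spark://ms0220:7077")     // standalone master from this thread
val sc = new SparkContext(conf)

Without an explicit master, tools such as spark-shell typically fall back to local mode, which matches the symptom of jobs finishing without ever appearing on the master UI.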
 


Re: run spark standalone mode

2015-03-12 Thread Grandl Robert
I figured it out for spark-shell by passing the --master option. However, I am still troubleshooting how to launch SQL queries. My current command is:

./bin/beeline -u jdbc:hive2://ms0220:1 -n `whoami` -p ignored -f tpch_query10.sql
 


Spark SQL using Hive metastore

2015-03-11 Thread Grandl Robert
Hi guys,
I am a newbie at running Spark SQL / Spark. My goal is to run some TPC-H queries atop Spark SQL using the Hive metastore.
It looks like the Spark 1.2.1 release has Spark SQL / Hive support. However, I am not able to fully connect all the dots.

I did the following:
1. I copied hive-site.xml from Hive to spark/conf
2. I copied the MySQL connector to spark/lib
3. I started the Hive metastore service: hive --service metastore
4. I started ./bin/spark-sql
5. I typed show tables; in spark-sql. However, the following error was thrown:

Job 0 failed: collect at SparkPlan.scala:84, took 0.241788 s
15/03/11 15:02:35 ERROR SparkSQLDriver: Failed in [show tables]
org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] no native library is found for os.name=Linux and os.arch=aarch64

Do you know what I am doing wrong? I should mention that I have Hive 0.14 instead of Hive 0.13.

And another question: what is the right way to run SQL queries with Spark SQL using the Hive metastore?
Thanks,
Robert
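
For what it's worth, besides the spark-sql shell and beeline against the Thrift server, the same metastore can be reached programmatically through HiveContext. A sketch assuming a Spark build with Hive support, hive-site.xml in spark/conf, and a live SparkContext named sc; the TPC-H table name is just an example:

import org.apache.spark.sql.hive.HiveContext

// Sketch: query tables registered in the existing Hive metastore.
val hiveContext = new HiveContext(sc)
hiveContext.sql("SHOW TABLES").collect().foreach(println)
hiveContext.sql("SELECT count(*) FROM lineitem").collect().foreach(println)  // hypothetical TPC-H table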



shark queries failed

2015-02-15 Thread Grandl Robert
Hi guys,
I deployed BlinkDB (built atop Shark) running on Spark 0.9.
I tried to run several TPC-DS Shark queries taken from https://github.com/cloudera/impala-tpcds-kit/tree/master/queries-sql92-modified/queries/shark. However, the following exceptions were encountered. Do you have any idea why that might happen?

Thanks,
Robert

2015-02-14 17:58:29,358 WARN  util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(52)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-02-14 17:58:29,360 WARN  snappy.LoadSnappy (LoadSnappy.java:<clinit>(46)) - Snappy native library not loaded
2015-02-14 17:58:34,963 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 6 (task 5.0:2)
2015-02-14 17:58:34,970 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Loss was due to java.lang.ClassCastException
java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.io.FloatWritable
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableFloatObjectInspector.get(WritableFloatObjectInspector.java:35)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:331)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:257)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:204)
    at shark.execution.ReduceSinkOperator$$anonfun$processPartitionNoDistinct$1.apply(ReduceSinkOperator.scala:188)
    at shark.execution.ReduceSinkOperator$$anonfun$processPartitionNoDistinct$1.apply(ReduceSinkOperator.scala:153)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
2015-02-14 17:58:34,983 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 8 (task 5.0:4)
2015-02-14 17:58:35,075 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 12 (task 5.0:8)
2015-02-14 17:58:35,119 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 15 (task 5.0:2)
2015-02-14 17:58:35,134 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 9 (task 5.0:5)
2015-02-14 17:58:35,187 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 16 (task 5.0:4)
2015-02-14 17:58:35,203 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 11 (task 5.0:7)
2015-02-14 17:58:35,214 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 13 (task 5.0:9)
2015-02-14 17:58:35,265 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 4 (task 5.0:0)
2015-02-14 17:58:35,274 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 18 (task 5.0:2)
2015-02-14 17:58:35,304 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 17 (task 5.0:8)
2015-02-14 17:58:35,330 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 5 (task 5.0:1)
2015-02-14 17:58:35,354 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 20 (task 5.0:4)
2015-02-14 17:58:35,387 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 19 (task 5.0:5)
2015-02-14 17:58:35,430 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 7 (task 5.0:3)
2015-02-14 17:58:35,432 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 24 (task 5.0:2)
2015-02-14 17:58:35,433 ERROR scheduler.TaskSetManager (Logging.scala:logError(65)) - Task 5.0:2 failed 4 times; aborting job
2015-02-14 17:58:35,438 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: Execution Error, return code -101 from shark.execution.SparkTask
2015-02-14 17:58:35,552 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Lost TID 30 (task 6.0:0)
2015-02-14 17:58:35,565 WARN  scheduler.TaskSetManager (Logging.scala:logWarning(61)) - Loss was due to java.io.FileNotFoundException
java.io.FileNotFoundException: http://10.200.146.12:46812/broadcast_4
    at

Re: shark queries failed

2015-02-15 Thread Grandl Robert
Thanks for the reply, Akhil. I cannot update the Spark version and run Spark SQL due to some old dependencies and a specific project I want to run.

I was wondering if you have any clue why that exception might be triggered, or if you have seen it before.

Thanks,
Robert
 

 On Sunday, February 15, 2015 9:18 AM, Akhil Das 
ak...@sigmoidanalytics.com wrote:
   

I'd suggest updating your Spark to the latest version and trying Spark SQL instead of Shark.
Thanks
Best Regards

Spark standalone and HDFS 2.6

2015-02-13 Thread Grandl Robert
Hi guys,
Probably a dumb question. Do you know how to compile Spark 0.9 so that it integrates with HDFS 2.6.0?

I was trying
sbt/sbt -Pyarn -Phadoop-2.6 assembly
or
mvn -Dhadoop.version=2.6.0 -DskipTests clean package
but neither approach succeeded.

Thanks,
Robert


Re: Spark standalone and HDFS 2.6

2015-02-13 Thread Grandl Robert
I am trying to run BlinkDB (https://github.com/sameeragarwal/blinkdb), which seems to work only with Spark 0.9. However, if I want to access HDFS I need to compile Spark against the Hadoop version running on my cluster (2.6.0). Hence the version problem ...



 On Friday, February 13, 2015 11:28 AM, Sean Owen so...@cloudera.com 
wrote:
   

Oh right, you said Spark 0.9. Those profiles didn't exist back then. I don't even know whether Hadoop 2.6 will work with 0.9 as-is. The profiles were introduced later to fix up some compatibility. Why not use 1.2.1?



Re: Spark standalone and HDFS 2.6

2015-02-13 Thread Grandl Robert
Thanks, Sean, for your prompt response.

I was trying to compile as follows:
mvn -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests clean package
but I got a bunch of errors (see below). Hadoop 2.6.0 compiled correctly, and all the Hadoop jars are in the .m2 repository.
Do you have any idea what might be happening?
Robert

[WARNING] Class com.google.protobuf.Parser not found - continuing with a stub.
[ERROR] error while loading RpcResponseHeaderProto, class file '/home/rgrandl/.m2/repository/org/apache/hadoop/hadoop-common/2.6.0/hadoop-common-2.6.0.jar(org/apache/hadoop/ipc/protobuf/RpcHeaderProtos$RpcResponseHeaderProto.class)' is broken (class java.lang.NullPointerException/null)
[WARNING] one warning found
[ERROR] one error found
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM .. SUCCESS [2.537s]
[INFO] Spark Project Core  FAILURE [25.917s]
[INFO] Spark Project Bagel ... SKIPPED
[INFO] Spark Project GraphX .. SKIPPED
[INFO] Spark Project ML Library .. SKIPPED
[INFO] Spark Project Streaming ... SKIPPED
[INFO] Spark Project Tools ... SKIPPED
[INFO] Spark Project REPL  SKIPPED
[INFO] Spark Project Assembly  SKIPPED
[INFO] Spark Project External Twitter  SKIPPED
[INFO] Spark Project External Kafka .. SKIPPED
[INFO] Spark Project External Flume .. SKIPPED
[INFO] Spark Project External ZeroMQ . SKIPPED
[INFO] Spark Project External MQTT ... SKIPPED
[INFO] Spark Project Examples  SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 30.002s
[INFO] Finished at: Fri Feb 13 11:21:36 PST 2015
[INFO] Final Memory: 49M/1226M
[INFO] 
[WARNING] The requested profile hadoop-2.4 could not be activated because it does not exist.
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.5:compile (scala-compile-first) on project spark-core_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.1.5:compile failed. CompileFailed - [Help 1]

 

 On Friday, February 13, 2015 11:16 AM, Sean Owen so...@cloudera.com 
wrote:
   

 If you just need standalone mode, you don't need -Pyarn. There is no
-Phadoop-2.6; you should use -Phadoop-2.4 for 2.4+. Yes, set
-Dhadoop.version=2.6.0. That should be it.

If that still doesn't work, define "doesn't succeed".
