Eugene Sapozhnikov created ZEPPELIN-253:
-------------------------------------------

             Summary: EMR Spark deployment: Class com.hadoop.compression.lzo.LzoCodec not found
                 Key: ZEPPELIN-253
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-253
             Project: Zeppelin
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.6.0
         Environment: Amazon EMR cluster:

AMI version: 3.8.0
Hadoop distribution: Amazon 2.4.0
Applications: Hive 0.13.1, Pig 0.12.0, Spark 1.3.1
Zeppelin: current clone from git master (0.6.0-incubating-SNAPSHOT)

Contents of zeppelin-env.sh:
export MASTER=yarn-client
export HADOOP_CONF_DIR=/home/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.instances=2 -Dspark.executor.cores=2 -Dspark.executor.memory=1547M -Dspark.default.parallelism=4"
            Reporter: Eugene Sapozhnikov
            Priority: Blocker


Hi,
I am trying to set up EMR + Spark + Zeppelin, with no luck so far.

I followed the recommendations from https://gist.github.com/andershammar/224e1077021d0ea376dd; everything looks all set, and I checked the .sh file line by line.

On the host itself, 'spark-shell' works fine and my test code executes correctly. But when I open Zeppelin and run some Scala in a notebook, I get the error below.

Could you tell me what is wrong with how Zeppelin connects to the existing Spark cluster, or point me to documentation on configuring it? So far the proper configuration is unclear to me.

CODE AND OUTPUT:
val people = sc.textFile("s3://mybucket/storage-archive/run=2015-08-15*")
people.take(10)

people: org.apache.spark.rdd.RDD[String] = s3://mybucket/storage-archive/run=2015-08-15* MapPartitionsRDD[3] at textFile at <console>:23
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:186)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
...
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
    ... 59 more
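A possible workaround for this class of error (a sketch only, not a confirmed fix for this ticket) is to put the hadoop-lzo jar on the classpath that Zeppelin passes to Spark, via zeppelin-env.sh. The jar path below is an assumption based on typical EMR AMI 3.x layouts and may differ on a given cluster:

```shell
# Sketch: append the EMR hadoop-lzo jar to the Spark driver/executor classpath.
# ASSUMPTION: the jar location varies by EMR AMI; find the real path first with:
#   find /home/hadoop -name 'hadoop-lzo*.jar'
export ZEPPELIN_JAVA_OPTS="$ZEPPELIN_JAVA_OPTS \
  -Dspark.driver.extraClassPath=/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar \
  -Dspark.executor.extraClassPath=/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar"
```

Restarting the Zeppelin daemon is needed for zeppelin-env.sh changes to take effect.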


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
