Eugene Sapozhnikov created ZEPPELIN-253:
-------------------------------------------
             Summary: EMR Spark deployment: Class com.hadoop.compression.lzo.LzoCodec not found
                 Key: ZEPPELIN-253
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-253
             Project: Zeppelin
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.6.0
         Environment: Amazon EMR cluster:
                      AMI version: 3.8.0
                      Hadoop distribution: Amazon 2.4.0
                      Applications: Hive 0.13.1, Pig 0.12.0, Spark 1.3.1
                      Zeppelin: current clone from git master, 0.6.0-incubating-SNAPSHOT
            Reporter: Eugene Sapozhnikov
            Priority: Blocker

Contents of zeppelin-env.sh:

export MASTER=yarn-client
export HADOOP_CONF_DIR=/home/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.instances=2 -Dspark.executor.cores=2 -Dspark.executor.memory=1547M -Dspark.default.parallelism=4"

Hi, I am trying to set up EMR + Spark + Zeppelin, with no luck so far. I followed the recommendations from https://gist.github.com/andershammar/224e1077021d0ea376dd, and everything appears to be set up correctly; I checked the .sh file line by line. On the host, 'spark-shell' works fine and my test code executes without problems. However, when I open a Zeppelin notebook and run any Scala code, I get the error below. Could you tell me what is wrong with how Zeppelin connects to the existing Spark cluster, or point me to documentation on configuring it? The proper configuration is still unclear to me.
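As a quick diagnostic (my own sketch, not part of the original report): since 'spark-shell' works on the host while the notebook fails, the two are likely running with different classpaths, and one can check from a notebook paragraph whether the Spark interpreter's JVM can see the LZO codec class named in the stack trace:

```scala
// Diagnostic sketch: check whether com.hadoop.compression.lzo.LzoCodec
// (the class named in the stack trace) is visible to this JVM's classloader.
val lzoVisible =
  scala.util.Try(Class.forName("com.hadoop.compression.lzo.LzoCodec")).isSuccess
println(if (lzoVisible) "LzoCodec found on classpath" else "LzoCodec NOT on classpath")
```

If this prints "NOT on classpath" in the notebook but the same check succeeds in spark-shell, the problem is the Zeppelin interpreter's classpath rather than the cluster itself.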
CODE AND OUTPUT:

val people = sc.textFile("s3://mybucket/storage-archive/run=2015-08-15*")
people.take(10)

people: org.apache.spark.rdd.RDD[String] = s3://mybucket/storage-archive/run=2015-08-15* MapPartitionsRDD[3] at textFile at <console>:23
java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:186)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
	...
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
	at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
	... 59 more

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
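A possible workaround, sketched under assumptions rather than confirmed: EMR's core-site.xml typically lists com.hadoop.compression.lzo.LzoCodec in io.compression.codecs, so any Hadoop input read fails unless the hadoop-lzo jar is on the Spark interpreter's classpath. The jar path below is an assumed location for this AMI; locate the actual jar first and adjust zeppelin-env.sh accordingly:

```shell
# Locate the hadoop-lzo jar shipped with the EMR AMI (path varies by AMI version):
#   find /home/hadoop -name 'hadoop-lzo*.jar'
# Then ship it to the driver and executors via spark.jars. The path
# /home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar is an assumption.
export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar \
 -Dspark.executor.instances=2 -Dspark.executor.cores=2 \
 -Dspark.executor.memory=1547M -Dspark.default.parallelism=4"
```

After editing zeppelin-env.sh, the Zeppelin daemon has to be restarted for the new interpreter options to take effect.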