Hi Jikai,

It looks like you're trying to run a Spark job on data that's stored in
HDFS in .lzo format.  Spark can handle this (I do it all the time), but you
need to configure your Spark installation to know about the .lzo format.

There are two parts to the hadoop-lzo library -- the first is the jar
(hadoop-lzo.jar) and the second is the native libraries
(libgplcompression.{a,so,la} and liblzo2.{a,so,la}).  You need the jar on
the classpath on every node in your cluster, and the native libraries
need to be visible to the JVM on every node as well.

In Spark 1.0.1 I do this by modifying entries in spark-env.sh: set
SPARK_LIBRARY_PATH to include the path to the native library directory
(e.g. /path/to/hadoop/lib/native/Linux-amd64-64) and SPARK_CLASSPATH to
include the hadoop-lzo jar, along the lines of the sketch below.
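
Something like this in conf/spark-env.sh (a minimal sketch -- the paths
are placeholders, so point them at wherever the hadoop-lzo jar and the
native libraries actually live on your nodes):

    # conf/spark-env.sh
    # directory containing libgplcompression.so / liblzo2.so
    export SPARK_LIBRARY_PATH="$SPARK_LIBRARY_PATH:/path/to/hadoop/lib/native/Linux-amd64-64"
    # hadoop-lzo jar for the driver and executor classpath
    export SPARK_CLASSPATH="$SPARK_CLASSPATH:/path/to/hadoop/lib/hadoop-lzo.jar"

Both variables are appended to rather than overwritten, so anything else
already set in your spark-env.sh is preserved.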

Hope that helps,
Andrew


On Thu, Aug 7, 2014 at 7:19 PM, Xiangrui Meng <men...@gmail.com> wrote:

> Is the GPL library only available on the driver node? If that is the
> case, you need to add them to the `--jars` option of spark-submit.
> -Xiangrui
>
> On Thu, Aug 7, 2014 at 6:59 PM, Jikai Lei <hangel...@gmail.com> wrote:
> > I had the following error when trying to run a very simple spark job
> > (which uses logistic regression with SGD in mllib):
> >
> > ERROR GPLNativeCodeLoader: Could not load native gpl library
> > java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
> >     at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1738)
> >     at java.lang.Runtime.loadLibrary0(Runtime.java:823)
> >     at java.lang.System.loadLibrary(System.java:1028)
> >     at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
> >     at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
> >     at java.lang.Class.forName0(Native Method)
> >     at java.lang.Class.forName(Class.java:247)
> >     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1659)
> >     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1624)
> >     at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
> >     at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
> >     at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
> >     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
> >     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
> >     at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:155)
> >     at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:187)
> >     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:181)
> >     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >     at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >     at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
> >     at org.apache.spark.scheduler.Task.run(Task.scala:51)
> >     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >     at java.lang.Thread.run(Thread.java:662)
> > 14/08/06 20:32:11 ERROR LzoCodec: Cannot load native-lzo without native-hadoop
> >
> >
> > This is the command I used to submit the job:
> >
> > ~/spark/spark-1.0.0-bin-hadoop2/bin/spark-submit \
> > --class com.jk.sparktest.Test \
> > --master yarn-cluster \
> > --num-executors 40 \
> > ~/sparktest-0.0.1-SNAPSHOT-jar-with-dependencies.jar
> >
> >
> > The actual java command is:
> >
> > /usr/java/latest/bin/java -cp \
> > /apache/hadoop/share/hadoop/common/hadoop-common-2.2.0.2.0.6.0-61.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar::/home/jilei/spark/spark-1.0.0-bin-hadoop2/conf:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/jilei/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/apache/hadoop/conf:/apache/hadoop/conf \
> > -XX:MaxPermSize=128m \
> > -Djava.library.path= \
> > -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit \
> > --class com.jk.sparktest.Test \
> > --master yarn-cluster \
> > --num-executors 40 \
> > ~/sparktest-0.0.1-SNAPSHOT-jar-with-dependencies.jar
> >
> >
> > It seems the -Djava.library.path is not set. I also tried running the
> > java command above with the native lib directory supplied via
> > java.library.path, but still got the same errors.
> >
> > Any idea on what's wrong? Thanks.
> >
> >
> >
> >
