Hello:
I got the source code from http://github.com/kevinweil/hadoop-lzo, compiled it successfully, and then:
1. Copied hadoop-lzo-0.4.4.jar to $HADOOP_HOME/lib on each master and slave.
2. Copied all files under ../Linux-amd64-64/lib to $HADOOP_HOME/lib/native/Linux-amd64-64 on each master and slave.
3. Uploaded a file, test.lzo, to HDFS.
4. Ran: hadoop jar $HADOOP_HOME/lib/hadoop-lzo-0.4.4.jar
com.hadoop.compression.lzo.DistributedLzoIndexer test.lzo to test it (roughly as in the shell sketch after this list).
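Concretely, the commands looked roughly like this (the hostnames and the hadoop-lzo build output paths are placeholders; my actual layout may differ):
-----------------------------------------------
# rough sketch of steps 1-4; hostnames and build paths are placeholders
# assumes $HADOOP_HOME is the same path on every node
for host in master slave1 slave2; do
  scp build/hadoop-lzo-0.4.4.jar $host:$HADOOP_HOME/lib/
  scp build/native/Linux-amd64-64/lib/* $host:$HADOOP_HOME/lib/native/Linux-amd64-64/
done
hadoop fs -put test.lzo test.lzo
hadoop jar $HADOOP_HOME/lib/hadoop-lzo-0.4.4.jar com.hadoop.compression.lzo.DistributedLzoIndexer test.lzo
-----------------------------------------------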
But I got these errors:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10/07/20 22:37:37 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
10/07/20 22:37:37 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 5c25e0073d3dae9ace4bd9eba72e4dc43650c646]
##########^_^^_^^_^^_^^_^^_^##################
(I think this means all the native libraries loaded successfully)
################################
10/07/20 22:37:37 INFO lzo.DistributedLzoIndexer: Adding LZO file target.lzo to indexing list (no index currently exists)
...
attempt_201007202234_0001_m_000000_0, Status : FAILED
java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzopCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:134)
    at com.hadoop.mapreduce.LzoSplitRecordReader.initialize(LzoSplitRecordReader.java:48)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzopCodec
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
... 6 more
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The installation instructions at http://github.com/kevinweil/hadoop-lzo say that further configuration is needed:

Once the libs are built and installed, you may want to add them to the class paths and library paths. That is, in hadoop-env.sh, set
(1) export HADOOP_CLASSPATH=/path/to/your/hadoop-lzo-lib.jar

Question: I have already copied hadoop-lzo-0.4.4.jar to $HADOOP_HOME/lib; do I still need to set this entry as well? Actually, after I added

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/hbase-0.20.4.jar:$HABSE_HOME/config:$ZOOKEEPER_HOME/zookeeper-3.3.1.jar:$HADOOP_HOME/lib/hadoop-lzo-0.4.4.jar

and redid steps 1-4 above, I got the same problem as before. So how can I get Hadoop to load hadoop-lzo-0.4.4.jar?
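By the way, as a sanity check of my own (not something the README asks for), I assume something like this should confirm the codec class is at least inside the jar I copied to every node:
-----------------------------------------------
# my own sanity check, not from the README: is LzopCodec inside the jar?
unzip -l $HADOOP_HOME/lib/hadoop-lzo-0.4.4.jar | grep -i lzopcodec
-----------------------------------------------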
(2) export JAVA_LIBRARY_PATH=/path/to/hadoop-lzo-native-libs:/path/to/standard-hadoop-native-libs
(3) Note that there seems to be a bug in /path/to/hadoop/bin/hadoop; comment out the line JAVA_LIBRARY_PATH=''

Question: since the native library was loaded successfully, are operations (2) and (3) even needed? If they are, I assume for my layout they would look roughly like the sketch below.
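(This is only my guess at what (2) and (3) would mean concretely for my installation; the paths below are mine, not the README's.)
-----------------------------------------------
# (2) in hadoop-env.sh -- I copied the hadoop-lzo .so files into the same
# directory as the stock native libs, so there is only one path here
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64

# (3) in $HADOOP_HOME/bin/hadoop -- comment out the line the README mentions
# JAVA_LIBRARY_PATH=''
-----------------------------------------------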
-----------------------------------------------
I am using hadoop 0.20.2
core-site.xml
-----------------------------------------------------------------------------
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec
</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
</configuration>
-----------------------------------------------------------------------------
mapred-site.xml
-----------------------------------------------------------------------------
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>AlexLuya:9001</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>1</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>1</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/alex/hadoop/mapred/local</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/tmp/hadoop/mapred/system</value>
</property>
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
</configuration>
-----------------------------------------------------------------------------
hadoop-env.sh
-----------------------------------------------------------------------------
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
export JAVA_HOME=/usr/local/hadoop/jdk1.6.0_20
# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=200
# Extra Java runtime options. Empty by default.
#export HADOOP_OPTS=-server
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/hbase-0.20.4.jar:$HABSE_HOME/config:$ZOOKEEPER_HOME/zookeeper-3.3.1.jar:$HADOOP_HOME/lib/hadoop-lzo-0.4.4.jar
#export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS
# Extra ssh options. Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
# Where log files are stored. $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
# host:path where hadoop code should be rsync'd from. Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1
# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids
# A string representing this instance of hadoop. $USER by default.
#export HADOOP_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HADOOP_NICENESS=10