Hello: I got the source code from http://github.com/kevinweil/hadoop-lzo and compiled it successfully. Then I:

1. copied hadoop-lzo-0.4.4.jar to $HADOOP_HOME/lib on each master and slave;
2. copied all files under ../Linux-amd64-64/lib to $HADOOP_HOME/lib/native/Linux-amd64-64 on each master and slave;
3. uploaded a file test.lzo to HDFS;
4. ran the following to test (a shell sketch of these steps follows the list):

hadoop jar $HADOOP_HOME/lib/hadoop-lzo-0.4.4.jar com.hadoop.compression.lzo.DistributedLzoIndexer test.lzo
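For reference, here is a minimal shell sketch of steps 1-4 above. It assumes the jar and the native ../Linux-amd64-64/lib directory are in the current build tree and that $HADOOP_HOME/conf/slaves lists the slave hosts; adjust paths to your setup:

-----------------------------------------------------------------------------
# steps 1 and 2: ship the jar and native libs to every slave
for host in $(cat $HADOOP_HOME/conf/slaves); do
  scp hadoop-lzo-0.4.4.jar "$host:$HADOOP_HOME/lib/"
  scp ../Linux-amd64-64/lib/* "$host:$HADOOP_HOME/lib/native/Linux-amd64-64/"
done

# step 3: put the test file into HDFS
hadoop fs -put test.lzo test.lzo

# step 4: run the distributed indexer against it
hadoop jar $HADOOP_HOME/lib/hadoop-lzo-0.4.4.jar \
  com.hadoop.compression.lzo.DistributedLzoIndexer test.lzo
-----------------------------------------------------------------------------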
I got these errors:

-----------------------------------------------------------------------------
10/07/20 22:37:37 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
10/07/20 22:37:37 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 5c25e0073d3dae9ace4bd9eba72e4dc43650c646]
10/07/20 22:37:37 INFO lzo.DistributedLzoIndexer: Adding LZO file test.lzo to indexing list (no index currently exists)
...
attempt_201007202234_0001_m_000000_0, Status : FAILED
java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzopCodec not found.
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
        at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:134)
        at com.hadoop.mapreduce.LzoSplitRecordReader.initialize(LzoSplitRecordReader.java:48)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzopCodec
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
        ... 6 more
-----------------------------------------------------------------------------

(I think the first two INFO lines say the native library was loaded successfully.)

The installation instructions at http://github.com/kevinweil/hadoop-lzo say that further configuration is needed: once the libs are built and installed, you may want to add them to the class paths and library paths, that is, in hadoop-env.sh, set:

(1) export HADOOP_CLASSPATH=/path/to/your/hadoop-lzo-lib.jar

Question: I have already copied hadoop-lzo-0.4.4.jar to $HADOOP_HOME/lib; should I still set this entry? Actually, after I added

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/hbase-0.20.4.jar:$HABSE_HOME/config:$ZOOKEEPER_HOME/zookeeper-3.3.1.jar:$HADOOP_HOME/lib/hadoop-lzo-0.4.4.jar

and redid steps 1-4 above, I got the same problem as before. So: how can I get Hadoop to load hadoop-lzo-0.4.4.jar? (Some sanity checks are sketched below.)

(2) export JAVA_LIBRARY_PATH=/path/to/hadoop-lzo-native-libs:/path/to/standard-hadoop-native-libs

(3) Note that there seems to be a bug in /path/to/hadoop/bin/hadoop; comment out the line JAVA_LIBRARY_PATH=''

Question: since the native library was loaded successfully, are operations (2) and (3) even needed?
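Since the ClassNotFoundException is thrown in a map task (child JVM) rather than on the client, a few checks worth running are whether the codec class is really inside the jar, whether the jar is present on the slaves, and whether the tasktrackers were restarted after hadoop-env.sh changed (they only read it at startup). A minimal sketch, where slave1 is a placeholder hostname:

-----------------------------------------------------------------------------
# confirm the codec class is packaged in the jar
jar tf $HADOOP_HOME/lib/hadoop-lzo-0.4.4.jar | grep LzopCodec

# confirm the jar exists and is readable on a slave (replace slave1)
ssh slave1 "ls -l $HADOOP_HOME/lib/hadoop-lzo-0.4.4.jar"

# restart the MapReduce daemons so tasktrackers re-read hadoop-env.sh
$HADOOP_HOME/bin/stop-mapred.sh && $HADOOP_HOME/bin/start-mapred.sh
-----------------------------------------------------------------------------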
I am using hadoop 0.20.2.

core-site.xml
-----------------------------------------------------------------------------
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
-----------------------------------------------------------------------------

mapred-site.xml
-----------------------------------------------------------------------------
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>AlexLuya:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/alex/hadoop/mapred/local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/tmp/hadoop/mapred/system</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
-----------------------------------------------------------------------------

hadoop-env.sh
-----------------------------------------------------------------------------
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.  Required.
export JAVA_HOME=/usr/local/hadoop/jdk1.6.0_20

# Extra Java CLASSPATH elements.  Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=200

# Extra Java runtime options.  Empty by default.
#export HADOOP_OPTS=-server

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/hbase-0.20.4.jar:$HABSE_HOME/config:$ZOOKEEPER_HOME/zookeeper-3.3.1.jar:$HADOOP_HOME/lib/hadoop-lzo-0.4.4.jar
#export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS

# Extra ssh options.  Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored.  $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from.  Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
#export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HADOOP_NICENESS=10
-----------------------------------------------------------------------------
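As a quick end-to-end check that the io.compression.codecs setting above is being picked up at all, one sketch (assuming test.lzo is the file uploaded in step 3) is to decompress the file through Hadoop itself; if LzopCodec is registered on the client side, this should print plain text rather than raw bytes or a codec error:

-----------------------------------------------------------------------------
# hadoop fs -text resolves the codec via CompressionCodecFactory,
# so it exercises the same io.compression.codecs lookup that failed above
hadoop fs -text test.lzo | head
-----------------------------------------------------------------------------

Note this only verifies the client configuration; the failing lookup in the log happens inside the map task JVMs on the slaves.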