Alex, LZO can be a pain; we've all seen it. I have a few tips I've compiled that might help you (I've posted these before):

http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e

Walk through some of those and see if they help. I've gotten LZO hung up before, and what has worked for me is to step through each phase of the setup, redoing a step as needed, until I've got it working again.
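If it helps, the phase I usually check first is whether every class listed in io.compression.codecs actually resolves on the machine that submits the job, since that is exactly where CompressionCodecFactory is blowing up in your traces. Below is a rough, untested sketch of the kind of throwaway class I use for that; the CodecCheck name is just something I made up, and printing each raw token in brackets is only there because stray whitespace or a line break inside the <value> element can also make a class name unresolvable, and this makes that easy to see.

import org.apache.hadoop.conf.Configuration;

// Throwaway check: does every entry in io.compression.codecs resolve with
// the configuration this JVM actually sees?
public class CodecCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration(); // picks up core-site.xml from the classpath
    String codecs = conf.get("io.compression.codecs", "");
    for (String token : codecs.split(",")) {
      if (token.trim().isEmpty()) {
        continue;
      }
      // Brackets make leading/trailing whitespace or line breaks easy to spot.
      System.out.print("[" + token + "] ");
      try {
        conf.getClassByName(token.trim());
        System.out.println("resolves");
      } catch (ClassNotFoundException e) {
        System.out.println("NOT FOUND");
      }
    }
  }
}

Compile it against hadoop-0.20.2-core.jar and run it with something like "HADOOP_CLASSPATH=. bin/hadoop CodecCheck" so it uses the same conf/ directory and lib/ jars a real job submission would; if com.hadoop.compression.lzo.LzoCodec comes back NOT FOUND there, the jar isn't visible to that JVM, no matter what the namenode's classpath looks like.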
Josh Patterson
Cloudera

On Sun, Aug 15, 2010 at 7:58 AM, Alex Luya <alexander.l...@gmail.com> wrote:
> Hi,
>
> At the very beginning I ran "hadoop jar hadoop-*-examples.jar grep input output
> 'dfs[a-z.]+'" successfully, but when I ran "nutch crawl url -dir crawl -depth 3"
> I got these errors:
>
> -------------------------------------------------------------------------
> 10/08/07 22:53:30 INFO crawl.Crawl: crawl started in: crawl
> .....................................................................
> 10/08/07 22:53:30 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> Exception in thread "main" java.lang.RuntimeException: Error in configuring object
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> .....................................................................
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> .....................................................................
>         ... 9 more
> Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GzipCodec not found.
>         at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
> .....................................................................
>         ... 14 more
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.compress.GzipCodec
> .....................................................................
>         at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
>         ... 16 more
> -------------------------------------------------------------------------
>
> So here GzipCodec didn't get loaded successfully, or maybe it isn't loaded by
> default; I don't know, but I think it should be. Then I followed this link:
> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ to install LZO, ran
> "nutch crawl url -dir crawl -depth 3" again, and got these errors:
>
> -------------------------------------------------------------------------
> 10/08/07 22:40:41 INFO crawl.Crawl: crawl started in: crawl
> .....................................................................
> 10/08/07 22:40:42 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
> 10/08/07 22:40:42 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> Exception in thread "main" java.lang.RuntimeException: Error in configuring object
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> .....................................................................
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:211)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> .....................................................................
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>         ... 9 more
> Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GzipCodec not found.
> .....................................................................
>         at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:41)
>         ... 14 more
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.compress.GzipCodec
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>         at java.security.AccessController.doPrivileged(Native Method)
> .....................................................................
>         at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
>         ... 16 more
> -------------------------------------------------------------------------
>
> Then I ran "hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'" and
> got these errors:
>
> -------------------------------------------------------------------------
> java.lang.RuntimeException: Error in configuring object
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>         at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:400)
> .....................................................................
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> .....................................................................
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>         ... 22 more
> Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
>         at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
>         at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:134)
>         at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:41)
>         ... 27 more
> Caused by: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>         at java.security.AccessController.doPrivileged(Native Method)
> .....................................................................
>         at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
>         ... 29 more
> -------------------------------------------------------------------------
>
> When I run "ps -aef | grep gpl", I get this output:
>
> -------------------------------------------------------------------------
> alex 2267 1 1 22:04 pts/1 00:00:04 /usr/local/hadoop/jdk1.6.0_21/bin/java
> -Xmx200m -Dcom.sun.management.jmxremote
> ..............................................
> /usr/local/hadoop/hadoop-0.20.2/bin/../conf:/usr/local/hadoop/jdk1.6.0_21/lib/tools.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/..:/usr/local/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-cli-1.2.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-codec-1.3.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-el-1.0.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-.-net-1.4.1.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/core-3.1.1.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/hadoop-gpl-compression-0.2.0-dev.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/hsqldb-1.8.0.10.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/jasper-compiler-5.5.12.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/jasper-runtime-5.5.12.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/jets3t-0.6.1.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/jetty-6.1.14.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/jetty-
> .......................................
> log4j12-1.4.3.jar:/usr/local/hadoop/hadoop-0.20.2/bin/../lib/xmlenc-0.52.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/jsp-2.1/jsp-2.1.jar:
> /usr/local/hadoop/hadoop-0.20.2/bin/../lib/jsp-2.1/jsp-api-2.1.jar
> org.apache.hadoop.hdfs.server.namenode.NameNode
> -------------------------------------------------------------------------
>
> See, the two jars (hadoop-core and the gpl compression jar) are on the classpath,
> but it seems they can't be referenced by the job. Before this I had also tried
> installing hadoop-lzo (http://github.com/kevinweil/hadoop-lzo) and got the same
> errors; maybe hadoop-lzo only works with Hadoop 0.20 and not 0.20.1/2, I don't
> know. After a month I still haven't solved this problem, and it's killing me, so
> I'm posting all of my configuration files here. Would you please help me dig the
> problem out? Thank you.
> core-site.xml
> -------------------------------------------------------------------------
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://AlexLuya:8020</value>
>   </property>
>   <property>
>     <name>hadoop.tmp.dir</name>
>     <value>/home/alex/tmp</value>
>   </property>
>   <property>
>     <name>io.compression.codecs</name>
>     <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec
>     </value>
>   </property>
>   <property>
>     <name>io.compression.codec.lzo.class</name>
>     <value>com.hadoop.compression.lzo.LzoCodec</value>
>   </property>
> </configuration>
> -------------------------------------------------------------------------
>
> mapreduce.xml
> -------------------------------------------------------------------------
> <configuration>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>AlexLuya:9001</value>
>   </property>
>   <property>
>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>mapred.tasktracker.map.tasks.maximum</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>mapred.local.dir</name>
>     <value>/home/alex/hadoop/mapred/local</value>
>   </property>
>   <property>
>     <name>mapred.system.dir</name>
>     <value>/tmp/hadoop/mapred/system</value>
>   </property>
>   <property>
>     <name>mapreduce.map.output.compress</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>mapreduce.map.output.compress.codec</name>
>     <value>com.hadoop.compression.lzo.LzoCodec</value>
>   </property>
> </configuration>
> -------------------------------------------------------------------------
>
> hadoop-env.sh
> -------------------------------------------------------------------------
> # Set Hadoop-specific environment variables here.
>
> # The only required environment variable is JAVA_HOME. All others are
> # optional. When running a distributed configuration it is best to
> # set JAVA_HOME in this file, so that it is correctly defined on
> # remote nodes.
>
> # The java implementation to use. Required.
> export JAVA_HOME=/usr/local/hadoop/jdk1.6.0_21
>
> # Extra Java CLASSPATH elements. Optional.
> # export HADOOP_CLASSPATH=
>
> # The maximum amount of heap to use, in MB. Default is 1000.
> export HADOOP_HEAPSIZE=200
>
> # Extra Java runtime options. Empty by default.
> #export HADOOP_OPTS=-server
>
> # Command specific options appended to HADOOP_OPTS when specified
> export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
> export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
> export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
> export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
> export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
> # export HADOOP_TASKTRACKER_OPTS=
>
> # The following applies to multiple commands (fs, dfs, fsck, distcp etc)
> # export HADOOP_CLIENT_OPTS
>
> # Extra ssh options. Empty by default.
> # export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
>
> # Where log files are stored. $HADOOP_HOME/logs by default.
> # export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
>
> # File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
> # export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
>
> # host:path where hadoop code should be rsync'd from. Unset by default.
> # export HADOOP_MASTER=master:/home/$USER/src/hadoop
>
> # Seconds to sleep between slave commands. Unset by default. This
> # can be useful in large clusters, where, e.g., slave rsyncs can
> # otherwise arrive faster than the master can service them.
> # export HADOOP_SLAVE_SLEEP=0.1
>
> # The directory where pid files are stored. /tmp by default.
> # export HADOOP_PID_DIR=/var/hadoop/pids
>
> # A string representing this instance of hadoop. $USER by default.
> #export HADOOP_IDENT_STRING=$USER
>
> # The scheduling priority for daemon processes. See 'man nice'.
> # export HADOOP_NICENESS=10
> -------------------------------------------------------------------------
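One more phase worth stepping through once the classes resolve: the LZO codecs also need their native library at runtime, and nothing in the hadoop-env.sh above points java.library.path at it. Here is another rough sketch of mine to check that on a node (again, the NativeLzoCheck name is made up, and as far as I recall the hadoop-gpl-compression/hadoop-lzo builds load their JNI glue under the name "gplcompression", i.e. libgplcompression.so on Linux, so adjust that if your build differs):

// Throwaway check: can this JVM load the native LZO glue library?
public class NativeLzoCheck {
  public static void main(String[] args) {
    try {
      // The GPL compression codecs load their JNI library under this name.
      System.loadLibrary("gplcompression");
      System.out.println("native gplcompression library loaded");
    } catch (UnsatisfiedLinkError e) {
      System.out.println("not found; java.library.path = "
          + System.getProperty("java.library.path"));
    }
  }
}

Run it the same way as the other sketch ("HADOOP_CLASSPATH=. bin/hadoop NativeLzoCheck") so it inherits the java.library.path that bin/hadoop sets up; if this one fails, the problem is in the native install step rather than the jar, which is the other place these LZO setups usually go sideways.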