Hi,
I was trying to run the Benchmark tool from trunk with MySQL, on a standalone
Hadoop cluster. My conf/gora.properties has this:
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?user=nutch&password=nutch
Jobs were failing though, with the following:
Exception in thread "main" java.lang.NoSuchMethodError: org.hsqldb.DatabaseURL.parseURL(Ljava/lang/String;ZZ)Lorg/hsqldb/persist/HsqlProperties;
	at org.hsqldb.jdbc.JDBCDriver.getConnection(Unknown Source)
	at org.hsqldb.jdbc.JDBCDriver.connect(Unknown Source)
	at java.sql.DriverManager.getConnection(DriverManager.java:582)
	at java.sql.DriverManager.getConnection(DriverManager.java:207)
	at org.gora.sql.store.SqlStore.getConnection(SqlStore.java:712)
	at org.gora.sql.store.SqlStore.initialize(SqlStore.java:145)
	at org.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:64)
	at org.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:86)
	at org.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:98)
	at org.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:70)
	at org.apache.nutch.storage.StorageUtils.createDataStore(StorageUtils.java:25)
	at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:68)
	at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:50)
	at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:237)
	at org.apache.nutch.tools.Benchmark.benchmark(Benchmark.java:190)
	at org.apache.nutch.tools.Benchmark.run(Benchmark.java:139)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.tools.Benchmark.main(Benchmark.java:32)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Puzzling, isn't it? It turns out that java.sql.DriverManager tries _all_
registered drivers in turn to see which one can handle the JDBC URL. The
usual Class.forName(jdbcDriver) magic doesn't mean we are going to use
jdbcDriver; it only makes sure the driver class is loaded, so that it
registers itself in the list of available drivers.
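To make the ordering problem concrete, here is a minimal, self-contained sketch that reproduces it without MySQL or HSQLDB on the classpath. The driver classes BrokenDriver and WantedDriver and the jdbc:wanted: URL are hypothetical stand-ins: the first throws a NoSuchMethodError from connect() to mimic the version clash, the second accepts the URL. DriverManager only moves on to the next driver when connect() returns null or throws an SQLException, so an Error from the first driver ends the whole lookup. The sketch also shows one possible workaround, assuming you can get hold of the wanted Driver instance: call its connect() directly instead of going through DriverManager.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

public class DriverOrderDemo {

    /** Minimal Driver skeleton; subclasses only decide what connect() does. */
    abstract static class StubDriver implements Driver {
        public boolean acceptsURL(String url) { return false; }
        public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        public int getMajorVersion() { return 1; }
        public int getMinorVersion() { return 0; }
        public boolean jdbcCompliant() { return false; }
        public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
    }

    /** Hypothetical stand-in for a driver hit by a classpath version clash. */
    static class BrokenDriver extends StubDriver {
        public Connection connect(String url, Properties info) {
            // An Error, not an SQLException, so DriverManager does not catch it.
            throw new NoSuchMethodError("simulated version clash");
        }
    }

    /** Hypothetical stand-in for the driver we actually want (e.g. MySQL). */
    static class WantedDriver extends StubDriver {
        public boolean acceptsURL(String url) { return url.startsWith("jdbc:wanted:"); }
        public Connection connect(String url, Properties info) {
            if (!acceptsURL(url)) {
                return null; // JDBC contract: null means "not my URL, try the next driver"
            }
            // Dummy Connection whose methods all return null; enough for this demo.
            return (Connection) Proxy.newProxyInstance(
                    DriverOrderDemo.class.getClassLoader(),
                    new Class<?>[] {Connection.class},
                    (proxy, method, methodArgs) -> null);
        }
    }

    static final String DEMO_URL = "jdbc:wanted://localhost:3306/nutch";

    /** Goes through DriverManager, which tries every registered driver in order. */
    static String viaDriverManager() {
        try {
            DriverManager.getConnection(DEMO_URL);
            return "connected";
        } catch (NoSuchMethodError e) {
            return "NoSuchMethodError";
        } catch (SQLException e) {
            return "SQLException";
        }
    }

    /** Bypasses DriverManager and asks one specific driver directly. */
    static String viaExplicitDriver(Driver driver) throws SQLException {
        Connection c = driver.connect(DEMO_URL, new Properties());
        return c != null ? "connected" : "not accepted";
    }

    public static void main(String[] args) throws SQLException {
        Driver wanted = new WantedDriver();
        // Registration order mimics the classpath: the broken driver comes first.
        DriverManager.registerDriver(new BrokenDriver());
        DriverManager.registerDriver(wanted);
        System.out.println(viaDriverManager());        // NoSuchMethodError
        System.out.println(viaExplicitDriver(wanted)); // connected
    }
}
```

With real drivers the same idea would mean instantiating the wanted driver class yourself and calling connect(url, props); whether SqlStore could be changed to work that way is a separate question.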
Now I know why this particular error occurred: Hadoop includes HSQLDB
1.8, and we use HSQLDB 2.0 (so the 2.0 JDBCDriver apparently ends up
resolved against the 1.8 DatabaseURL class, which lacks that parseURL
signature). When DriverManager tries each driver in turn, HSQLDB is
unfortunately first on the classpath (its jar ships in Hadoop's lib/),
and MySQL is the last, so it bombs out even before trying the right
driver...
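To check which jar a conflicting class actually comes from, and which drivers DriverManager has registered, a small probe like the one below could be run with the same classpath as the job. This is a diagnostic sketch only: the class names in main() are taken from the stack trace above, and they are loaded with initialize=false so that probing does not itself register a driver. getDrivers() returns drivers in registration order, which in OpenJDK is also the order getConnection() walks them.

```java
import java.security.CodeSource;
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ClasspathProbe {

    /** Returns the jar or directory a class was loaded from, or a marker string. */
    static String whereFrom(String className) {
        try {
            // initialize=false: load the class without running its static initializer.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "<bootstrap or unknown>"; // e.g. JDK classes such as java.lang.String
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not found>";
        }
    }

    /** Class names of the drivers DriverManager currently knows, in registration order. */
    static List<String> registeredDriverNames() {
        List<String> names = new ArrayList<>();
        for (Driver d : Collections.list(DriverManager.getDrivers())) {
            names.add(d.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // If these two print different jars, the HSQLDB versions are being mixed.
        System.out.println("JDBCDriver  : " + whereFrom("org.hsqldb.jdbc.JDBCDriver"));
        System.out.println("DatabaseURL : " + whereFrom("org.hsqldb.DatabaseURL"));
        System.out.println("Drivers     : " + registeredDriverNames());
    }
}
```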
For now I changed my build.xml to keep HSQLDB out of the job jar:
Index: build.xml
===================================================================
--- build.xml (revision 983564)
+++ build.xml (working copy)
@@ -123,7 +123,7 @@
excludes="nutch-default.xml,nutch-site.xml"/>
<zipfileset dir="${conf.dir}" excludes="*.template,hadoop*.*"/>
<zipfileset dir="${build.lib.dir}" prefix="lib"
- includes="**/*.jar" excludes="hadoop-*.jar"/>
+ includes="**/*.jar" excludes="hadoop-*.jar,hsqldb*.jar"/>
<zipfileset dir="${build.plugins}" prefix="plugins"/>
</jar>
</target>
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com