Hi Faruk,

You can either set a lower value for the parameter http.content.limit or
modify the mapping and set

<field name="content" column="content" jdbc-type="MEDIUMBLOB"/>

which should work for MySQL. See the discussion at
http://github.com/enis/gora/issues/closed#issue/48

HTH

Julien

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

On 7 September 2010 14:02, Andrzej Bialecki <a...@getopt.org> wrote:

> On 2010-09-07 14:50, Faruk Berksöz wrote:
>
>> Dear all,
>>
>> when I try to fetch a web page (e.g.
>> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) with the MySQL
>> storage definition, I see the following error in my Hadoop logs
>> (no error with HBase):
>>
>> java.io.IOException: java.sql.BatchUpdateException: Data truncation: Data too long for column 'content' at row 1
>>     at org.gora.sql.store.SqlStore.flush(SqlStore.java:316)
>>     at org.gora.sql.store.SqlStore.close(SqlStore.java:163)
>>     at org.gora.mapreduce.GoraOutputFormat$1.close(GoraOutputFormat.java:72)
>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>>
>> The type of the column 'content' is BLOB.
>> It may be important for the next developments of Gora.
>> Should I file this in nutch-jira or github/gora or nothing?
>>
>> environments : ubuntu 10.04
>> JVM : 1.6.0_20
>> nutch 2.0 (trunk)
>> Mysql/HBase (0.20.6) / Hadoop(0.20.2) pseudo-distributed
>>
>
> Yes, please create a JIRA issue. Thanks!
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
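
For reference, the first option in the reply above (lowering http.content.limit) could look roughly like the fragment below. This is only a sketch under assumptions: that the override goes in conf/nutch-site.xml, and that the column stores the fetched bytes more or less as-is. A MySQL BLOB holds at most 65,535 bytes and a MEDIUMBLOB about 16 MB, so the limit has to stay below 65,535 for a BLOB column. The value 65000 is an illustrative choice, not a tested one.

```xml
<!-- conf/nutch-site.xml (assumed location for local overrides) -->
<property>
  <name>http.content.limit</name>
  <!-- Illustrative value only: kept under the 65,535-byte capacity of a
       MySQL BLOB column. Content longer than this is truncated at fetch time. -->
  <value>65000</value>
</property>
```

The trade-off is that a lower limit truncates larger pages when they are fetched, whereas switching the column to MEDIUMBLOB keeps the full content.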