Nathan Gass created NUTCH-1490:
----------------------------------
Summary: Data Truncation exceptions when using mysql
Key: NUTCH-1490
URL: https://issues.apache.org/jira/browse/NUTCH-1490
Project: Nutch
Issue Type: Bug
Affects Versions: 2.1
Reporter: Nathan Gass
Nutch does not enforce the configured (or implicit) maximum length of the following
columns:
title
urls (id, baseUrl, reprUrl)
typ (contentType)
inlinks
outlinks
Trying to store more data than fits in one of these columns results in an exception
similar to this one (copied from GORA-24; I will add a newer stack
trace later today):
java.io.IOException: java.sql.BatchUpdateException: Data truncation: Data too long for column 'inlinks' at row 1
	at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
	at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
	at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
Caused by: java.sql.BatchUpdateException: Data truncation: Data too long for column 'inlinks' at row 1
	at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2018)
	at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1449)
	at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
	... 5 more
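One way to avoid this class of error for text columns is to enforce the limits on the client side before the batch is flushed. The following is a minimal sketch under stated assumptions, not the reporter's fix and not an existing Nutch or Gora API: the class name ColumnLengthGuard and the example limits are hypothetical, and the real limits would have to come from the column definitions used for the MySQL tables. It cuts values by Unicode code points rather than UTF-16 units, so a surrogate pair is never split. It is only meant for text columns such as title, the URL columns and typ; if inlinks and outlinks hold serialized data, cutting bytes off would likely leave them unreadable, so those columns probably need a different approach (for example, a larger column type).

```java
import java.util.Map;

/**
 * Hypothetical helper (not part of Nutch or Gora): truncates string values
 * to a per-column maximum length, measured in Unicode code points, before
 * they are written to a SQL store.
 */
public final class ColumnLengthGuard {

    private ColumnLengthGuard() {}

    /** Returns value cut to at most maxCodePoints code points; null passes through unchanged. */
    public static String truncate(String value, int maxCodePoints) {
        if (value == null) {
            return null;
        }
        if (maxCodePoints < 0) {
            throw new IllegalArgumentException("maxCodePoints must be >= 0");
        }
        int codePoints = value.codePointCount(0, value.length());
        if (codePoints <= maxCodePoints) {
            return value;
        }
        // offsetByCodePoints returns a char index on a code point boundary,
        // so the substring never ends in the middle of a surrogate pair
        int end = value.offsetByCodePoints(0, maxCodePoints);
        return value.substring(0, end);
    }

    /** Looks up the limit for a column; columns without a configured limit are returned unchanged. */
    public static String truncate(Map<String, Integer> limits, String column, String value) {
        Integer max = limits.get(column);
        return max == null ? value : truncate(value, max);
    }

    public static void main(String[] args) {
        // Illustrative limits only; real values depend on the actual table definitions
        Map<String, Integer> limits = Map.of("title", 10, "typ", 5);
        System.out.println(truncate(limits, "title", "A very long page title")); // A very lon
        System.out.println(truncate(limits, "typ", "text/html"));                // text/
        System.out.println(truncate(limits, "baseUrl", "http://example.com/"));  // unchanged
    }
}
```

Truncating silently loses data, so a variant that logs or rejects over-long values may be preferable where the full value matters.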
I'll add my current fixes in later comments.
--
This message is automatically generated by JIRA.