Nathan Gass created NUTCH-1490:
----------------------------------

             Summary: Data Truncation exceptions when using mysql
                 Key: NUTCH-1490
                 URL: https://issues.apache.org/jira/browse/NUTCH-1490
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 2.1
            Reporter: Nathan Gass


Nutch does not enforce the configured (or implicit) maximum length for the 
following columns:

title
urls (id, baseUrl, reprUrl,
typ (contentType)
inlinks
outlinks

Trying to store too much data in one of these columns results in an exception 
similar to this one (copied from GORA-24; I will be able to add a newer stack 
trace later today):

java.io.IOException: java.sql.BatchUpdateException: Data truncation: Data too long for column 'inlinks' at row 1
at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
Caused by: java.sql.BatchUpdateException: Data truncation: Data too long for column 'inlinks' at row 1
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2018)
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1449)
at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
... 5 more
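
For illustration only, here is a minimal sketch of the kind of length guard 
that is currently missing for character columns such as title. The class 
name, the method name and the limit of 512 used in the note after the code 
are hypothetical and are not taken from Nutch or Gora. Binary columns such as 
inlinks and outlinks are limited in bytes and would need a different check.

```java
// Hypothetical helper, not part of Nutch or Gora.
final class ColumnLengthGuard {

    private ColumnLengthGuard() {
    }

    /**
     * Returns the value cut down to at most maxLength UTF-16 code units.
     * A surrogate pair is never split, so the result may be one unit shorter
     * than maxLength. A null value is returned unchanged.
     */
    static String truncate(String value, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must not be negative");
        }
        if (value == null || value.length() <= maxLength) {
            return value;
        }
        int end = maxLength;
        // Step back one unit if the cut would land in the middle of a surrogate pair.
        if (end > 0 && Character.isHighSurrogate(value.charAt(end - 1))) {
            end--;
        }
        return value.substring(0, end);
    }
}
```

Calling something like ColumnLengthGuard.truncate(title, 512) before the row 
is written would avoid the exception for that column, at the cost of silently 
dropping the excess data.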

I'll add my current fixes in later comments.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
