[ 
https://issues.apache.org/jira/browse/NUTCH-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489972#comment-13489972
 ] 

Nathan Gass commented on NUTCH-1473:
------------------------------------

Yes, this happens because of large pages. TEXT was the appropriate type for 
me: with the BLOB type I got UTF-8 issues after indexing to Solr. Our MySQL 
server uses the utf8 character set.
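
For reference, the limit of 21845 in the error message is consistent with 
MySQL's 65,535-byte maximum row size divided by the up to 3 bytes that the 
utf8 character set can need per character. A minimal sketch of that 
arithmetic follows; the function and constant names are only illustrative 
and are not part of Nutch, Gora, or MySQL:

```python
# MySQL caps a row at 65,535 bytes; VARCHAR lengths are declared in characters,
# so the largest declarable length depends on the character set's byte width.
MYSQL_MAX_ROW_BYTES = 65535


def max_varchar_chars(bytes_per_char: int) -> int:
    """Largest VARCHAR length (in characters) that fits the row size limit."""
    return MYSQL_MAX_ROW_BYTES // bytes_per_char


# utf8 in MySQL uses up to 3 bytes per character.
print(max_varchar_chars(3))  # 21845, the "max" reported in the error
```

Any column declared longer than this has to use TEXT or BLOB instead, as 
the error message suggests.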

There are other columns where Nutch does not ensure that the data is small 
enough (or does not reserve enough space in gora-sql-mapping.xml), which is 
always a problem, at least when using MySQL. Should I mention them here or 
open separate issues?
                
> Column length too big for column 'text' (max = 21845); use BLOB or TEXT 
> instead
> -------------------------------------------------------------------------------
>
>                 Key: NUTCH-1473
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1473
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 2.1
>            Reporter: zhaixuepan
>
> Exception in thread "main" org.apache.gora.util.GoraException: java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead
>       at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
>       at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
>       at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
>       at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:214)
>       at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:62)
>       at org.apache.nutch.crawl.Crawler.run(Crawler.java:133)
>       at org.apache.nutch.crawl.Crawler.run(Crawler.java:246)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>       at org.apache.nutch.crawl.Crawler.main(Crawler.java:253)
> Caused by: java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead
>       at org.apache.gora.sql.store.SqlStore.createSchema(SqlStore.java:226)
>       at org.apache.gora.sql.store.SqlStore.initialize(SqlStore.java:172)
>       at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
>       at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
>       ... 8 more
> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>       at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
>       at com.mysql.jdbc.Util.getInstance(Util.java:386)
>       at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
>       at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
>       at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
>       at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
>       at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
>       at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2625)
>       at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2119)
>       at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2415)
>       at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2333)
>       at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2318)
>       at org.apache.gora.sql.store.SqlStore.createSchema(SqlStore.java:224)
>       ... 11 more
