[
https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495269#comment-13495269
]
Nathan Gass commented on NUTCH-1497:
------------------------------------
Some comments on the differences from NUTCH-1490:
The column typ was renamed because it is oddly named. Imho this should be
done, but it is not actually specific to MySQL. Increasing the length of the
column, on the other hand, is necessary, as I got truncation exceptions with
the typ column set to length 32. Of course, if this should not happen, I can
try to find out which URL was responsible for the exception to get at the root
cause.
Setting outlinks to the same length as inlinks makes it unnecessarily large (at
least as soon as the maximum outlink number actually gets enforced in Nutch).
With the patch in NUTCH-1490, Gora uses the column type mediumblob, whereas with
this file it would use longblob. I have no idea whether this difference is
significant.
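
For reference, the two types differ mainly in capacity: MySQL documents a
maximum of 16,777,215 bytes for MEDIUMBLOB and 4,294,967,295 bytes for
LONGBLOB. The sketch below only illustrates those documented limits by picking
the smallest blob type that fits a given byte length. The class and method
names are hypothetical, and this is not a claim about how Gora itself maps the
length attribute to a column type.

```java
public final class MySqlBlobTypes {
    private MySqlBlobTypes() {}

    /**
     * Returns the smallest MySQL blob type whose documented maximum size
     * can hold maxBytes. Illustrative only; not Gora's actual logic.
     */
    public static String smallestBlobFor(long maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must be >= 0");
        }
        if (maxBytes <= 255L) return "TINYBLOB";              // 2^8  - 1
        if (maxBytes <= 65_535L) return "BLOB";               // 2^16 - 1
        if (maxBytes <= 16_777_215L) return "MEDIUMBLOB";     // 2^24 - 1
        if (maxBytes <= 4_294_967_295L) return "LONGBLOB";    // 2^32 - 1
        throw new IllegalArgumentException("too large for any MySQL blob type");
    }

    public static void main(String[] args) {
        System.out.println(smallestBlobFor(16_777_215L));
        System.out.println(smallestBlobFor(16_777_216L));
    }
}
```
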
Increasing the maximum length of URLs and titles only makes the truncation
errors occur less frequently. A real fix is to enforce the given maximum length
with appropriate checks in the Nutch code.
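
As a rough illustration of such a check, a value could be cut to the mapped
column length before it is stored. This is only a minimal sketch: FieldLimits
and truncate are hypothetical names, not existing Nutch or Gora APIs, and it
assumes the limit is counted in Java chars, which may not match how the
database counts length for every character set.

```java
public final class FieldLimits {
    private FieldLimits() {}

    /**
     * Returns value cut to at most maxLength chars. A null value or a value
     * that already fits is returned unchanged.
     */
    public static String truncate(String value, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        if (value == null || value.length() <= maxLength) {
            return value;
        }
        int end = maxLength;
        // Do not cut a surrogate pair in half: drop the dangling high surrogate.
        if (end > 0 && Character.isHighSurrogate(value.charAt(end - 1))) {
            end--;
        }
        return value.substring(0, end);
    }

    public static void main(String[] args) {
        // Hypothetical limit of 9 chars, chosen only for the demonstration.
        System.out.println(truncate("text/html; charset=utf-8", 9)); // prints "text/html"
    }
}
```

Whether to truncate silently, log a warning, or skip the record is a separate
design decision that the check would have to make explicit.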
> Better default gora-sql-mapping.xml with larger field sizes for MySQL
> ---------------------------------------------------------------------
>
> Key: NUTCH-1497
> URL: https://issues.apache.org/jira/browse/NUTCH-1497
> Project: Nutch
> Issue Type: Improvement
> Components: storage
> Affects Versions: 2.2
> Environment: MySQL Backend
> Reporter: James Sullivan
> Priority: Minor
> Labels: MySQL
> Attachments: gora-mysql-mapping.xml
>
>
> The current generic default gora-sql-mapping.xml has field sizes that are too
> small in almost all situations when used with MySQL. I have included a
> mapping which will work better for MySQL (it takes slightly more space but
> will be able to handle the larger fields necessary for real-world use). It
> includes the patch from NUTCH-1490 and resolves the non-Unicode part of
> NUTCH-1473. I believe it is not possible to use the same gora-sql-mapping for
> both hsqldb and MySQL without ending up with a significantly degraded lowest
> common denominator. Should the user manually rename the attached file to
> gora-sql-mapping.xml, or is there a way to have Nutch automatically use it
> when MySQL is selected in other configurations (Ivy.xml or gora.properties)?