Dedup fails due to date format (long)
-------------------------------------
Key: NUTCH-986
URL: https://issues.apache.org/jira/browse/NUTCH-986
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 1.3
Reporter: Markus Jelsma
Fix For: 1.3
As already mentioned on the list, dedup also fails because of invalid date
formats.
Apr 19, 2011 10:34:50 AM org.apache.solr.request.BinaryResponseWriter$Resolver getDoc
WARNING: Error reading a field from document : SolrDocument[{digest=7ff92a31c58e43a34fd45bc6d87cda03}]
java.lang.NumberFormatException: For input string: "2011-04-19T08:16:31.675Z"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
	at java.lang.Long.parseLong(Long.java:419)
	at java.lang.Long.valueOf(Long.java:525)
	at org.apache.solr.schema.LongField.toObject(LongField.java:82)
....
Strangely enough, Solr seems to accept updates to long fields with a formatted
date. In Nutch 1.2 the tstamp field is actually a long, but in 1.3 its value is
a string in valid Solr date format. The exception is only triggered when using
the javabin response writer, so there's something odd in Solr too.
We need to either change the tstamp field back to a long or update the Solr
example schema and fix SolrDeleteDuplicates to use the formatted date instead
of the long.
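
To illustrate the second option, here is a minimal sketch of a tolerant
conversion that accepts either representation of tstamp. The class
TstampSketch and the method toEpochMillis are hypothetical names, not part of
Nutch or SolrJ, and java.time is used for brevity on the assumption of a
modern JVM. It is not the actual SolrDeleteDuplicates code.

```java
import java.time.Instant;
import java.util.Date;

// Hypothetical helper, not part of Nutch or SolrJ.
public class TstampSketch {

    // Converts a tstamp value to epoch milliseconds, whether it arrives as a
    // number (Nutch 1.2 style), a Date, or a Solr-formatted date string such
    // as "2011-04-19T08:16:31.675Z" (Nutch 1.3 style).
    public static long toEpochMillis(Object value) {
        if (value == null) {
            throw new IllegalArgumentException("tstamp is missing");
        }
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        if (value instanceof Date) {
            return ((Date) value).getTime();
        }
        String s = value.toString().trim();
        try {
            // Plain long, as written by older indexes.
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            // Not a long: fall back to ISO-8601 parsing. This throws
            // DateTimeParseException if the string is not a valid instant.
            return Instant.parse(s).toEpochMilli();
        }
    }

    public static void main(String[] args) {
        // The value from the warning above, which Long.valueOf rejects.
        System.out.println(toEpochMillis("2011-04-19T08:16:31.675Z"));
        // The same instant expressed as a long.
        System.out.println(toEpochMillis(1303200991675L));
    }
}
```

Comparing documents on the resulting long keeps the dedup ordering logic
independent of which schema type the index happens to use.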