[
https://issues.apache.org/jira/browse/NUTCH-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kristof updated NUTCH-1406:
----------------------------
Attachment: index-metadata_formatted.patch
Formatting done (correct?), and the spelling error is corrected. Regarding the
date format: you are right that Solr uses the format YYYY-MM-DDThh:mm:ss.sssZ
(e.g. 2012-06-20T00:00:00.000Z). The SimpleDateFormat pattern yyyy-MM-dd used in
the patch correctly converts values to this format, but only for date-only
values. I did not consider times because the fields I am looking at contain
only dates. The conversion simply fills in the missing time as 00:00:00 and
converts the result to UTC based on the time zone setting of the machine running
the process. I have just tested with some modified files into which I added time
information, trying several SimpleDateFormat patterns to find one that works,
but so far I have not found one. With any pattern that goes beyond yyyy-MM-dd,
the original field values, which contain only a date, are no longer converted.
So it seems this solution is limited to dates.
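The following is a minimal Java sketch of the behaviour described above, not the code from the patch; the class name `SolrDateSketch`, the method `toSolrDate` and the list of candidate input patterns are hypothetical. It shows a date-only value being parsed as local midnight and shifted to UTC, and one possible way to also accept values that carry a time: try several patterns, most specific first, and require each pattern to match the entire value. One plausible reason a single pattern cannot cover both cases is that `SimpleDateFormat.parse` fails when the pattern expects more than the value contains, yet silently ignores trailing text when the value is longer than the pattern.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDateSketch {

    // Hypothetical candidate input patterns, most specific first.
    // Adjust to the values actually found in the meta tags.
    private static final String[] INPUT_PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ss",
        "yyyy-MM-dd HH:mm:ss",
        "yyyy-MM-dd"
    };

    /**
     * Converts a meta tag value to Solr's UTC date format, or returns null
     * if no candidate pattern matches the whole value. Missing time fields
     * default to 00:00:00 in sourceZone before the shift to UTC.
     */
    public static String toSolrDate(String value, TimeZone sourceZone) {
        String trimmed = value.trim();
        for (String pattern : INPUT_PATTERNS) {
            SimpleDateFormat in = new SimpleDateFormat(pattern);
            in.setLenient(false);          // reject e.g. month 13
            in.setTimeZone(sourceZone);    // zone used for the parsed fields
            ParsePosition pos = new ParsePosition(0);
            Date date = in.parse(trimmed, pos);
            // parse() stops where the pattern ends, so require that the whole
            // value was consumed; otherwise "2012-06-20T10:15:00" would match
            // "yyyy-MM-dd" and silently lose its time part.
            if (date != null && pos.getIndex() == trimmed.length()) {
                SimpleDateFormat out =
                    new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
                out.setTimeZone(TimeZone.getTimeZone("UTC"));
                return out.format(date);
            }
        }
        return null; // no pattern matched; caller decides what to do
    }

    public static void main(String[] args) {
        TimeZone toronto = TimeZone.getTimeZone("America/Toronto");
        System.out.println(toSolrDate("2012-06-20", toronto));
        System.out.println(toSolrDate("2012-06-20T10:15:00", toronto));
        System.out.println(toSolrDate("20 June 2012", toronto));
    }
}
```

Run as is, `main` prints `2012-06-20T04:00:00.000Z`, `2012-06-20T14:15:00.000Z` and `null`: the date-only value becomes local midnight in Toronto (UTC-4 in June) expressed in UTC, the value with a time keeps its time, and an unrecognised format is rejected rather than guessed. Passing an explicit zone instead of relying on the machine default keeps the result independent of where the crawl runs.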
> metadata-index plugin: conversion to Solr date format
> -----------------------------------------------------
>
> Key: NUTCH-1406
> URL: https://issues.apache.org/jira/browse/NUTCH-1406
> Project: Nutch
> Issue Type: Improvement
> Components: indexer, parser
> Reporter: Kristof
> Priority: Minor
> Labels: conversion, date
> Attachments: index-metadata_formatted.patch
>
>
> This improvement to the index-metatags plugin (sometimes also referred to as
> the parse-metatags plugin) allows selected fields to be converted to the Solr
> date format. The main benefit of this conversion is that it makes range facets
> possible.
> To convert the values of selected meta tags to the Solr date format, you must
> specify them in nutch-site.xml. This can be used, for example, with Dublin Core
> elements. One subdomain whose pages carry the meta tag dcterms.modified is
> cic.gc.ca. dcterms.modified must also be listed in the metatags.names and
> index.parse.md properties.
>
> {code}
> <property>
> <name>index.dateconvert.md</name>
> <value>metatag.dcterms.modified</value>
> <description>For plugin index-metadata: Indicate here the name of the
> html meta tag that should be converted to Solr date format.
> </description>
> </property>
> {code}
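How the configured property could select which fields get converted is sketched below in Java. This is an illustration under stated assumptions, not the plugin's implementation: the class `DateConvertFieldsSketch`, the method `convertSelected` and the plain `Map` standing in for the parsed metadata are hypothetical, and treating the `index.dateconvert.md` value as a comma-separated list (like `index.parse.md`) is an assumption, since the description above names a single tag. Only date-only `yyyy-MM-dd` values are converted, matching the limitation discussed in the comment.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TimeZone;

public class DateConvertFieldsSketch {

    /**
     * Returns a copy of the metadata in which every field named in
     * dateConvertMd (assumed comma-separated) whose value is a yyyy-MM-dd
     * date is rewritten in Solr's UTC date format. Other fields are copied
     * unchanged; the input map is not modified.
     */
    public static Map<String, String> convertSelected(
            Map<String, String> metadata, String dateConvertMd, TimeZone zone) {
        Map<String, String> result = new LinkedHashMap<>(metadata);
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd");
        in.setLenient(false);
        in.setTimeZone(zone); // missing time is taken as midnight in this zone
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        out.setTimeZone(TimeZone.getTimeZone("UTC"));

        for (String field : dateConvertMd.split(",")) {
            String name = field.trim();
            String value = result.get(name);
            if (name.isEmpty() || value == null) {
                continue; // empty entry, or tag not present on this page
            }
            String trimmed = value.trim();
            ParsePosition pos = new ParsePosition(0);
            Date date = in.parse(trimmed, pos);
            // Convert only if the whole value is a plain date.
            if (date != null && pos.getIndex() == trimmed.length()) {
                result.put(name, out.format(date));
            }
            // Non-matching values are left unchanged in this sketch.
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> md = new LinkedHashMap<>();
        md.put("metatag.dcterms.modified", "2012-06-20");
        md.put("metatag.description", "Example page");
        System.out.println(convertSelected(
            md, "metatag.dcterms.modified", TimeZone.getTimeZone("UTC")));
    }
}
```

With UTC as the zone, `main` prints `{metatag.dcterms.modified=2012-06-20T00:00:00.000Z, metatag.description=Example page}`. Leaving unparseable values unchanged keeps the sketch simple; a real indexing filter might prefer to drop them, since Solr would most likely reject a non-date string sent to a date field.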