[ https://issues.apache.org/jira/browse/NUTCH-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristof updated NUTCH-1406:
----------------------------

    Attachment: index-metadata_formatted.patch

Formatting done (correct?) and spelling error corrected. Regarding the format:
you are right that Solr uses the date format yyyy-mm-ddThh:mm:ss.mmmZ. The
SimpleDateFormat pattern yyyy-MM-dd that I used converts correctly to
yyyy-mm-ddThh:mm:ss.mmmZ, but only for values that contain a date alone. I did
not consider time because the fields I am looking at only contain dates. The
conversion adds time information by treating the missing time as 00:00:00 and
converting it to UTC based on the time zone settings of the machine running the
process. I have just tested with some altered files into which I added time
information, trying several SimpleDateFormat patterns to find one that works.
So far I have not found one: with any pattern that goes beyond yyyy-MM-dd, the
original field values, which contain only a date, are not converted. So it
seems this solution is limited to dates.
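One possible workaround, sketched here and not taken from the attached patch, is to try several input patterns, most specific first, and accept a pattern only if it consumes the whole value. The class `SolrDateSketch`, its methods, and the two input patterns are hypothetical. The sketch also shows the time zone effect described above: a date-only value is read as midnight in the given zone and then written out in UTC.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDateSketch {

    // Hypothetical input patterns, most specific first. These are assumptions
    // for illustration; the attached patch only uses "yyyy-MM-dd".
    private static final String[] INPUT_PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ss",
        "yyyy-MM-dd"
    };

    // Returns the parsed date, or null if no pattern matches the whole value.
    // A date-only value is interpreted as 00:00:00 in the given time zone.
    static Date parseDateValue(String value, TimeZone zone) {
        for (String pattern : INPUT_PATTERNS) {
            SimpleDateFormat in = new SimpleDateFormat(pattern);
            in.setTimeZone(zone);
            in.setLenient(false);
            ParsePosition pos = new ParsePosition(0);
            Date parsed = in.parse(value, pos);
            // Require the whole string to be consumed, so a longer value is
            // never silently truncated by a shorter pattern.
            if (parsed != null && pos.getIndex() == value.length()) {
                return parsed;
            }
        }
        return null;
    }

    // Formats a date in the UTC form Solr expects, e.g. 2012-06-20T00:00:00.000Z.
    static String toSolrDate(Date date) {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(date);
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        TimeZone toronto = TimeZone.getTimeZone("America/Toronto");
        System.out.println(toSolrDate(parseDateValue("2012-06-20", utc)));          // 2012-06-20T00:00:00.000Z
        System.out.println(toSolrDate(parseDateValue("2012-06-20T10:30:00", utc))); // 2012-06-20T10:30:00.000Z
        System.out.println(toSolrDate(parseDateValue("2012-06-20", toronto)));      // 2012-06-20T04:00:00.000Z
    }
}
```

The explicit `ParsePosition` check matters because `DateFormat.parse(String)` is documented as possibly not using the entire input, so a date-only pattern could otherwise accept a date-time value and drop its time part without any error.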
                
> metadata-index plugin: conversion to Solr date format
> -----------------------------------------------------
>
>                 Key: NUTCH-1406
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1406
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer, parser
>            Reporter: Kristof 
>            Priority: Minor
>              Labels: conversion, date
>         Attachments: index-metadata_formatted.patch
>
>
> This improvement to the index-metatags plugin (sometimes also referred to as
> the parse-metatags plugin) allows selected fields to be converted to the Solr
> date format. The main benefit of this conversion is the possibility to create
> range facets.
> To convert the values of selected metatags to the Solr date format, you
> must list them in the index.dateconvert.md property in nutch-site.xml. This
> can be used, for example, with Dublin Core elements: pages on the subdomain
> cic.gc.ca carry the meta tag dcterms.modified. dcterms.modified must also be
> defined in the metatags.names and index.parse.md properties.
>  
> {code}
> <property>
>       <name>index.dateconvert.md</name>
>       <value>metatag.dcterms.modified</value>
>       <description>For plugin index-metadata: Indicate here the name of the 
> html meta tag that should be converted to Solr date format.
>       </description>
> </property>
> {code}
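The description also requires dcterms.modified in the metatags.names and index.parse.md properties. A minimal nutch-site.xml sketch of those two entries follows, assuming single values (so the list separator, which may differ between Nutch versions, does not come into play) and assuming index.parse.md takes the same metatag.-prefixed key used for index.dateconvert.md above.

```xml
<!-- Sketch only: values assumed for the dcterms.modified example -->
<property>
  <name>metatags.names</name>
  <value>dcterms.modified</value>
</property>
<property>
  <name>index.parse.md</name>
  <value>metatag.dcterms.modified</value>
</property>
```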
