[ https://issues.apache.org/jira/browse/NUTCH-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267247#comment-14267247 ]
Lewis John McGibbney commented on NUTCH-1140:
---------------------------------------------
Any objections to committing this fix? I've just run into this issue as well,
and the most recent patches and comments on this thread solve it without
hacking the schema to allow multi-valued titles for a document, which is
illogical.
> index-more plugin, resetTitle method creates multiple values in the Title
> field
> -------------------------------------------------------------------------------
>
> Key: NUTCH-1140
> URL: https://issues.apache.org/jira/browse/NUTCH-1140
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 1.3
> Reporter: Joe Liedtke
> Priority: Minor
> Fix For: 1.10
>
> Attachments: 0001-NUTCH-1140-2.x.patch, 0001-NUTCH-1140-trunk.patch,
> MoreIndexingFilter.093011.patch
>
>
> From the comments in MoreIndexingFilter.java, the index-more plugin is meant
> to reset the Title field of a document if it contains a Content-Disposition
> header. The current behavior is to add a Title whether or not one already
> exists, which can cause issues down the line with the Solr indexing process.
> Based on a thread on the Nutch user list, it appears that this is causing
> some users to mark the title as multi-valued in the schema:
>
> http://www.lucidimagination.com/search/document/9440ff6b5deb285b/multiple_values_encountered_for_non_multivalued_field_title#17736c5807826be8
>
> The following patch removes the title field before adding a new one, which
> has resolved the issue for me:
> --- MoreIndexingFilter.old 2011-09-30 11:44:35.000000000 +0000
> +++ MoreIndexingFilter.java 2011-09-30 09:58:48.000000000 +0000
> @@ -276,6 +276,7 @@
>      for (int i=0; i<patterns.length; i++) {
>        if (matcher.contains(contentDisposition,patterns[i])) {
>          result = matcher.getMatch();
> +        doc.removeField("title");
>          doc.add("title", result.group(1));
>          break;
>        }
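
For anyone skimming the thread, the remove-then-add behaviour in the patch can
be sketched in isolation roughly as follows. This is only an illustration under
stated assumptions, not the plugin code: SimpleDoc is a hypothetical stand-in
for the index document, java.util.regex is swapped in for the matcher used in
MoreIndexingFilter, and the two filename patterns are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleResetSketch {

  /** Hypothetical stand-in for an index document whose fields can hold several values. */
  static class SimpleDoc {
    private final Map<String, List<String>> fields = new HashMap<>();

    /** Appends a value; an existing value for the same field is kept. */
    void add(String name, String value) {
      fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Drops every value stored under the given field name. */
    void removeField(String name) {
      fields.remove(name);
    }

    List<String> getValues(String name) {
      return fields.getOrDefault(name, new ArrayList<>());
    }
  }

  // Illustrative patterns only (an assumption, not the ones used by the plugin).
  private static final Pattern[] PATTERNS = {
    Pattern.compile("\\bfilename=['\"]([^'\"]+)['\"]"),
    Pattern.compile("\\bfilename=([^\\s;]+)")
  };

  /** Replaces the title with the Content-Disposition filename, if one is found. */
  static void resetTitle(SimpleDoc doc, String contentDisposition) {
    if (contentDisposition == null) {
      return;
    }
    for (Pattern pattern : PATTERNS) {
      Matcher m = pattern.matcher(contentDisposition);
      if (m.find()) {
        // Remove first, so the field ends up single-valued.
        doc.removeField("title");
        doc.add("title", m.group(1));
        break;
      }
    }
  }

  public static void main(String[] args) {
    SimpleDoc doc = new SimpleDoc();
    doc.add("title", "Title parsed from the page");
    resetTitle(doc, "attachment; filename=\"report.pdf\"");
    System.out.println(doc.getValues("title")); // prints [report.pdf]
  }
}
```

Without the removeField call, the same run would leave two values under
"title", which is the situation that trips up a single-valued title field in
the Solr schema.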
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)