[ 
https://issues.apache.org/jira/browse/NUTCH-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated NUTCH-732:
------------------------------------

    Attachment: sub.patch

This turned out to be caused by the way the list of applicable collections is 
created and by how that field is added to the indexing backend. First, the code 
prepends a space, creating collection names like ' nutch' instead of 'nutch'. 
Second, the field is passed as is instead of being tokenized, so the leading 
space is kept and a query for the plain name never matches.

I changed the collection name appending logic and made the field tokenized.
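
To illustrate the two problems described above, here is a minimal sketch in 
plain Java. It is not the code from sub.patch and does not use the Lucene or 
Nutch APIs; the class and method names are hypothetical, and whitespace 
splitting stands in for whatever analyzer the real field uses.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Nutch.
class CollectionFieldSketch {

    // Buggy variant: a space is written before every name, so a single
    // collection produces " nutch" rather than "nutch".
    static String joinWithLeadingSpace(List<String> names) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            sb.append(' ').append(name);
        }
        return sb.toString();
    }

    // Fixed variant: the separator appears only between names.
    static String joinClean(List<String> names) {
        return String.join(" ", names);
    }

    // Untokenized field: the whole value is the only indexed term,
    // so a query term must equal it exactly.
    static boolean matchesUntokenized(String fieldValue, String queryTerm) {
        return fieldValue.equals(queryTerm);
    }

    // Tokenized field (assumption: simple whitespace splitting):
    // each whitespace-separated token is its own term.
    static boolean matchesTokenized(String fieldValue, String queryTerm) {
        return Arrays.asList(fieldValue.trim().split("\\s+")).contains(queryTerm);
    }

    public static void main(String[] args) {
        String buggy = joinWithLeadingSpace(Arrays.asList("nutch"));
        String clean = joinClean(Arrays.asList("nutch", "sub1"));

        // The leading space makes the exact-match lookup fail.
        System.out.println(matchesUntokenized(buggy, "nutch"));
        // With tokenization, each collection name is searchable.
        System.out.println(matchesTokenized(clean, "sub1"));
    }
}
```

In this sketch either change alone would make a single-collection lookup work, 
but only tokenization makes each name searchable when a document belongs to 
several collections.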

I'll commit the patch shortly.

> Subcollection plugin not working on Nutch-1.0
> ---------------------------------------------
>
>                 Key: NUTCH-732
>                 URL: https://issues.apache.org/jira/browse/NUTCH-732
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.0.0
>         Environment: Mac OS X 10.5 intel
>            Reporter: Filipe Antunes
>            Priority: Critical
>         Attachments: sub.patch
>
>
> I am trying to get subcollections working using Nutch-1.0.
> I configured subcollections.xml, then added the plugin in nutch-site.xml.
> When indexing finished, I opened Lucene Luke to check whether the database 
> was working properly.
> The subcollection field is populated as it should be, but searching for any 
> subcollection on the search tab of Luke returns no results.
> If I search on the url field, I can see that every record has a 
> subcollection associated with it, yet I can't search using the subcollection 
> field.
> Search examples in Luke:
> subcollection:sub1 -> no results
> url:sub1 -> results with field subcollection populated -> sub1
> Same results using:
> ./bin/nutch org.apache.nutch.searcher.NutchBean "subcollection:sub1 sub"
> If I use "explain", the subcollection field is there with the correct word.
> It makes no sense, so I believe it's a bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
