[ https://issues.apache.org/jira/browse/NUTCH-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki updated NUTCH-732:
------------------------------------

    Attachment: sub.patch

Turns out this was due to the way the list of applicable collections is created, and how that field is added to the indexing backend. First, it adds a leading space, creating collection names like ' nutch' instead of 'nutch'. Then, instead of tokenizing this field, it passes the value as is, so the leading space is kept and prevents the query from matching. I changed the collection name appending logic and made the field tokenized. I'll commit the patch shortly.

> Subcollection plugin not working on Nutch-1.0
> ---------------------------------------------
>
>                 Key: NUTCH-732
>                 URL: https://issues.apache.org/jira/browse/NUTCH-732
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.0.0
>         Environment: Mac OS X 10.5 intel
>            Reporter: Filipe Antunes
>            Priority: Critical
>         Attachments: sub.patch
>
>
> I am trying to get subcollections working, using Nutch-1.0!
> I configured subcollections.xml, then I added the plugin in nutch-site.xml.
> When indexing finished, I opened Luke (the Lucene index browser) to check
> whether the index was working properly.
> The subcollection field is populated as it should be, but searching for any
> subcollection on the search tab of Luke returns no results.
> If I do a search on the url field, I can see that every record has a
> subcollection associated, yet I can't search using the subcollection
> field.
> Search examples in Luke:
> subcollection:sub1 -> no results
> url:sub1 -> results with field subcollection populated -> sub1
> Same results using:
> ./bin/nutch org.apache.nutch.searcher.NutchBean "subcollection:sub1 sub"
> If I use "explain", the subcollection field is there with the correct word.
> It makes no sense, so I believe it's a bug.
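
The cause described in the comment above (a leading space in the collection name, combined with an untokenized field) can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from sub.patch: the class and method names are hypothetical, no Lucene or Nutch API is used, and splitting on whitespace merely stands in for whatever analyzer the index actually applies.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names come from Nutch or from sub.patch.
public class SubcollectionFieldSketch {

    // Mimics the reported behaviour: a space is added before every name,
    // so even a single collection name ends up with a leading space.
    static String joinWithLeadingSpace(List<String> names) {
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            sb.append(' ').append(name);
        }
        return sb.toString();
    }

    // One possible fix for the appending logic: separator only between names.
    static String joinWithoutLeadingSpace(List<String> names) {
        return String.join(" ", names);
    }

    // An untokenized field is indexed as a single term: the raw value.
    static List<String> untokenizedTerms(String fieldValue) {
        return Collections.singletonList(fieldValue);
    }

    // A tokenized field (assumption: simple whitespace splitting) yields
    // one term per collection name, with surrounding spaces discarded.
    static List<String> tokenizedTerms(String fieldValue) {
        String trimmed = fieldValue.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // A term query matches only if the exact term was indexed.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String buggy = joinWithLeadingSpace(Arrays.asList("sub1"));
        // Untokenized, the indexed term is " sub1", so "sub1" does not match.
        System.out.println(matches(untokenizedTerms(buggy), "sub1"));  // false
        // Tokenized, the leading space disappears and the query matches.
        System.out.println(matches(tokenizedTerms(buggy), "sub1"));    // true

        String two = joinWithoutLeadingSpace(Arrays.asList("sub1", "sub2"));
        // Without tokenizing, a page in two collections is still unsearchable
        // by a single collection name, because the only term is "sub1 sub2".
        System.out.println(matches(untokenizedTerms(two), "sub1"));    // false
        System.out.println(matches(tokenizedTerms(two), "sub1"));      // true
    }
}
```

The last two checks suggest why both changes matter: removing the leading space alone would make single-collection documents searchable, but a document belonging to several collections would still need the field to be tokenized.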