[ 
https://issues.apache.org/jira/browse/LUCENE-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated LUCENE-252:
---------------------------------

    Attachment: FieldCacheImpl_Tokenized_fields_lucene_2.0.patch

In the Nutch project we encountered a subtle bug, which I tracked down to 
unintuitive caching of tokenized fields. 

Nutch uses several index servers, and the search results from these servers are 
merged using a dedup field for deleting duplicates. The values of this field 
are cached by FieldCacheImpl. The default is the site field, which is indexed 
and tokenized. However, for a tokenized field (for example "url" in Nutch), 
FieldCacheImpl returns an array of terms rather than an array of field values, 
so deduplication becomes faulty. 
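
To make the failure mode concrete, here is a minimal, self-contained sketch in 
plain Java. It is not Lucene code: the class TokenizedFieldCacheSketch and its 
methods are hypothetical, the tokenizer is a crude stand-in for an analyzer, 
and the sketch assumes the cache is filled by walking the terms in sorted 
order and overwriting each document's slot, so the last term wins. 

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a term-walking field cache; not the Lucene API.
public class TokenizedFieldCacheSketch {

    // Stand-in for an analyzer: lower-case, then split on non-alphanumerics.
    public static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Assumed cache fill: visit terms in sorted order and, for every document
    // that contains the term, overwrite that document's slot with the term.
    public static String[] cacheByTerm(String[] docs, boolean tokenized) {
        TreeMap<String, List<Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < docs.length; doc++) {
            List<String> terms = tokenized
                    ? tokenize(docs[doc])
                    : Collections.singletonList(docs[doc]);
            for (String term : terms) {
                postings.computeIfAbsent(term, k -> new ArrayList<>()).add(doc);
            }
        }
        String[] cache = new String[docs.length];
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                cache[doc] = e.getKey(); // a later term replaces an earlier one
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://a.example.org/index.html",
            "http://b.example.org/index.html"
        };
        // Tokenized: both slots hold the last term, so distinct URLs collide.
        System.out.println(Arrays.toString(cacheByTerm(urls, true)));  // [org, org]
        // Untokenized: each slot holds the whole field value.
        System.out.println(Arrays.toString(cacheByTerm(urls, false)));
    }
}
```

Under these assumptions, two different URLs get the same cached value, so a 
dedup step that compares cached values would treat them as duplicates. 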

The current FieldCache implementation does not respect tokenized fields and, 
as described above, caches only terms. I have ported the previous patch and 
improved it for the 2.0 branch, and I will write a patch for the trunk. 

I am voting for this patch to be committed. 

> [PATCH] Problem with Sort logic on tokenized fields
> ---------------------------------------------------
>
>                 Key: LUCENE-252
>                 URL: https://issues.apache.org/jira/browse/LUCENE-252
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.4
>         Environment: Operating System: other
> Platform: All
>            Reporter: Aviran Mordo
>         Assigned To: Lucene Developers
>         Attachments: dif.txt, FieldCacheImpl_Tokenized_fields_lucene_2.0.patch
>
>
> When you set a SortField to a text field which gets tokenized,
> FieldCacheImpl uses the term to do the sort, but then the sorting is off, 
> especially with more than one word in the field. I think it is much 
> more logical to sort by the field's string value if the sort field is 
> tokenized and stored. This way you'll get the CORRECT sort order.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

