[ 
https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872386#action_12872386
 ] 

Eks Dev commented on LUCENE-2482:
---------------------------------

Re: I'm not sure if I follow your use case though

Simple case, you have a 100Mio docs with 2 fields, CITY and  TEXT

sorting on CITY makes postings look like: 
    Orlando:                  ---------------------------------
 New York:                                                               
-------------------------------------
perfectly compressible. 

without really affecting distribution (compressibility) of terms from the TEXT 
field.

If CITY would remain in unsorted order (e.g. uniform distribution), you deal 
with very large postings for all terms coming from this field  

Sorting on many fields helps often, e.g. if you have hierarchical compositions 
like 1 CITY with many  ZIP_CODES...  philosophically, sorting always increases 
compressibility and improves locality of reference... but sure, you need to 
know what you want

> Index sorter
> ------------
>
>                 Key: LUCENE-2482
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2482
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 3.1
>            Reporter: Andrzej Bialecki 
>             Fix For: 3.1
>
>         Attachments: indexSorter.patch
>
>
> A tool to sort index according to a float document weight. Documents with 
> high weight are given low document numbers, which means that they will be 
> first evaluated. When using a strategy of "early termination" of queries (see 
> TimeLimitedCollector) such sorting significantly improves the quality of 
> partial results.
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as 
> document weights - thus the ordering was limited by the limited resolution of 
> norms. This is a pure Lucene version of the tool, and it uses arbitrary 
> floats from a specified stored field).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to