[
https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872386#action_12872386
]
Eks Dev commented on LUCENE-2482:
---------------------------------
Re: I'm not sure if I follow your use case though
Simple case: you have 100 million docs with two fields, CITY and TEXT.
Sorting on CITY makes the postings look like:
Orlando: ---------------------------------
New York: -------------------------------------
These runs are perfectly compressible, and the sort does not really affect the
distribution (compressibility) of terms from the TEXT field.
If CITY remained in unsorted order (e.g. uniformly distributed), you would deal
with very large postings for all terms coming from this field.
Sorting on several fields often helps, e.g. if you have hierarchical
compositions like one CITY with many ZIP_CODES. Philosophically, sorting always
increases compressibility and improves locality of reference, but sure, you
need to know what you want.
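
To put rough numbers on the CITY example, here is a minimal sketch. It assumes postings are stored as gaps between ascending doc IDs, each gap written as a variable-length int with 7 payload bits per byte. The class and method names (PostingsGapDemo, vIntBytes, encodedSize, contiguous, spread) are made up for illustration and are not Lucene APIs, and the doc counts are scaled down from the 100 million case.

```java
final class PostingsGapDemo {

    // Bytes needed to write one non-negative value as a variable-length int
    // (7 payload bits per byte). This encoding is an assumption of the sketch.
    static int vIntBytes(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    // Total bytes for a postings list stored as gaps between ascending doc IDs.
    static long encodedSize(int[] ascendingDocIds) {
        long total = 0;
        int prev = 0;
        for (int id : ascendingDocIds) {
            total += vIntBytes(id - prev);
            prev = id;
        }
        return total;
    }

    // Index sorted on CITY: all docs for one city form one contiguous run.
    static int[] contiguous(int firstDoc, int count) {
        int[] ids = new int[count];
        for (int i = 0; i < count; i++) {
            ids[i] = firstDoc + i;
        }
        return ids;
    }

    // Unsorted index: the same number of docs spread evenly over the index.
    static int[] spread(int stride, int count) {
        int[] ids = new int[count];
        for (int i = 0; i < count; i++) {
            ids[i] = i * stride;
        }
        return ids;
    }

    public static void main(String[] args) {
        // 1,000 docs for one city in an index of 1,000,000 docs.
        System.out.println(encodedSize(contiguous(500_000, 1_000))); // prints 1002
        System.out.println(encodedSize(spread(1_000, 1_000)));       // prints 1999
    }
}
```

With byte-aligned gaps the sorted list is about half the size. A bit-packed or run-length scheme would likely widen the difference, since every gap in the sorted run is 1.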
> Index sorter
> ------------
>
> Key: LUCENE-2482
> URL: https://issues.apache.org/jira/browse/LUCENE-2482
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/*
> Affects Versions: 3.1
> Reporter: Andrzej Bialecki
> Fix For: 3.1
>
> Attachments: indexSorter.patch
>
>
> A tool to sort index according to a float document weight. Documents with
> high weight are given low document numbers, which means that they will be
> first evaluated. When using a strategy of "early termination" of queries (see
> TimeLimitedCollector) such sorting significantly improves the quality of
> partial results.
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as
> document weights - thus the ordering was limited by the limited resolution of
> norms. This is a pure Lucene version of the tool, and it uses arbitrary
> floats from a specified stored field).
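
The renumbering described in the issue summary can be sketched as below. This is not the code from indexSorter.patch; it only shows one way to derive an old-to-new document number mapping from per-document float weights, with the highest weight getting document number 0. The names DocWeightRemap and oldToNew are hypothetical, and the weights are assumed to be already loaded into an array.

```java
import java.util.Arrays;

final class DocWeightRemap {

    // Returns a mapping where result[oldDoc] is the new document number.
    // Documents with higher weight receive lower new numbers.
    static int[] oldToNew(float[] weights) {
        Integer[] order = new Integer[weights.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Arrays.sort on objects is stable, so documents with equal weight
        // keep their original relative order.
        Arrays.sort(order, (a, b) -> Float.compare(weights[b], weights[a]));

        int[] mapping = new int[weights.length];
        for (int newDoc = 0; newDoc < order.length; newDoc++) {
            mapping[order[newDoc]] = newDoc;
        }
        return mapping;
    }

    public static void main(String[] args) {
        float[] weights = {0.2f, 3.5f, 1.0f, 3.5f};
        System.out.println(Arrays.toString(oldToNew(weights))); // prints [3, 0, 2, 1]
    }
}
```

Once documents are renumbered this way, a collector that stops after a time limit has already seen the highest-weight documents first, which is the effect the summary describes.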