Re: Document-Ids and Merges

Shai Erera Wed, 28 Mar 2012 07:06:52 -0700

Hi

If you are working with trunk, then I believe that DocValues is what you're
looking for. They allow you to store values at the document level, and then
read them during search either from disk or RAM. They are also segment
based.


I'm not sure how ValueSource is used (I've never used it myself and I'm not
near the code to check), but what I had in mind is something similar to
Collector.setNextReader which allows Collectors to use, e.g. a float[] for
that IndexReader (hint, look at FieldCache).
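
To make the FieldCache hint a bit more concrete, here is a minimal sketch of a
per-segment cache, written without Lucene on the classpath. SegmentStub and
PerSegmentScores are hypothetical names, not Lucene classes, and the sketch
assumes every document carries a stable application key from which its score
can be looked up. The 1.0f default mirrors the fallback in the ValueSource
quoted further down.

```java
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical stand-in for one index segment; NOT a Lucene class. */
final class SegmentStub {
    /** Stable application key of each document, indexed by segment-local doc ID. */
    final String[] externalIds;

    SegmentStub(String... externalIds) {
        this.externalIds = externalIds;
    }
}

/** Keeps one float[] per segment, in the spirit of FieldCache. */
final class PerSegmentScores {
    // Weak keys: an entry becomes collectable once its segment is merged away
    // and nothing else references the segment object.
    private final Map<SegmentStub, float[]> cache = new WeakHashMap<>();
    private final Map<String, Float> scoreByExternalId;

    PerSegmentScores(Map<String, Float> scoreByExternalId) {
        this.scoreByExternalId = scoreByExternalId;
    }

    /** Returns the cached array for this segment, building it on first use. */
    synchronized float[] get(SegmentStub segment) {
        float[] values = cache.get(segment);
        if (values == null) {
            values = new float[segment.externalIds.length];
            for (int doc = 0; doc < values.length; doc++) {
                Float score = scoreByExternalId.get(segment.externalIds[doc]);
                values[doc] = (score != null) ? score : 1.0f; // assumed default
            }
            cache.put(segment, values);
        }
        return values;
    }
}
```

Because each array is indexed by segment-local doc IDs, a merge only means that
arrays for the newly written segments get built on first use; arrays for
untouched segments stay valid.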

If ValueSource or ValueSourceQuery can do that, then you could use that
mechanism. If not, you can move the scoring to the Collector level.
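
If scoring does move to the Collector level, the idea could look roughly like
the sketch below. It only imitates the shape of Collector.setNextReader and
collect in plain Java: BoostingCollectorSketch is a hypothetical class, the
query score is passed in directly instead of coming from a Scorer, and
multiplying it by the per-segment value is just one assumed way of combining
the two.

```java
/** Hypothetical collector-shaped class; a stand-in, NOT a Lucene Collector. */
final class BoostingCollectorSketch {
    private float[] segmentValues = new float[0];
    private int docBase;

    int bestDoc = -1;                          // index-wide doc ID of the best hit
    float bestScore = Float.NEGATIVE_INFINITY;

    /** Called once per segment: swap in that segment's array and doc ID offset. */
    void setNextReader(float[] segmentValues, int docBase) {
        this.segmentValues = segmentValues;
        this.docBase = docBase;
    }

    /** Called per hit with a segment-local doc ID. */
    void collect(int doc, float queryScore) {
        // Same fallback as the quoted ValueSource: unknown docs get 1.0f.
        float custom = (doc < segmentValues.length) ? segmentValues[doc] : 1.0f;
        float score = queryScore * custom;
        if (score > bestScore) {
            bestScore = score;
            bestDoc = docBase + doc; // convert to an index-wide doc ID
        }
    }
}
```

The collector never touches a global array; it only ever indexes the array of
the segment it was last handed, so doc IDs shifting after a merge do not
matter to it.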

Sorry for the shallow responses - I'm answering from my mobile and won't be
near the code potentially until next week. Perhaps someone else on the list
can give you some concrete examples. If not, plz continue to ask questions
and I'll do my best to answer ;).

Shai
On Mar 28, 2012 9:34 AM, "Christoph Kaser" <christoph.ka...@iconparc.de>
wrote:

> Hi Shai,
>
> That sounds interesting. However, I am unsure how I can do this. Is there
> a way to store values "with a segment"? How can I get the segment from a
> document ID?
> Here is what my ValueSource looks like at the moment:
>
> public class MyScoreValues extends ValueSource {
>    float[] values=...; //float array with reader.maxDoc() entries
>
>    public DocValues getValues(IndexReader reader) throws IOException {
>        return new DocValues() {
>            public float floatVal(int doc) {
>                if(doc < values.length)
>                    return values[doc];
>                return 1.0f;
>            }
>        };
>    }
> }
>
> How would I need to change it to make the arrays segment-based?
>
> Best regards,
> Christoph
>
>
>
> On 27.03.2012 21:16, Shai Erera wrote:
>
>> Or ... move to use a per-segment array. Then you don't need to rely on doc
>> IDs changing. You will need to build the array from the documents that are
>> in that segment only.
>>
>> It's like FieldCache in a way. The array is relevant as long as the segment
>> exists (i.e. not merged away).
>>
>> Hope this helps.
>>
>> Shai
>> On Mar 27, 2012 9:29 AM, "Christoph Kaser" <lucene_l...@iconparc.de>
>> wrote:
>>
>>> Hi all,
>>>
>>> I have a search application with 16 million documents that uses custom
>>> scores per document using a ValueSource. These values are updated a lot
>>> (and sometimes all at once), so I can't really write them into the index
>>> for performance reasons. Instead, I simply have a huge array of float
>>> values in memory and use the document ID as index in the array.
>>> This works great as long as the index is not changed, but as soon as I
>>> have a few new documents and deletions, index segments are merged (I
>>> suppose) and the document IDs of existing documents change. Is there any
>>> way to be informed when document IDs of existing documents change? If so,
>>> is there a way to calculate the new document ID from the old one, so I can
>>> "convert" my array to the new document IDs?
>>>
>>> Any help would be greatly appreciated!
>>>
>>> Best regards,
>>> Christoph
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>
> --
> Dipl.-Inf. Christoph Kaser
>
> IconParc GmbH
> Sophienstrasse 1
> 80333 München
>
> www.iconparc.de
>
> Tel +49 -89- 15 90 06 - 21
> Fax +49 -89- 15 90 06 - 49
>
> Geschäftsleitung: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer. HRB
> 121830, Amtsgericht München
