On Mon, Jan 27, 2014 at 7:12 AM, Michael McCandless
<[email protected]> wrote:
> Hi Benson,
>
> I use the code from luceneutil
> (https://code.google.com/a/apache-extras.org/p/luceneutil/ ), e.g. I
> run those scripts nightly for the nightly benchmarks:
> http://people.apache.org/~mikemccand/lucenebench
>
> But, that's the Wikipedia corpus, and has no "real" queries, and the
> scripts are quite challenging to get working ... if you have access to
> more "realistic" corpus + queries, even if you can't share it, those
> results are also interesting to share.
>
> I think it would be neat if an app could retroactively pick DirectPF
> at search time, or more generally pass search-time parameters when
> initializing codec components (I think there was a discussion about
> this at some point but I can't remember what the use case was).
> Today, any and all choices must be written into the index and cannot
> be changed at search time, which is somewhat silly/restrictive for
> DirectPF since it can wrap any other PF and act as simply a fast
> "cache" on top of the postings.

Well, that's where I thought I was starting: an API into the reader
that allows DirectPF to be injected as a wrapper around others. I
haven't had time to follow Rob's bread-crumb trail to see if this is
straightforward by customizing Directory -- though it occurs to me
that we have many directories, and it would be useful to be able to do
this regardless.
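
To make the shape of the idea concrete, here is a minimal, self-contained sketch of a wrapper chosen at read time. It deliberately uses no Lucene classes: PostingsSource, CountingSource, and InMemoryWrapper are hypothetical names invented for illustration, and nothing here is an existing reader or codec API. It also caches lazily per term for brevity, which may differ from how DirectPostingsFormat actually loads its data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types are Lucene classes.
final class ReadTimeWrapperSketch {

    /** Stand-in for whatever postings reader the index was written with. */
    interface PostingsSource {
        int[] postings(String term);
    }

    /** Simulates an on-disk format: every call counts as one expensive read and decode. */
    static final class CountingSource implements PostingsSource {
        private int reads = 0;

        @Override
        public int[] postings(String term) {
            reads++;
            // Fake doc IDs derived from the term, standing in for decoded postings.
            int base = term.length();
            return new int[] {base, base + 1, base + 2};
        }

        int reads() {
            return reads;
        }
    }

    /** Wraps any PostingsSource and keeps decoded postings on the heap after first use. */
    static final class InMemoryWrapper implements PostingsSource {
        private final PostingsSource delegate;
        private final Map<String, int[]> cache = new HashMap<>();

        InMemoryWrapper(PostingsSource delegate) {
            this.delegate = delegate;
        }

        @Override
        public int[] postings(String term) {
            // Only the first lookup of a term reaches the wrapped source.
            return cache.computeIfAbsent(term, delegate::postings);
        }
    }

    public static void main(String[] args) {
        CountingSource onDisk = new CountingSource();
        // The wrapper is picked when the reader is opened, not when the index is written.
        PostingsSource reader = new InMemoryWrapper(onDisk);

        reader.postings("lucene");
        reader.postings("lucene");
        reader.postings("codec");

        System.out.println("underlying reads: " + onDisk.reads());
    }
}
```

The point of the sketch is only that the wrapped source needs no knowledge of the wrapper, so nothing about the choice has to be recorded in the index.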

I may be able to share a data set, I'll check into that today.


>
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Jan 27, 2014 at 7:06 AM, Benson Margulies <[email protected]> wrote:
>> What do we have for a benchmark framework that is used to
>> justify/qualify speed-related things? One way forward would be to see
>> what a quantified measurement shows from the idea I have in mind, and
>> use that to facilitate deciding if this belongs in the tree.
>>
>> On Sat, Jan 25, 2014 at 6:34 PM, Benson Margulies <[email protected]> wrote:
>>> Keeping things in memory and not re-reading them from disk is what
>>> really sang the song for us. Even if the initial read-in was more
>>> costly due to decompression, the long-term amortized benefit of not
>>> re-reading would still be a big winner.
>>>
>>>
>>> On Sat, Jan 25, 2014 at 5:37 PM, Robert Muir <[email protected]> wrote:
>>>> Well, the Directory layer likely isn't what makes DirectPF faster for
>>>> you. It's probably the fact that it does no compression at all...
>>>>
>>>>
>>>> On Sat, Jan 25, 2014 at 5:34 PM, Benson Margulies <[email protected]>
>>>> wrote:
>>>>>
>>>>> On Sat, Jan 25, 2014 at 5:09 PM, Robert Muir <[email protected]> wrote:
>>>>> > That would be Directory :)
>>>>>
>>>>> Oh, how embarrassing. I could have written a custom directory to begin
>>>>> with.
>>>>>
>>>>> Would a Directory class for this purpose be an interesting patch, in
>>>>> that case? I'm not discontented about building a Directory into our
>>>>> application, but it seems like I might not be the only person to find
>>>>> this useful.
>>>>>
>>>>> >
>>>>> >
>>>>> > On Sat, Jan 25, 2014 at 5:03 PM, Benson Margulies <[email protected]> wrote:
>>>>> >>
>>>>> >> I've had very gratifying results using the DirectPostingsFormat to
>>>>> >> speed up queries when I had a read-only index with plenty of memory.
>>>>> >> The only downside was the need to specify it within the Codec, and
>>>>> >> thus write it into the index.
>>>>> >>
>>>>> >> Ever since, I've wondered if we could change things to introduce the
>>>>> >> same goodness without building it into the codec.
>>>>> >>
>>>>> >> Very roughly, I'm imagining an option in the IndexReader to provide an
>>>>> >> object that can surround the codec that is called for in the stored
>>>>> >> format.
>>>>> >>
>>>>> >> Is this an old question? Is it worth sketching a patch?
>>>>> >>
>>>>> >> ---------------------------------------------------------------------
>>>>> >> To unsubscribe, e-mail: [email protected]
>>>>> >> For additional commands, e-mail: [email protected]
>>>>> >>
>>>>> >
>>>>>
>>>>
>>
>
