Hi Benson,

I use the code from luceneutil (https://code.google.com/a/apache-extras.org/p/luceneutil/), e.g. I run those scripts nightly for the nightly benchmarks: http://people.apache.org/~mikemccand/lucenebench
But, that's the Wikipedia corpus, and has no "real" queries, and the scripts are quite challenging to get working ... if you have access to more "realistic" corpus + queries, even if you can't share it, those results are also interesting to share.

I think it would be neat if an app could retroactively pick DirectPF at search time, or more generally pass search-time parameters when initializing codec components (I think there was a discussion about this at some point but I can't remember what the use case was).

Today, any and all choices must be written into the index and cannot be changed at search time, which is somewhat silly/restrictive for DirectPF since it can wrap any other PF and act as simply a fast "cache" on top of the postings.

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jan 27, 2014 at 7:06 AM, Benson Margulies <[email protected]> wrote:
> What do we have for a benchmark framework that is used to
> justify/qualify speed-related things? One way forward would be to see
> what a quantified measurement shows from the idea I have in mind, and
> use that to facilitate deciding if this belongs in the tree.
>
> On Sat, Jan 25, 2014 at 6:34 PM, Benson Margulies <[email protected]>
> wrote:
>> Keeping things in memory and not re-reading them from disk is what
>> really sang the song for us. Even if the initial read-in was more
>> costly due to decompression, the long-term amortized benefit of not
>> re-reading would still be a big winner.
>>
>> On Sat, Jan 25, 2014 at 5:37 PM, Robert Muir <[email protected]> wrote:
>>> well the Directory layer likely isnt what probably makes DirectPF faster for
>>> you. Its probably the fact it does no compression at all...
>>>
>>> On Sat, Jan 25, 2014 at 5:34 PM, Benson Margulies <[email protected]>
>>> wrote:
>>>>
>>>> On Sat, Jan 25, 2014 at 5:09 PM, Robert Muir <[email protected]> wrote:
>>>> > That would be Directory :)
>>>>
>>>> Oh, how embarrassing. I could have written a custom directory to begin
>>>> with.
>>>>
>>>> Would a Directory class for this purpose be an interesting patch, in
>>>> that case? I'm not discontented about building a Directory into our
>>>> application, but it seems like I might not be the only person to find
>>>> this useful.
>>>>
>>>> > On Sat, Jan 25, 2014 at 5:03 PM, Benson Margulies
>>>> > <[email protected]>
>>>> > wrote:
>>>> >>
>>>> >> I've had very gratifying results using the DirectPostingFormat to
>>>> >> speed up queries when I had a read-only index with plenty of memory.
>>>> >> The only downside was the need to specify it within the Codec, and
>>>> >> thus write it into the index.
>>>> >>
>>>> >> Ever since, I've wondered if we could change things to introduce the
>>>> >> same goodness without building it into the codec.
>>>> >>
>>>> >> Very roughly, I'm imagining an option in the IndexReader to provide an
>>>> >> object that can surround the codec that is called for in the stored
>>>> >> format.
>>>> >>
>>>> >> Is this an old question? Is it worth sketching a patch?
>>>> >>
>>>> >> ---------------------------------------------------------------------
>>>> >> To unsubscribe, e-mail: [email protected]
>>>> >> For additional commands, e-mail: [email protected]
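
A rough illustration of the idea discussed in this thread, namely a search-time wrapper that sits on top of whatever postings reader the index was written with and keeps decoded postings in memory as plain arrays. This is only a toy sketch in plain Java: `PostingsSource`, `DeltaEncodedSource`, and `InMemoryWrapper` are hypothetical names invented for the example and are not Lucene APIs, and the delta encoding merely stands in for "whatever compression the on-disk format uses". It also caches lazily per term, which is a simplification and not a claim about how DirectPF itself loads data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy model of "wrap any postings source with an in-memory cache".
// None of these types exist in Lucene; they only illustrate the shape of the idea.
final class PostingsCacheSketch {

    /** Stand-in for the postings reader chosen at index time. */
    interface PostingsSource {
        int[] docIds(String term);
    }

    /** Simulates an on-disk format: doc IDs are stored as deltas and decoded on every read. */
    static final class DeltaEncodedSource implements PostingsSource {
        private final Map<String, int[]> deltas = new HashMap<>();
        final AtomicInteger decodeCount = new AtomicInteger(); // counts decode passes

        /** Stores the given ascending doc IDs as gaps between consecutive IDs. */
        void add(String term, int... sortedDocIds) {
            int[] encoded = new int[sortedDocIds.length];
            int previous = 0;
            for (int i = 0; i < sortedDocIds.length; i++) {
                encoded[i] = sortedDocIds[i] - previous;
                previous = sortedDocIds[i];
            }
            deltas.put(term, encoded);
        }

        @Override
        public int[] docIds(String term) {
            int[] encoded = deltas.get(term);
            if (encoded == null) {
                return new int[0];
            }
            decodeCount.incrementAndGet();
            int[] decoded = new int[encoded.length];
            int running = 0;
            for (int i = 0; i < encoded.length; i++) {
                running += encoded[i];
                decoded[i] = running;
            }
            return decoded;
        }
    }

    /** Chosen at search time: wraps any source and keeps decoded arrays in memory. */
    static final class InMemoryWrapper implements PostingsSource {
        private final PostingsSource delegate;
        private final Map<String, int[]> cache = new HashMap<>();

        InMemoryWrapper(PostingsSource delegate) {
            this.delegate = delegate;
        }

        @Override
        public int[] docIds(String term) {
            // Decode once per term; later lookups return the same cached array.
            // Callers must treat the array as read-only in this sketch.
            return cache.computeIfAbsent(term, delegate::docIds);
        }
    }

    public static void main(String[] args) {
        DeltaEncodedSource onDisk = new DeltaEncodedSource();
        onDisk.add("lucene", 3, 7, 42);

        // The wrapper is picked by the searching application, not recorded in the "index".
        PostingsSource searchTime = new InMemoryWrapper(onDisk);
        searchTime.docIds("lucene");
        searchTime.docIds("lucene");
        System.out.println("decode passes: " + onDisk.decodeCount.get());
    }
}
```

The point of the sketch is that the wrapper needs nothing from the stored format beyond the ability to read it, which is why having to record the choice in the index feels restrictive for a read-only, memory-rich deployment.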
