Hi Steve, Lucene on OpenCL sounds neat!
In Lucene's nightly indexing benchmarks (http://home.apache.org/~mikemccand/lucenebench/indexing.html) I index an export of Wikipedia's English content, including terms, docIDs, term frequencies, positions, and also points, doc values, and stored fields. The full (messy!) source code is in this repository: https://github.com/mikemccand/luceneutil.

Both initial indexing and merging are CPU/IO intensive, but they are very amenable to soaking up the hardware's concurrency.

On whether there's a market, that's beyond my pay grade ;) I just work on the bits! Different users care about different things.

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jun 17, 2016 at 6:52 PM, Steve Casselman <[email protected]> wrote:

> Hi Mike. I’m writing code for the Altera OpenCL SDK. I have a code base
> that gives me a non-Lucene-format index. I was wondering, for your benchmark,
> what kind of data do you collect? Do you collect all the position and
> frequency data? I’m also curious about what you see as the biggest
> bottleneck in creating an index. Is it creating the index from the data, or
> merging the indexes? Or something else? Do you feel the algorithm is CPU,
> memory, or disk bound? And finally, do you think there is a market for
> accelerated indexing? Say I could quadruple the price/performance yet still
> make 100% Lucene-compatible indexes; would people pay for that?
>
> Thanks
>
> Steve
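P.S. In case it helps with mapping your index format onto Lucene's, here is a minimal sketch of what "soaking up the hardware's concurrency" looks like on the indexing side: IndexWriter is thread-safe, so several threads can add documents carrying the field types mentioned above (terms/frequencies/positions via TextField, points, doc values, stored fields). The class name, field names, and values below are made up for illustration; the Lucene classes are the standard ones.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ConcurrentIndexSketch {

    /** Index nThreads * docsPerThread tiny documents concurrently; return numDocs. */
    public static int indexDocs(int nThreads, int docsPerThread) throws Exception {
        Directory dir = new ByteBuffersDirectory();  // in-memory, for the sketch only
        try (IndexWriter writer = new IndexWriter(dir,
                 new IndexWriterConfig(new StandardAnalyzer()))) {
            ExecutorService pool = Executors.newFixedThreadPool(nThreads);
            for (int t = 0; t < nThreads; t++) {
                final int threadId = t;
                pool.submit(() -> {
                    for (int i = 0; i < docsPerThread; i++) {
                        Document doc = new Document();
                        // terms, term frequencies, positions:
                        doc.add(new TextField("body",
                            "wikipedia article body text " + threadId + " " + i,
                            Field.Store.NO));
                        // a point (e.g. a timestamp, for range queries):
                        doc.add(new LongPoint("timestamp", System.currentTimeMillis()));
                        // a doc value (column-stride, for sorting/faceting):
                        doc.add(new NumericDocValuesField("length", 42L));
                        // a stored field (retrieved verbatim at search time):
                        doc.add(new StoredField("title", "Doc " + threadId + "-" + i));
                        try {
                            writer.addDocument(doc);  // IndexWriter is thread-safe
                        } catch (Exception e) {
                            throw new RuntimeException(e);
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            writer.commit();
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("numDocs=" + indexDocs(4, 25));
    }
}
```

This is only the in-memory ingest half; the nightly benchmarks additionally exercise merging, which is where background merge threads pick up concurrency on their own.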
