Hi Steve, Lucene on OpenCL sounds neat!
In Lucene's nightly indexing benchmarks (http://home.apache.org/~mikemccand/lucenebench/indexing.html) I index an export of Wikipedia's English content, including terms, docIDs, term frequencies, positions, and also points, doc values, and stored fields. The full (messy!) source code is in this repository: https://github.com/mikemccand/luceneutil.

Both initial indexing and merging are CPU/IO intensive, but they are very amenable to soaking up the hardware's concurrency.

On whether there's a market, that's beyond my pay grade ;) I just work on the bits! Different users care about different things.

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jun 17, 2016 at 6:52 PM, Steve Casselman <[email protected]> wrote:

> Hi Mike. I’m writing code for the Altera OpenCL SDK. I have a code base
> that gives me a non-Lucene-format index. I was wondering, for your benchmark,
> what kind of data do you collect? Do you collect all the position and
> frequency data? I’m also curious about what you see as the biggest
> bottleneck in creating an index. Is it creating the index from the data, or
> merging the indexes? Or something else? Do you feel the algorithm is CPU,
> memory, or disk bound? And finally, do you think there is a market for
> accelerated indexing? Say I could quadruple the price/performance yet still
> make 100% Lucene-compatible indexes; would people pay for that?
>
> Thanks
>
> Steve
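P.S. In case it helps with mapping your index format onto Lucene's, here is a minimal sketch of what "soaking up the hardware's concurrency" looks like on the indexing side: IndexWriter is thread-safe, so several threads can add documents carrying the field types mentioned above (terms/frequencies/positions via TextField, points, doc values, stored fields). The class name, field names, and values below are made up for illustration; the Lucene classes are the standard ones.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ConcurrentIndexSketch {

    /** Index nThreads * docsPerThread tiny documents concurrently; return numDocs. */
    public static int indexDocs(int nThreads, int docsPerThread) throws Exception {
        Directory dir = new ByteBuffersDirectory();  // in-memory, for the sketch only
        try (IndexWriter writer = new IndexWriter(dir,
                 new IndexWriterConfig(new StandardAnalyzer()))) {
            ExecutorService pool = Executors.newFixedThreadPool(nThreads);
            for (int t = 0; t < nThreads; t++) {
                final int threadId = t;
                pool.submit(() -> {
                    for (int i = 0; i < docsPerThread; i++) {
                        Document doc = new Document();
                        // terms, term frequencies, positions:
                        doc.add(new TextField("body",
                            "wikipedia article body text " + threadId + " " + i,
                            Field.Store.NO));
                        // a point (e.g. a timestamp, for range queries):
                        doc.add(new LongPoint("timestamp", System.currentTimeMillis()));
                        // a doc value (column-stride, for sorting/faceting):
                        doc.add(new NumericDocValuesField("length", 42L));
                        // a stored field (retrieved verbatim at search time):
                        doc.add(new StoredField("title", "Doc " + threadId + "-" + i));
                        try {
                            writer.addDocument(doc);  // IndexWriter is thread-safe
                        } catch (Exception e) {
                            throw new RuntimeException(e);
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            writer.commit();
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("numDocs=" + indexDocs(4, 25));
    }
}
```

This is only the in-memory ingest half; the nightly benchmarks additionally exercise merging, which is where background merge threads pick up concurrency on their own.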
