[ https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945632#comment-15945632 ]
Ishan Chattopadhyaya edited comment on LUCENE-7745 at 3/28/17 5:58 PM:
-----------------------------------------------------------------------
Hi Vikash,
Regarding the licensing issue:
The work done in this project would be exploratory, so the code won't
necessarily go into Lucene. Once we reach a point where we see clear benefits
from this work, we would then have to explore all aspects of productionizing it
(including licensing).
Regarding next steps:
{quote}
BooleanScorer calls a lot of classes, e.g. the BM25 similarity or TF-IDF to do
the calculation that could possibly be parallelized.
{quote}
# First, understand how BooleanScorer calls these similarity classes and does
the scoring. There are unit tests in Lucene that can help you get there. This
might help: https://wiki.apache.org/lucene-java/HowToContribute
# Write a standalone CUDA/OpenCL project that does the same processing on the
GPU.
# Benchmark the GPU implementation against the same computation done through
the BooleanScorer, preferably on a large result set. Include the time spent
copying results and scores between main memory and device memory.
# Optimize the GPU implementation from step 2, if possible.
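To make steps 2 and 3 more concrete, here is a minimal sketch of the kind of CPU reference you could time and compare a GPU port against. It is an assumption on my part, not Lucene code: the class name `Bm25Baseline`, the flat-array layout, and the random test data are all made up for illustration, and the formula is the textbook BM25 term score, which does not reproduce how Lucene's BM25Similarity encodes document lengths. The point is only that the per-document loop has no dependencies between iterations, which is what would map onto one GPU thread per document.

```java
import java.util.Random;

public class Bm25Baseline {

    // Textbook BM25 score of a single query term for each candidate document.
    // freqs[i] is the term frequency in document i, docLengths[i] its length in terms.
    // Every iteration is independent, so a GPU version could assign one thread per i.
    static float[] score(int[] freqs, int[] docLengths, float avgDocLength,
                         float idf, float k1, float b) {
        float[] scores = new float[freqs.length];
        for (int i = 0; i < freqs.length; i++) {
            float tf = freqs[i];
            float norm = k1 * (1 - b + b * docLengths[i] / avgDocLength);
            scores[i] = idf * tf * (k1 + 1) / (tf + norm);
        }
        return scores;
    }

    public static void main(String[] args) {
        // Hypothetical workload: one million matching documents with random statistics.
        int n = 1_000_000;
        Random rnd = new Random(42);
        int[] freqs = new int[n];
        int[] lengths = new int[n];
        long totalLength = 0;
        for (int i = 0; i < n; i++) {
            freqs[i] = 1 + rnd.nextInt(10);
            lengths[i] = 50 + rnd.nextInt(500);
            totalLength += lengths[i];
        }
        float avgLength = (float) totalLength / n;

        // Warm up the JIT before timing, otherwise the first run dominates.
        for (int i = 0; i < 5; i++) {
            score(freqs, lengths, avgLength, 1.5f, 1.2f, 0.75f);
        }

        long start = System.nanoTime();
        float[] scores = score(freqs, lengths, avgLength, 1.5f, 1.2f, 0.75f);
        long elapsedNanos = System.nanoTime() - start;

        System.out.printf("scored %d docs in %.3f ms (first score %.4f)%n",
                n, elapsedNanos / 1e6, scores[0]);
    }
}
```

For a fair comparison, the GPU timing should wrap the host-to-device copy of the frequency and length arrays, the kernel launch, and the device-to-host copy of the scores, not the kernel alone.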
Once this is achieved (which in itself could be a sufficient GSoC project),
stretch goals could include optimizing other parts of Lucene (e.g. spatial
search).
Another stretch goal, if the optimization results are positive, could be to
integrate the solution into Lucene. The most suitable way to do so would be to
create hooks in Lucene so that plugins can be built to delegate parts of the
processing to external code. Then write a plugin (using JCuda, for example)
and run an integration test.
> Explore GPU acceleration
> ------------------------
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ishan Chattopadhyaya
> Labels: gsoc2017, mentor
>
> There are parts of Lucene that could potentially be sped up if computations
> were offloaded from the CPU to the GPU(s). With commodity GPUs having as
> much as 12 GB of high-bandwidth RAM, we might be able to leverage GPUs to
> speed up parts of Lucene (indexing, search).
> The first candidate that comes to mind is spatial filtering, which is
> traditionally known to be a good fit for GPU-based speedup (esp. when complex
> polygons are involved). In the past, Mike McCandless has mentioned that "both
> initial indexing and merging are CPU/IO intensive, but they are very amenable
> to soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I
> volunteer to mentor any GSoC student willing to work on this over the summer.