Re: SOLR 5.5 - Multithread in Dataimporthandler

Shawn Heisey Tue, 20 Dec 2016 08:06:36 -0800

On 12/20/2016 3:43 AM, Vellaimary C wrote:
> My organization is using SOLR for search handling. As we need to
> index a larger volume of documents, around 300 million, we have moved
> to SOLR 5.5.1.
> The import currently takes more than three weeks; to bring that down
> to about one week, we need the data import handler to run in parallel.
> Can anyone help me implement multithreading in the dataimport handler?


If it were easy to achieve this, it would have already been done.  DIH
actually used to have a parameter for the number of threads, but it
didn't work, so it was removed.  Implementing multi-threaded support is
*NOT* a trivial undertaking.  If you figure out how to do it, we welcome
patches.

The best option is for you to write your own indexing application that
pulls data from the original source and uses multiple threads to index
the data in parallel.
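A minimal sketch of such an application in Python, assuming the source rows are already available as an iterable of dicts and that the target core accepts a JSON array of documents on its /update handler. The core name `mycore` and all function names here are hypothetical. Reading from the source stays in one thread; only the HTTP requests to Solr run in parallel, and a semaphore keeps the reader from pulling millions of rows into memory ahead of the workers.

```python
import json
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]


def batches(docs: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def make_solr_sender(update_url: str) -> Callable[[List[Doc]], int]:
    """Return a function that POSTs one batch as a JSON array to Solr.

    update_url is an assumption, e.g. "http://localhost:8983/solr/mycore/update".
    No commit is sent here; commit once after all batches are indexed.
    """
    def send(batch: List[Doc]) -> int:
        body = json.dumps(batch).encode("utf-8")
        req = urllib.request.Request(
            update_url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:  # raises HTTPError on 4xx/5xx
            resp.read()
        return len(batch)

    return send


def index_in_parallel(
    docs: Iterable[Doc],
    send: Callable[[List[Doc]], int],
    num_threads: int = 8,
    batch_size: int = 1000,
) -> int:
    """Hand batches to `send` on a pool of worker threads; return docs sent.

    The semaphore caps how many batches can be waiting in memory, so the
    single reader never runs far ahead of the indexing threads.
    """
    slots = threading.BoundedSemaphore(num_threads * 2)
    futures = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for batch in batches(docs, batch_size):
            slots.acquire()  # block the reader while too many batches are pending
            future = pool.submit(send, batch)
            future.add_done_callback(lambda _f: slots.release())
            futures.append(future)
    # Leaving the with-block waits for all workers; result() re-raises any error.
    return sum(f.result() for f in futures)
```

Typical use would be `index_in_parallel(rows_from_db(), make_solr_sender(url), num_threads=8)`, where `rows_from_db` is a hypothetical generator over the original data source, followed by a single commit at the end (for example a request to the update URL with `commit=true`) rather than one commit per batch.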

To achieve this with DIH, you need to create multiple handlers with
different URL paths as their names and start imports on all of them so
they run at the same time, using "clean=false" so that the imports won't
wipe the index when they start.  Each one needs to handle a different
part of the data in the source system.
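A rough sketch of the client side in Python, assuming the core already declares several DIH request handlers in solrconfig.xml (for example `/dataimport1` through `/dataimport4`, each a `solr.DataImportHandler` with its own data-config file) and that each data-config query selects only its share of rows with a condition like the one `partition_clause` produces. The handler paths, the `id` key column, and the `MOD()` SQL function (its spelling varies by database) are assumptions. A DIH full-import request normally returns right away while the import continues in the background, so starting the handlers one after another should be enough for them to run concurrently.

```python
import urllib.parse
import urllib.request
from typing import Callable, List


def partition_clause(k: int, n: int, key: str = "id") -> str:
    """Hypothetical WHERE fragment so handler k of n imports only its rows.

    For handler 0 of 4 this gives "MOD(id, 4) = 0", to be placed in that
    handler's data-config query.
    """
    return f"MOD({key}, {n}) = {k}"


def import_urls(core_url: str, handler_paths: List[str]) -> List[str]:
    """Build one full-import URL per handler, with clean=false so that no
    import deletes documents written by the others."""
    params = urllib.parse.urlencode({"command": "full-import", "clean": "false"})
    return [f"{core_url.rstrip('/')}{path}?{params}" for path in handler_paths]


def start_imports(urls: List[str], opener: Callable = urllib.request.urlopen) -> None:
    """Kick off every import; each request only starts the background job."""
    for url in urls:
        with opener(url) as resp:
            resp.read()
```

Progress on each handler can then be polled with `command=status` on the same handler path until all of them report that they are idle.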

FYI, your question belongs on the solr-user mailing list.  This list is
for discussions around the development of Lucene/Solr itself.

Thanks,
Shawn

