On 3/2/2014 4:37 AM, jurio wrote:
> Is it better for performance to index documents with the data import
> handler (configuring the indexing through an XML request) or with
> SolrJ (using update requests)?
> For now, I'm indexing documents in Java so that testing (unit tests) is
> easy and so that I'm less coupled to the Solr implementation (I know it's
> difficult to change implementations, but if I later decide to use another
> search and facet indexing implementation, I could maybe move more easily).

Your situation may be very different than mine, which might make the
following advice completely incorrect:

Are you planning a multi-threaded SolrJ app?  If not, DIH (dataimport
handler) is probably going to be faster.  DIH uses a single thread for
all its operations, but the pipeline is *very* well optimized,
especially for databases.

A well-written multi-threaded SolrJ program would probably be faster
than DIH.  You would want to evaluate whether the bottleneck for
indexing is at Solr or at your data source, and make the slow side
multi-threaded.
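
To illustrate what making the Solr side multi-threaded might look like, here is a minimal sketch that uses only the JDK. The class name ParallelIndexer, the method indexAll, and the sink parameter are hypothetical; the sink stands in for whatever code actually sends a batch to Solr (for example through a SolrJ client) and is assumed to be safe to call from several threads. Batch size and thread count are values you would tune by measuring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical helper: splits documents into batches and sends them in parallel. */
final class ParallelIndexer {

    /**
     * Hands docs to the sink in batches, using a fixed pool of threads.
     * The sink is an assumption standing in for the code that talks to Solr.
     * Returns the number of documents handed to the sink.
     */
    static <T> int indexAll(List<T> docs, int batchSize, int threads,
                            Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                int end = Math.min(start + batchSize, docs.size());
                // Copy the slice so each task owns its own batch.
                List<T> batch = new ArrayList<>(docs.subList(start, end));
                pending.add(pool.submit(() -> {
                    sink.accept(batch);
                    return batch.size();
                }));
            }
            int sent = 0;
            for (Future<Integer> f : pending) {
                sent += f.get(); // surfaces any failure thrown by the sink
            }
            return sent;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping the sink as a plain Consumer also fits the goal of staying loosely coupled to Solr: a unit test can pass a sink that only counts documents.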

If performance is still not acceptable, you might be able to make both
ends multi-threaded.

For an experienced Java programmer, multi-threaded programs are not
hard, but making sure everything is correct and fast is usually not a
trivial task.  Making both ends multi-threaded can be very tricky.
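
As a rough illustration of why both ends are trickier, the sketch below connects several reader threads to several writer threads through a bounded queue, again using only the JDK. Everything named here (TwoSidedPipeline, run, the sources and sink parameters) is hypothetical: each source stands in for one slice of the data source, and the sink stands in for the code that sends a document to Solr. Items are assumed to be non-null. Even this small version has to deal with shutdown markers and with a failing writer that could otherwise leave readers blocked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical sketch: reader threads feed writer threads through a bounded queue. */
final class TwoSidedPipeline {

    /** Runs one reader per source and `writers` writer threads; returns items written. */
    static <T> int run(List<? extends Iterable<T>> sources, int writers,
                       int queueCapacity, Consumer<T> sink) {
        if (writers <= 0) {
            throw new IllegalArgumentException("writers must be positive");
        }
        BlockingQueue<Optional<T>> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicInteger written = new AtomicInteger();
        AtomicReference<RuntimeException> failure = new AtomicReference<>();

        List<Thread> readerThreads = new ArrayList<>();
        for (Iterable<T> source : sources) {
            readerThreads.add(new Thread(() -> {
                try {
                    for (T item : source) {
                        queue.put(Optional.of(item)); // blocks while the queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        List<Thread> writerThreads = new ArrayList<>();
        for (int i = 0; i < writers; i++) {
            writerThreads.add(new Thread(() -> {
                try {
                    while (true) {
                        Optional<T> next = queue.take();
                        if (!next.isPresent()) {
                            return; // empty Optional is the stop marker
                        }
                        try {
                            sink.accept(next.get());
                            written.incrementAndGet();
                        } catch (RuntimeException e) {
                            // Remember the first failure but keep draining,
                            // so readers are never left blocked on a full queue.
                            failure.compareAndSet(null, e);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        try {
            for (Thread t : readerThreads) t.start();
            for (Thread t : writerThreads) t.start();
            for (Thread t : readerThreads) t.join();
            // All readers are done: send one stop marker per writer.
            for (int i = 0; i < writers; i++) queue.put(Optional.empty());
            for (Thread t : writerThreads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running pipeline", e);
        }
        if (failure.get() != null) {
            throw new IllegalStateException("a write failed", failure.get());
        }
        return written.get();
    }
}
```

The bounded queue is the design choice that matters most here: it keeps a fast reader from filling memory when the writers are the slow side, and the queue depth over time is a cheap hint about which side is the bottleneck.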

Thanks,
Shawn

