[ https://issues.apache.org/jira/browse/SOLR-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641944#comment-13641944 ]

Shawn Heisey commented on SOLR-2356:
------------------------------------

Roman, patches are welcome. If you know how to fix it, get the source code,
make the change, and upload a patch. The issue is more than two years old, so
if it were an easy fix, the people who really know DIH would have fixed it
already. You can use the SolrJ library to write a multi-threaded application
to import data. If the design is solid, it could ultimately become the basis
for a new DIH.
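A rough sketch of the kind of multi-threaded importer suggested above, written in plain Java with no Solr dependency so it runs on its own. Rows are split into fixed-size batches, and each batch is handed to a worker thread from a fixed pool. The `ParallelImporter` class, the `importAll` method and the `sink` callback are hypothetical names used only for illustration. In a real application the rows would be read from the database, and the sink would be the place where a thread-safe SolrJ client sends each batch to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical sketch: index source rows in parallel batches on a fixed thread pool. */
public class ParallelImporter {

    /**
     * Splits rows into batches and passes each batch to sink on a worker thread.
     * sink is a placeholder for the call that would send documents to Solr
     * (for example through a SolrJ client); it must be safe to call concurrently.
     * Returns the number of rows handed to the sink.
     */
    public static <T> long importAll(List<T> rows, int threads, int batchSize,
                                     Consumer<List<T>> sink) throws Exception {
        if (threads < 1 || batchSize < 1) {
            throw new IllegalArgumentException("threads and batchSize must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong sent = new AtomicLong();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int start = 0; start < rows.size(); start += batchSize) {
                // Copy the slice so each worker owns its batch independently of the source list.
                List<T> batch = new ArrayList<>(
                        rows.subList(start, Math.min(start + batchSize, rows.size())));
                pending.add(pool.submit(() -> {
                    sink.accept(batch);
                    sent.addAndGet(batch.size());
                }));
            }
            // Wait for every batch; get() rethrows the first worker failure, if any.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return sent.get();
    }

    public static void main(String[] args) throws Exception {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            rows.add(i);
        }
        // The empty sink stands in for "send this batch to Solr".
        long n = importAll(rows, 4, 500, batch -> { });
        System.out.println("indexed " + n + " rows");
    }
}
```

Because each batch is processed independently, throughput can grow with the number of worker threads only as long as the source query and the Solr server keep up; that is an assumption about the workload, not a measured result.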

It used to be possible to configure multiple threads in the DIH config, but 
that was removed in 4.x because it was unstable.  Also, it didn't really help, 
as the issue reporter found.  It will probably take a complete redesign to fix 
this issue, and DIH is a contrib module, not part of the main Solr code.  That 
is why this is marked minor.

> indexing using DataImportHandler does not use entire CPU capacities
> -------------------------------------------------------------------
>
>                 Key: SOLR-2356
>                 URL: https://issues.apache.org/jira/browse/SOLR-2356
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 4.0-ALPHA
>         Environment: intel xeon processor (4 cores), Debian Linux Lenny, 
> OpenJDK 64bits server v1.6.0
>            Reporter: colby
>            Priority: Minor
>              Labels: test
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When I use a DataImportHandler to index a large number of documents (~35M),
> CPU usage doesn't go over 100% (i.e. just one core).
> When I configure 4 threads for the <entity> tag, the CPU usage is split at
> 25% per core but never reaches 400% (i.e. 100% of the 4 cores).
> I run Solr with the embedded Jetty server.
> Is there a way to tune this feature in order to use all cores and improve
> indexing performance?
> For the moment, an extra script (PHP) gives better indexing
> performance than DIH.
> Thanks

--
This message is automatically generated by JIRA.