[
https://issues.apache.org/jira/browse/SOLR-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642576#comment-13642576
]
Shalin Shekhar Mangar commented on SOLR-2356:
---------------------------------------------
bq. In my opinion, DIH should be completely redesigned as a standalone webapp.
It is a major design flaw that it is a RequestHandler within a Solr
Core/collection.
Actually, DIH started as a standalone webapp inside AOL. We changed it because
we didn't want to duplicate the schema in two places and also because we wanted
to have it available by default in Solr installations. Another web app means
you need to procure hardware, plan capacity/failover, create firewall holes, etc.
bq. As a standalone web app it could easily be deployed on its own, talk to
multiple collections and be parallelized.
Talking to multiple collections was never a goal for DIH -- I'm not sure what
value it would bring. The multi-threading support in DIH could certainly use a
lot of improvement.
> indexing using DataImportHandler does not use entire CPU capacities
> -------------------------------------------------------------------
>
> Key: SOLR-2356
> URL: https://issues.apache.org/jira/browse/SOLR-2356
> Project: Solr
> Issue Type: Improvement
> Components: update
> Affects Versions: 4.0-ALPHA
> Environment: intel xeon processor (4 cores), Debian Linux Lenny,
> OpenJDK 64bits server v1.6.0
> Reporter: colby
> Priority: Minor
> Labels: test
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> When I use a DataImportHandler to index a large number of documents (~35M),
> CPU usage never goes above 100% (i.e. just one core).
> When I configure 4 threads for the <entity> tag, the CPU usage is split into
> 25% per core but never reaches 400% (i.e. 100% of each of the 4 cores).
> I use Solr embedded with the Jetty server.
> Is there a way to tune this feature in order to use all cores and improve
> indexing performance?
> At the moment, an extra script (PHP) gives better indexing
> performance than DIH.
> Thanks