[
https://issues.apache.org/jira/browse/SOLR-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Otis Gospodnetic resolved SOLR-5075.
------------------------------------
Resolution: Invalid
> SolrCloud commit process is too time consuming, even if documents are light
> ---------------------------------------------------------------------------
>
> Key: SOLR-5075
> URL: https://issues.apache.org/jira/browse/SOLR-5075
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis, SolrCloud
> Affects Versions: 4.1
> Environment: SolrCloud 4.1, internal Zookeeper, 16 shards, custom
> Java importer.
> Server: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 32 cores, 192 GB RAM,
> 10 TB SSD and 50 TB SAS storage
> Reporter: Radu Ghita
> Labels: import, solrconfig.xml
>
> We have a client whose business model requires indexing a billion rows from
> MySQL into Solr each month, within a small time frame. The documents are
> very light, but their number is very high, and we need to achieve speeds of
> around 80-100k docs/s. The built-in Solr indexer tops out at 40-50k/s, and
> after some hours ( ~12 ) it crashes, with throughput degrading as the hours
> go by. We have therefore developed a custom Java importer that connects
> directly to MySQL and to SolrCloud via Zookeeper, fetches data from MySQL,
> creates documents, and adds them to Solr. This helps because we open ~50
> threads, which speeds up the indexing. We have optimized the MySQL queries
> ( MySQL was the initial bottleneck ) and now reach over 100k rows/s, but as
> the index grows, Solr spends a very long time adding documents. I assume
> something in solrconfig makes Solr stall and even block after 100 million
> documents have been indexed.
> Here is the Java code that creates documents and then adds them to the Solr
> server:
> public void createDocuments() throws SQLException, SolrServerException,
>                                      IOException
> {
>     App.logger.write("Creating documents..");
>     this.docs = new ArrayList<SolrInputDocument>();
>     App.logger.incrementNumberOfRows(this.size);
>     while (this.results.next())
>     {
>         this.docs.add(this.getDocumentFromResultSet(this.results));
>     }
>     // close the ResultSet before the Statement that produced it
>     this.results.close();
>     this.statement.close();
> }
>
> public void commitDocuments() throws SolrServerException, IOException
> {
>     App.logger.write("Committing..");
>     App.solrServer.add(this.docs); // here it stays very long and then blocks
>     App.logger.incrementNumberOfRows(this.docs.size());
>     this.docs.clear();
> }
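[Editorial note: one common mitigation for a blocking bulk add is to split the accumulated document list into fixed-size chunks, so each add() round-trip stays small. The sketch below is illustrative, not the reporter's code; the `partition` helper and the batch size of 1000 are assumptions, and the generic type stands in for SolrInputDocument.]

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedAdd {
    // Split a large list into fixed-size chunks so that each
    // solrServer.add(batch) call ships a bounded amount of data
    // instead of one huge request that can stall or block.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        // 2500 placeholder "documents" split into batches of 1000
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) docs.add(i);
        List<List<Integer>> batches = partition(docs, 1000);
        System.out.println(batches.size());        // 3
        System.out.println(batches.get(2).size()); // 500
    }
}
```

In the importer above, commitDocuments() could then loop `for (List<SolrInputDocument> batch : partition(this.docs, 1000)) { App.solrServer.add(batch); }` rather than adding everything at once.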
> I am also pasting the solrconfig.xml parameters relevant to this
> discussion:
> <maxIndexingThreads>128</maxIndexingThreads>
> <useCompoundFile>false</useCompoundFile>
> <ramBufferSizeMB>10000</ramBufferSizeMB>
> <maxBufferedDocs>1000000</maxBufferedDocs>
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>   <int name="maxMergeAtOnce">20000</int>
>   <int name="segmentsPerTier">1000000</int>
>   <int name="maxMergeAtOnceExplicit">10000</int>
> </mergePolicy>
> <mergeFactor>100</mergeFactor>
> <termIndexInterval>1024</termIndexInterval>
> <autoCommit>
>   <maxTime>15000</maxTime>
>   <maxDocs>1000000</maxDocs>
>   <openSearcher>false</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>   <maxTime>2000000</maxTime>
> </autoSoftCommit>
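[Editorial note: the quoted merge settings are far above Lucene's defaults (segmentsPerTier defaults to 10, maxMergeAtOnce to 10), which lets the segment count grow into the hundreds of thousands before any merging happens. For comparison, a sketch of a more conventional Solr 4.x configuration; these values are illustrative defaults, not tuned for this workload:]

```xml
<ramBufferSizeMB>512</ramBufferSizeMB>
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <!-- Lucene defaults: keep segment counts bounded so searches
       and flushes do not slow down as the index grows -->
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>
<autoCommit>
  <!-- hard commit on a timer, without opening a searcher,
       to keep the transaction log bounded -->
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
```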
> Thanks a lot for any answers, and excuse my long text; I'm new to this JIRA.
> If any other info is needed, please let me know.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]