[ https://issues.apache.org/jira/browse/SOLR-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528396#comment-13528396 ]
Markus Jelsma commented on SOLR-4144:
-------------------------------------

Yonik, we have a cluster standing by on which I can reproduce the problem. I'll reconfirm the problem, then deploy the WARs with your patch tomorrow and report back.

> SolrCloud replication high heap consumption
> -------------------------------------------
>
> Key: SOLR-4144
> URL: https://issues.apache.org/jira/browse/SOLR-4144
> Project: Solr
> Issue Type: Bug
> Components: replication (java), SolrCloud
> Affects Versions: 5.0
> Environment: 5.0-SNAPSHOT 1366361:1416494M - markus - 2012-12-03 14:09:13
> Reporter: Markus Jelsma
> Priority: Critical
> Fix For: 5.0
>
> Attachments: SOLR-4144.patch
>
>
> Recent versions of SolrCloud need a much larger heap than older versions did. Another cluster running 5.0.0.2012.10.09.19.29.59 (~4GB per core) can restore an empty node without using much heap (Xmx=256m). Recent versions and current trunk fail miserably even with a larger heap (750m). Both clusters have 10 nodes, 10 shards and 2 cores per node. Note that the cluster on which this fails has only about 1.5GB per core, due to changes in the Lucene codec such as compression.
> After startup everything goes fine...
> {code}
> 2012-12-04 15:05:35,013 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Begin buffering updates. core=shard_c
> 2012-12-04 15:05:35,013 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Begin buffering updates. core=shard_b
> 2012-12-04 15:05:35,013 INFO [solr.update.UpdateLog] - [RecoveryThread] - : Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
> 2012-12-04 15:05:35,013 INFO [solr.update.UpdateLog] - [RecoveryThread] - : Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
> 2012-12-04 15:05:35,013 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Attempting to replicate from http://178.21.118.190:8080/solr/shard_b/. core=shard_b
> 2012-12-04 15:05:35,013 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Attempting to replicate from http://178.21.118.192:8080/solr/shard_c/. core=shard_c
> 2012-12-04 15:05:35,014 INFO [solrj.impl.HttpClientUtil] - [RecoveryThread] - : Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> 2012-12-04 15:05:35,014 INFO [solrj.impl.HttpClientUtil] - [RecoveryThread] - : Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> 2012-12-04 15:05:35,052 INFO [solr.handler.ReplicationHandler] - [RecoveryThread] - : Commits will be reserved for 10000
> 2012-12-04 15:05:35,052 INFO [solr.handler.ReplicationHandler] - [RecoveryThread] - : Commits will be reserved for 10000
> 2012-12-04 15:05:35,053 INFO [solrj.impl.HttpClientUtil] - [RecoveryThread] - : Creating new http client, config:connTimeout=5000&socketTimeout=20000&allowCompression=false&maxConnections=10000&maxConnectionsPerHost=10000
> 2012-12-04 15:05:35,060 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : No value set for 'pollInterval'. Timer Task not started.
> 2012-12-04 15:05:35,060 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : No value set for 'pollInterval'. Timer Task not started.
> 2012-12-04 15:05:35,069 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : Master's generation: 48
> 2012-12-04 15:05:35,069 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : Slave's generation: 1
> 2012-12-04 15:05:35,069 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : Starting replication process
> 2012-12-04 15:05:35,069 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : Master's generation: 47
> 2012-12-04 15:05:35,069 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : Slave's generation: 1
> 2012-12-04 15:05:35,070 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : Starting replication process
> 2012-12-04 15:05:35,078 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : Number of files in latest index in master: 235
> 2012-12-04 15:05:35,079 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : Number of files in latest index in master: 287
> 2012-12-04 15:05:35,084 WARN [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : No lockType configured for NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_c/data/index.20121204150535080 lockFactory=org.apache.lucene.store.NativeFSLockFactory@57530551; maxCacheMB=48.0 maxMergeSizeMB=4.0) assuming 'simple'
> 2012-12-04 15:05:35,085 INFO [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : return new directory for /opt/solr/cores/shard_c/data/index.20121204150535080 forceNew:false
> 2012-12-04 15:05:35,085 INFO [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_c/data
> 2012-12-04 15:05:35,085 WARN [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : No lockType configured for NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_b/data/index.20121204150535079 lockFactory=org.apache.lucene.store.NativeFSLockFactory@512fb063; maxCacheMB=48.0 maxMergeSizeMB=4.0) assuming 'simple'
> 2012-12-04 15:05:35,085 INFO [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : return new directory for /opt/solr/cores/shard_b/data/index.20121204150535079 forceNew:false
> 2012-12-04 15:05:35,085 INFO [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_b/data
> 2012-12-04 15:05:35,088 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : Starting download to NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_c/data/index.20121204150535080 lockFactory=org.apache.lucene.store.SimpleFSLockFactory@3bd48043; maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true
> 2012-12-04 15:05:35,089 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : Starting download to NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_b/data/index.20121204150535079 lockFactory=org.apache.lucene.store.SimpleFSLockFactory@67fc9fee; maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true
> {code}
> until suddenly
> {code}
> 2012-12-03 16:14:58,862 INFO [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_b/data/index
> 2012-12-03 16:15:06,357 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : removing temporary index download directory files NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_b/data/index.20121203161342097 lockFactory=org.apache.lucene.store.SimpleFSLockFactory@424c2849; maxCacheMB=48.0 maxMergeSizeMB=4.0)
> 2012-12-03 16:14:58,610 INFO [solr.core.CachingDirectoryFactory] - [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_c/data/index
> 2012-12-03 16:15:06,128 INFO [solr.core.SolrCore] - [http-8080-exec-2] - : [shard_c] webapp=/solr path=/admin/system params={wt=json} status=0 QTime=11498
> 2012-12-03 16:15:07,644 ERROR [solr.servlet.SolrDispatchFilter] - [http-8080-exec-5] - : null:java.lang.OutOfMemoryError: Java heap space
> 2012-12-03 16:15:07,644 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : removing temporary index download directory files NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_c/data/index.20121203161342096 lockFactory=org.apache.lucene.store.SimpleFSLockFactory@7a67f797; maxCacheMB=48.0 maxMergeSizeMB=4.0)
> 2012-12-03 16:15:39,655 ERROR [solr.servlet.SolrDispatchFilter] - [http-8080-exec-4] - : null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
> {code}
> It just succeeded with Xmx=850m and NewRatio=1, while another test with Xmx=750m and NewRatio=1 failed. We can reproduce this behaviour rather easily by purging the data directories and starting the node with less heap than it currently requires for replication.
> Please also see:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201211.mbox/%3czarafa.5093d4ee.58d7.528aacd34e162...@mail.openindex.io%3E
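For context on the Xmx/NewRatio combinations reported above, here is a rough sketch (not from the issue; the class `NewRatioSplit` and its methods are hypothetical) of how HotSpot's `-XX:NewRatio` divides the maximum heap between the young and old generations. It assumes NewRatio=N means old:young = N:1 and ignores `-XX:NewSize`/`-XX:MaxNewSize` overrides, alignment and collector-specific sizing, so the figures are approximations, not measurements from this cluster.

```java
/**
 * Hypothetical helper, not part of Solr: approximates how -XX:NewRatio splits
 * a given -Xmx between the young and old generations. NewRatio=N is taken to
 * mean old:young = N:1, so young = heap / (N + 1) and old = heap - young.
 * Ignores alignment, -XX:NewSize/-XX:MaxNewSize overrides and any
 * collector-specific sizing policy, so results are rough estimates only.
 */
final class NewRatioSplit {

    /** Approximate young-generation size in MB for a max heap (MB) and NewRatio. */
    static long youngGenMb(long xmxMb, int newRatio) {
        if (xmxMb <= 0 || newRatio < 1) {
            throw new IllegalArgumentException("xmxMb must be > 0 and newRatio >= 1");
        }
        return xmxMb / (newRatio + 1);
    }

    /** Approximate old-generation size in MB: whatever the young generation does not take. */
    static long oldGenMb(long xmxMb, int newRatio) {
        return xmxMb - youngGenMb(xmxMb, newRatio);
    }

    public static void main(String[] args) {
        // The two configurations reported in the issue: 850m succeeded, 750m failed.
        long[] heapsMb = {850, 750};
        for (long xmx : heapsMb) {
            System.out.printf("Xmx=%dm NewRatio=1 -> young ~%dm, old ~%dm%n",
                    xmx, youngGenMb(xmx, 1), oldGenMb(xmx, 1));
        }
        // For comparison only: the same failing heap with NewRatio=2.
        System.out.printf("Xmx=750m NewRatio=2 -> young ~%dm, old ~%dm%n",
                youngGenMb(750, 2), oldGenMb(750, 2));
    }
}
```

Under these assumptions, NewRatio=1 leaves roughly half of `-Xmx` to the old generation (about 425m at 850m and 375m at 750m), so long-lived objects created during a full index download compete for a smaller space. Whether that is what tips the 750m node into an OutOfMemoryError is not something this sketch can confirm.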