[ 
https://issues.apache.org/jira/browse/SOLR-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528182#comment-13528182
 ] 

Yonik Seeley commented on SOLR-4144:
------------------------------------

I bet this could be due to NRTCachingDirectory?  It makes the decision to cache 
a file or not up-front and can't change when it's part-way through the file.

If there's no mergeInfo or flushInfo in the context (and the file isn't the 
segments file) then it will chose to cache the file.
We need to pass something (like flushInfo) that will convince it not to cache.  
I'll work up a patch...
                
> SolrCloud replication high heap consumption
> -------------------------------------------
>
>                 Key: SOLR-4144
>                 URL: https://issues.apache.org/jira/browse/SOLR-4144
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java), SolrCloud
>    Affects Versions: 5.0
>         Environment: 5.0-SNAPSHOT 1366361:1416494M - markus - 2012-12-03 
> 14:09:13
>            Reporter: Markus Jelsma
>            Priority: Critical
>             Fix For: 5.0
>
>
> Recent versions of SolrCloud require a very high heap size vs. older 
> versions. Another cluster of 5.0.0.2012.10.09.19.29.59 (~4GB per core) can 
> restore an empty node without taking a lot of heap (xmx=256m). Recent 
> versions and current trunk fail miserably even with a higher heap (750m). 
> Both clusters have 10 nodes, 10 shards and 2 cores per node. One note to add 
> is that the cluster on which this fails has only about 1.5GB per core due to 
> changing in the Lucene codec such as compression.
> After start up everything goes fine...
> {code}
> 2012-12-04 15:05:35,013 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] 
> - : Begin buffering updates. core=shard_c
> 2012-12-04 15:05:35,013 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] 
> - : Begin buffering updates. core=shard_b
> 2012-12-04 15:05:35,013 INFO [solr.update.UpdateLog] - [RecoveryThread] - : 
> Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
> 2012-12-04 15:05:35,013 INFO [solr.update.UpdateLog] - [RecoveryThread] - : 
> Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
> 2012-12-04 15:05:35,013 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] 
> - : Attempting to replicate from http://178.21.118.190:8080/solr/shard_b/. 
> core=shard_b
> 2012-12-04 15:05:35,013 INFO [solr.cloud.RecoveryStrategy] - [RecoveryThread] 
> - : Attempting to replicate from http://178.21.118.192:8080/solr/shard_c/. 
> core=shard_c
> 2012-12-04 15:05:35,014 INFO [solrj.impl.HttpClientUtil] - [RecoveryThread] - 
> : Creating new http client, 
> config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> 2012-12-04 15:05:35,014 INFO [solrj.impl.HttpClientUtil] - [RecoveryThread] - 
> : Creating new http client, 
> config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> 2012-12-04 15:05:35,052 INFO [solr.handler.ReplicationHandler] - 
> [RecoveryThread] - : Commits will be reserved for  10000
> 2012-12-04 15:05:35,052 INFO [solr.handler.ReplicationHandler] - 
> [RecoveryThread] - : Commits will be reserved for  10000
> 2012-12-04 15:05:35,053 INFO [solrj.impl.HttpClientUtil] - [RecoveryThread] - 
> : Creating new http client, 
> config:connTimeout=5000&socketTimeout=20000&allowCompression=false&maxConnections=10000&maxConnectionsPerHost=10000
> 2012-12-04 15:05:35,060 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
>  No value set for 'pollInterval'. Timer Task not started.
> 2012-12-04 15:05:35,060 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
>  No value set for 'pollInterval'. Timer Task not started.
> 2012-12-04 15:05:35,069 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
> Master's generation: 48
> 2012-12-04 15:05:35,069 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
> Slave's generation: 1
> 2012-12-04 15:05:35,069 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
> Starting replication process
> 2012-12-04 15:05:35,069 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
> Master's generation: 47
> 2012-12-04 15:05:35,069 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
> Slave's generation: 1
> 2012-12-04 15:05:35,070 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
> Starting replication process
> 2012-12-04 15:05:35,078 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
> Number of files in latest index in master: 235
> 2012-12-04 15:05:35,079 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
> Number of files in latest index in master: 287
> 2012-12-04 15:05:35,084 WARN [solr.core.CachingDirectoryFactory] - 
> [RecoveryThread] - : No lockType configured for 
> NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_c/data/index.20121204150535080
>  lockFactory=org.apache.lucene.store.NativeFSLockFactory@57530551; 
> maxCacheMB=48.0 maxMergeSizeMB=4.0) assuming 'simple'
> 2012-12-04 15:05:35,085 INFO [solr.core.CachingDirectoryFactory] - 
> [RecoveryThread] - : return new directory for 
> /opt/solr/cores/shard_c/data/index.20121204150535080 forceNew:false
> 2012-12-04 15:05:35,085 INFO [solr.core.CachingDirectoryFactory] - 
> [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_c/data
> 2012-12-04 15:05:35,085 WARN [solr.core.CachingDirectoryFactory] - 
> [RecoveryThread] - : No lockType configured for 
> NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_b/data/index.20121204150535079
>  lockFactory=org.apache.lucene.store.NativeFSLockFactory@512fb063; 
> maxCacheMB=48.0 maxMergeSizeMB=4.0) assuming 'simple'
> 2012-12-04 15:05:35,085 INFO [solr.core.CachingDirectoryFactory] - 
> [RecoveryThread] - : return new directory for 
> /opt/solr/cores/shard_b/data/index.20121204150535079 forceNew:false
> 2012-12-04 15:05:35,085 INFO [solr.core.CachingDirectoryFactory] - 
> [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_b/data
> 2012-12-04 15:05:35,088 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
> Starting download to 
> NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_c/data/index.20121204150535080
>  lockFactory=org.apache.lucene.store.SimpleFSLockFactory@3bd48043; 
> maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true
> 2012-12-04 15:05:35,089 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
> Starting download to 
> NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_b/data/index.20121204150535079
>  lockFactory=org.apache.lucene.store.SimpleFSLockFactory@67fc9fee; 
> maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true
> {code}
> until suddenly
> {code}
> 2012-12-03 16:14:58,862 INFO [solr.core.CachingDirectoryFactory] - 
> [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_b/data/index
> 2012-12-03 16:15:06,357 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
> removing temporary index download directory files 
> NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_b/data/index.20121203161342097
>  lockFactory=org.apache.lucene.store.SimpleFSLockFactory@424c2849; 
> maxCacheMB=48.0 maxMergeSizeMB=4.0)
> 2012-12-03 16:14:58,610 INFO [solr.core.CachingDirectoryFactory] - 
> [RecoveryThread] - : Releasing directory:/opt/solr/cores/shard_c/data/index
> 2012-12-03 16:15:06,128 INFO [solr.core.SolrCore] - [http-8080-exec-2] - : 
> [shard_c] webapp=/solr path=/admin/system params={wt=json} status=0 
> QTime=11498 
> 2012-12-03 16:15:07,644 ERROR [solr.servlet.SolrDispatchFilter] - 
> [http-8080-exec-5] - : null:java.lang.OutOfMemoryError: Java heap space
> 2012-12-03 16:15:07,644 INFO [solr.handler.SnapPuller] - [RecoveryThread] - : 
> removing temporary index download directory files 
> NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_c/data/index.20121203161342096
>  lockFactory=org.apache.lucene.store.SimpleFSLockFactory@7a67f797; 
> maxCacheMB=48.0 maxMergeSizeMB=4.0)
> 2012-12-03 16:15:39,655 ERROR [solr.servlet.SolrDispatchFilter] - 
> [http-8080-exec-4] - : null:java.lang.RuntimeException: 
> java.lang.OutOfMemoryError: Java heap space
> {code}
> Just now it succeeded with Xmx=850m and NewRatio=1. Another test failed with 
> Xmx=750m and NewRatio=1. We can reproduce this behaviour rather easy by 
> purging the data directories and simply starting the node with less heap than 
> it today requires for replication.
> Please also see:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201211.mbox/%3czarafa.5093d4ee.58d7.528aacd34e162...@mail.openindex.io%3E

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to