[ https://issues.apache.org/jira/browse/GEODE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125904#comment-17125904 ]

ASF GitHub Bot commented on GEODE-8029:
---------------------------------------

jujoramos commented on a change in pull request #5099:
URL: https://github.com/apache/geode/pull/5099#discussion_r435237982



##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/Oplog.java
##########
@@ -937,14 +937,21 @@ void initAfterRecovery(boolean offline) {
         // this.crf.raf.seek(this.crf.currSize);
       } else if (!offline) {
         // drf exists but crf has been deleted (because it was empty).
-        // I don't think the drf needs to be opened. It is only used during
-        // recovery.
-        // At some point the compacter my identify that it can be deleted.
         this.crf.RAFClosed = true;
         deleteCRF();
+
+        // See GEODE-8029.

Review comment:
   > This change comes into the picture after recovery is completed, in
"initAfterRecovery". The real problem seems to occur during recovery. If
recovery happens periodically this could be fine, but if there is a large
number of deleted records during the first recovery, the issue may still
arise... We may need some checks during recovery (before reading the drfs)
to avoid reading a large number of deleted records. Please correct me if my
understanding is wrong.
   
   Your understanding is correct; however, the changes are meant to prevent
the issue from happening in the first place. By deleting the unused `drf`
files here, and by not keeping them on disk when they are not needed, users
won't hit the `IllegalArgumentException` at all.
   As a side note, to hit the problem in the first place a user would need
more than `805306401` delete operations within the `opLog` files in one
single run, with compaction never having run, which is highly unlikely.
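   For reference on where `805306401` comes from: fastutil sizes its backing
table as a power of two capped at 2^30 = 1073741824 slots, and
805306401 / 0.75 = 1073741868 already exceeds that cap. A minimal sketch of
the arithmetic, paraphrasing (not copying) the `HashCommon.arraySize` check
named in the issue's stack trace:

```java
// Paraphrase of fastutil's capacity check, for illustration only; see
// it.unimi.dsi.fastutil.HashCommon.arraySize for the real implementation.
public class LoadFactorLimit {

  // Smallest power of two >= x (assumes 1 <= x <= 2^62).
  static long nextPowerOfTwo(long x) {
    return x <= 1 ? 1 : Long.highestOneBit(x - 1) << 1;
  }

  static int arraySize(int expected, float f) {
    long s = Math.max(2, nextPowerOfTwo((long) Math.ceil(expected / f)));
    if (s > (1 << 30)) { // backing arrays are capped at 2^30 slots
      throw new IllegalArgumentException(
          "Too large (" + expected + " expected elements with load factor " + f + ")");
    }
    return (int) s;
  }

  public static void main(String[] args) {
    // 0.75 * 2^30 = 805306368 elements still fit in the largest table:
    System.out.println(arraySize(805_306_368, 0.75f)); // prints 1073741824
    // Anything past that threshold would need a 2^31 table, so this throws
    // the exact message seen in the GEODE-8029 stack trace:
    arraySize(805_306_401, 0.75f);
  }
}
```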
   If you think we should instead count the entries while reading the files
and, somehow, expand the load factor when needed (or something similar), let
me know and I'll give it a try. I'm worried, though, about the performance
impact this might have while recovering files from disk... recovery time
would suffer during every single startup and, considering that there's only
a handful of scenarios in which the actual issue can happen, I believe
making sure we delete unused files is a better approach.
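   Purely as a hypothetical illustration of that alternative (none of the
names below exist in the PR): instead of letting a single `IntOpenHashSet`
grow past fastutil's cap, the recovered ids could be spread across several
sets so that no single backing table ever needs more than 2^30 slots:

```java
import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the PR's code: an int set that rolls over to a
// fresh IntOpenHashSet before any single set can reach fastutil's limit.
class ChunkedIntSet {
  // Largest element count a single set can hold: 0.75 * 2^30.
  private static final int MAX_PER_SET = 805_306_368;

  private final List<IntOpenHashSet> chunks = new ArrayList<>();

  ChunkedIntSet() {
    chunks.add(new IntOpenHashSet());
  }

  void add(int id) {
    IntOpenHashSet current = chunks.get(chunks.size() - 1);
    if (current.size() >= MAX_PER_SET) {
      current = new IntOpenHashSet(); // roll over before hitting the cap
      chunks.add(current);
    }
    current.add(id);
  }

  boolean contains(int id) {
    for (IntOpenHashSet chunk : chunks) {
      if (chunk.contains(id)) {
        return true;
      }
    }
    return false;
  }
}
```

   The trade-off is the one described above: `contains` becomes a scan over
every chunk, so lookups during recovery slow down as the number of deleted
entries grows.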
   
   > Also, looking at the other part of the code where "setHasDeletes" is
called, it looks like it is called before deleteDRF(); there could be a
reason for doing it that way.
   
   Will change this, thanks for catching it.
   
   > And there is also a call to "getOplogSet().removeDrf(this);". This may
be needed here...
   
   This is already done within the `deleteDRF()` method.
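   To make the containment explicit, here is a self-contained toy model of
what was just described (every name is hypothetical; it only mirrors the two
observations quoted above, not the actual geode-core classes):

```java
import java.util.HashSet;
import java.util.Set;

// Toy stand-in for the oplog set that tracks which oplogs still own a drf.
class OplogSetModel {
  final Set<OplogModel> drfOplogs = new HashSet<>();

  void removeDrf(OplogModel oplog) {
    drfOplogs.remove(oplog);
  }
}

// Toy stand-in for Oplog: deleteDRF() itself unregisters the drf, which is
// why the call site does not need its own getOplogSet().removeDrf(this).
class OplogModel {
  private final OplogSetModel oplogSet;

  OplogModel(OplogSetModel oplogSet) {
    this.oplogSet = oplogSet;
    oplogSet.drfOplogs.add(this);
  }

  OplogSetModel getOplogSet() {
    return oplogSet;
  }

  void deleteDRF() {
    getOplogSet().removeDrf(this); // unregistration happens in here
    // ... deleting the physical .drf file would follow here ...
  }
}
```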
   
   ---
   
   Please let me know what you think about point 1 (adding an extra check
while recovering to make sure we stay below the load-factor threshold), so I
can go ahead and make all the changes at once.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> java.lang.IllegalArgumentException: Too large (805306401 expected elements 
> with load factor 0.75)
> -------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-8029
>                 URL: https://issues.apache.org/jira/browse/GEODE-8029
>             Project: Geode
>          Issue Type: Bug
>          Components: configuration, core, gfsh
>    Affects Versions: 1.9.0
>            Reporter: Jagadeesh sivasankaran
>            Assignee: Juan Ramos
>            Priority: Major
>              Labels: GeodeCommons, caching-applications
>         Attachments: Screen Shot 2020-04-27 at 12.21.19 PM.png, server02.log
>
>
> We have a cluster of three Geode locators and three cache servers running on
> CentOS. Today (April 27), after patching our CentOS servers, all locators
> and two of the servers came up, but one cache server would not start. The
> exception details are below. Please let me know how to resolve this issue,
> and whether any configuration changes to the disk store are needed.
>  
>  
> Starting a Geode Server in /app/provServerHO2...
> ....................................................................................................................................................................................................................The
>  Cache Server process terminated unexpectedly with exit status 1. Please 
> refer to the log file in /app/provServerHO2 for full details.
> Exception in thread "main" java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)
> at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:222)
> at it.unimi.dsi.fastutil.ints.IntOpenHashSet.add(IntOpenHashSet.java:308)
> at org.apache.geode.internal.cache.DiskStoreImpl$OplogEntryIdSet.add(DiskStoreImpl.java:3474)
> at org.apache.geode.internal.cache.Oplog.readDelEntry(Oplog.java:3007)
> at org.apache.geode.internal.cache.Oplog.recoverDrf(Oplog.java:1500)
> at org.apache.geode.internal.cache.PersistentOplogSet.recoverOplogs(PersistentOplogSet.java:445)
> at org.apache.geode.internal.cache.PersistentOplogSet.recoverRegionsThatAreReady(PersistentOplogSet.java:369)
> at org.apache.geode.internal.cache.DiskStoreImpl.recoverRegionsThatAreReady(DiskStoreImpl.java:2053)
> at org.apache.geode.internal.cache.DiskStoreImpl.initializeIfNeeded(DiskStoreImpl.java:2041)
> at org.apache.geode.internal.cache.DiskStoreImpl.doInitialRecovery(DiskStoreImpl.java:2046)
> at org.apache.geode.internal.cache.DiskStoreFactoryImpl.initializeDiskStore(DiskStoreFactoryImpl.java:184)
> at org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:150)
> at org.apache.geode.internal.cache.xmlcache.CacheCreation.createDiskStore(CacheCreation.java:794)
> at org.apache.geode.internal.cache.xmlcache.CacheCreation.initializePdxDiskStore(CacheCreation.java:785)
> at org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:509)
> at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> at org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> at org.apache.geode.internal.cache.ClusterConfigurationLoader.applyClusterXmlConfiguration(ClusterConfigurationLoader.java:197)
> at org.apache.geode.internal.cache.GemFireCacheImpl.applyJarAndXmlFromClusterConfig(GemFireCacheImpl.java:1240)
> at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1206)
> at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> at org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
> at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
