Hi all, 

I am facing the same DB issue and trying to fix it. I tried to find the 
file path for FILE_LOCK=FS to set on gocd DB. Could you help me to locate 
the file, please?

Thank you,

Gulshen

On Monday, May 23, 2022 at 3:06:59 AM UTC-4 [email protected] wrote:

> When the issue happened (10:24) wrapper log only shows:
> INFO   | jvm 3    | 2022/05/18 10:23:44 | May 18, 2022 10:23:44 AM 
> jakarta.mail.Session loadResource
> INFO   | jvm 3    | 2022/05/18 10:23:44 | WARNING: expected resource not 
> found: /META-INF/javamail.default.address.map
> INFO   | jvm 3    | 2022/05/18 10:24:06 | May 18, 2022 10:24:05 AM 
> jakarta.mail.Session loadResource
> INFO   | jvm 3    | 2022/05/18 10:24:06 | WARNING: expected resource not 
> found: /META-INF/javamail.default.address.map
> INFO   | jvm 3    | 2022/05/18 10:24:18 | [70452.759s][warning][gc,alloc] 
> qtp1718515850-41: Retried waiting for GCLocker too often allocating 256 
> words
> INFO   | jvm 3    | 2022/05/18 10:24:18 | [70452.759s][warning][gc,alloc] 
> 105@MessageListener for WorkFinder: Retried waiting for GCLocker too often 
> allocating 386 words
> INFO   | jvm 3    | 2022/05/18 10:29:42 | May 18, 2022 10:29:42 AM 
> jakarta.mail.Session loadResource
> INFO   | jvm 3    | 2022/05/18 10:29:42 | WARNING: expected resource not 
> found: /META-INF/javamail.default.address.map
>
> Having a closer look at the log showed that the OOM situation already 
> happened one day earlier than expected. On 05/17 at round about 11:30 the 
> wrapper-log shows the out of memory error messages:
>
> 2022-05-17 11:30:39,871 ERROR [104@MessageListener for WorkFinder] 
> JMSMessageListenerAdapter:87 - Exception thrown in message handling by 
> listener com.thoughtworks.go.server.messaging.scheduling.WorkFinder@4cf95d4c
> org.springframework.transaction.TransactionSystemException: Could not roll 
> back JDBC transaction; nested exception is 
> org.h2.jdbc.JdbcSQLNonTransientConnectionException: The database has been 
> closed [90098-200]
>     at 
> org.springframework.jdbc.datasource.DataSourceTransactionManager.doRollback(DataSourceTransactionManager.java:329)
>     at 
> org.springframework.transaction.support.AbstractPlatformTransactionManager.processRollback(AbstractPlatformTransactionManager.java:857)
>     at 
> org.springframework.transaction.support.AbstractPlatformTransactionManager.rollback(AbstractPlatformTransactionManager.java:834)
>     at 
> org.springframework.transaction.support.TransactionTemplate.rollbackOnException(TransactionTemplate.java:164)
>     at 
> org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:137)
>     at 
> com.thoughtworks.go.server.transaction.TransactionTemplate.execute(TransactionTemplate.java:28)
>     at 
> com.thoughtworks.go.server.service.ScheduleService.rescheduleAbandonedBuildIfNecessary(ScheduleService.java:640)
>     at 
> com.thoughtworks.go.server.service.BuildAssignmentService.assignWorkToAgent(BuildAssignmentService.java:183)
>     at 
> com.thoughtworks.go.server.service.BuildAssignmentService.assignWorkToAgent(BuildAssignmentService.java:165)
>     at 
> com.thoughtworks.go.server.messaging.scheduling.WorkFinder.onMessage(WorkFinder.java:60)
>     at 
> com.thoughtworks.go.server.messaging.scheduling.WorkFinder.onMessage(WorkFinder.java:32)
>     at 
> com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:83)
>     at 
> com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:63)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.h2.jdbc.JdbcSQLNonTransientConnectionException: The 
> database has been closed [90098-200]
>     at org.h2.message.DbException.getJdbcSQLException(DbException.java:622)
>     at org.h2.message.DbException.getJdbcSQLException(DbException.java:429)
>     at org.h2.message.DbException.get(DbException.java:194)
>     at org.h2.engine.Session.getTransaction(Session.java:1792)
>     at org.h2.engine.Session.getStatementSavepoint(Session.java:1804)
>     at org.h2.engine.Session.setSavepoint(Session.java:915)
>     at org.h2.command.Command.executeUpdate(Command.java:244)
>     at 
> org.h2.jdbc.JdbcConnection.rollbackInternal(JdbcConnection.java:1530)
>     at org.h2.jdbc.JdbcConnection.rollback(JdbcConnection.java:561)
>     at 
> org.apache.commons.dbcp2.DelegatingConnection.rollback(DelegatingConnection.java:781)
>     at 
> org.apache.commons.dbcp2.DelegatingConnection.rollback(DelegatingConnection.java:781)
>     at 
> org.springframework.jdbc.datasource.DataSourceTransactionManager.doRollback(DataSourceTransactionManager.java:326)
>     ... 13 common frames omitted
> 2022-05-17 11:30:54,167 INFO  [WrapperJarAppMain] Jetty9Server:199 - 
> Configuring Jetty using /etc/go/jetty.xml
> 2022-05-17 11:30:54,236 WARN  [WrapperJarAppMain] Server:357 - 
> ErrorPageMapper not supported for Server level Error Handling
> 2022-05-17 11:30:54,372 WARN  [WrapperJarAppMain] AbstractHandler:96 - No 
> Server set for ResourceHandler@51be8b61{STOPPED}
> 2022-05-17 11:30:58,763 WARN  [WrapperJarAppMain] ConnectionManager:117 - 
> The file /etc/go/db.properties specified by `go.db.config` does not exist.
> 2022-05-17 11:30:59,154 INFO  [WrapperJarAppMain] DatabaseMigrator:40 - 
> Upgrading database, this might take a while depending on the size of the 
> database.
> 2022-05-17 11:30:59,154 INFO  [WrapperJarAppMain] DatabaseMigrator:49 - 
> ************************************************************************
> 2022-05-17 11:30:59,155 INFO  [WrapperJarAppMain] DatabaseMigrator:49 - 
> WARNING: Shutting down your server at this point will lead to a database 
> corruption. Please wait until the database upgrade completes.
> 2022-05-17 11:30:59,155 INFO  [WrapperJarAppMain] DatabaseMigrator:49 - 
> ************************************************************************
> 2022-05-17 11:31:01,356 INFO  [WrapperJarAppMain] DatabaseMigrator:57 - 
> Database upgrade completed successfully.
> 2022-05-17 11:31:01,357 INFO  [WrapperJarAppMain] DataMigrationRunner:34 - 
> Running data migrations...
> 2022-05-17 11:31:01,388 INFO  [WrapperJarAppMain] DataMigrationRunner:49 - 
> Data migration took 3 ms
> 2022-05-17 11:31:01,391 INFO  [WrapperJarAppMain] DataMigrationRunner:49 - 
> Data migration took 0 ms
> 2022-05-17 11:31:01,391 INFO  [WrapperJarAppMain] DataMigrationRunner:39 - 
> Data migrations completed.
> 2022-05-17 11:31:02,206 WARN  [WrapperJarAppMain] ConfigurationFactory:136 
> - No configuration found. Configuring ehcache from ehcache-failsafe.xml 
>  found in the classpath: 
> jar:file:/var/lib/go-server/work/jetty-0_0_0_0-8153-cruise_war-_go-any-/webapp/WEB-INF/lib/ehcache-2.10.9.2.jar!/ehcache-failsafe.xml
> 2022-05-17 11:31:02,383 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
> Could not find configuration [com.thoughtworks.go.domain.AccessToken]; 
> using defaults.
> 2022-05-17 11:31:02,436 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
> Could not find configuration 
> [com.thoughtworks.go.domain.NotificationFilter]; using defaults.
> 2022-05-17 11:31:02,538 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
> Could not find configuration [com.thoughtworks.go.domain.User]; using 
> defaults.
> 2022-05-17 11:31:02,560 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
> Could not find configuration [com.thoughtworks.go.domain.Plugin]; using 
> defaults.
> 2022-05-17 11:31:02,570 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
> Could not find configuration 
> [com.thoughtworks.go.domain.EnvironmentVariable]; using defaults.
> 2022-05-17 11:31:02,705 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
> Could not find configuration 
> [com.thoughtworks.go.domain.User.notificationFilters]; using defaults.
> 2022-05-17 11:31:02,789 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
> Could not find configuration [org.hibernate.cache.UpdateTimestampsCache]; 
> using defaults.
> 2022-05-17 11:31:02,826 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
> Could not find configuration [org.hibernate.cache.StandardQueryCache]; 
> using defaults.
> 2022-05-17 11:31:04,815 WARN  [WrapperJarAppMain] BrokerService:2163 - 
> Temporary Store limit is 51200 mb (current store usage is 0 mb). The data 
> directory: /var/lib/go-server only has 45488 mb of usable space. - 
> resetting to maximum available disk space: 45488 mb
> 2022-05-17 11:31:04,961 INFO  [WrapperJarAppMain] ConnectionManager:98 - 
> Done loading query extensions, found 
> com.thoughtworks.go.server.database.h2.H2QueryExtensions@6a25c389
> 2022-05-17 11:31:08,045 INFO  [WrapperJarAppMain] GoConfigMigration:93 - 
> Upgrading config file from version 139 to version 139
> 2022-05-17 11:31:08,097 INFO  [WrapperJarAppMain] GoConfigMigration:101 - 
> Finished upgrading config file
> 2022-05-17 11:31:08,097 INFO  [WrapperJarAppMain] GoConfigMigrator:106 - 
> [Config Save] Starting Config Save post upgrade using 
> FullConfigSaveNormalFlow
> 2022-05-17 11:31:08,183 INFO  [Thread-76] 
> DefaultPluginJarChangeListener:67 - Plugin load starting: 
> /var/lib/go-server/plugins/bundled/gocd-ldap-authentication-plugin.jar
>
> Afterwards it seems the server is running as usual, until the OOM and 
> "locked" issue is occurring 23 hours later. So I can imagine that you are 
> right and it is doing some "unclean restart". I suppose that GoCD is 
> innocent, since you can never plan how to act when out of memory occurs, 
> that's unpredictable. I assume that raising the size will solve the issue. 
>
> Thanks again for all the support, this is really a great community.
> Chad Wilson schrieb am Freitag, 20. Mai 2022 um 10:38:37 UTC+2:
>
>> No problem!
>>
>> Perhaps what is happening is that after the OOM has occurred, it is being 
>> detected by the "wrapper" process shipped with GoCD with and the Java 
>> process is being restarted. If so, you should be able to see that 
>> auto-restart happening in the logs, including the go-server-wrapper.log.
>>
>> Normally I'd expect that to ensure the file lock is cleanly released 
>> before the new process tries to acquire it, but perhaps it is not shutting 
>> down cleanly, or there is something unusual about the file system.
>>
>> If it keeps happening, in addition to lsof, perhaps lslocks (or looking 
>> at /proc/locks) is of use to see which PIDs have locked which files more 
>> explicitly.
>>
>> In any case, fixing the OOMs is probably a good idea regardless of this, 
>> so it seems sensible to try with a larger heap to address that issue first.
>>
>> -Chad
>>
>>
>> On Fri, May 20, 2022 at 3:39 PM AquilaNiger <[email protected]> wrote:
>>
>>> Hi Chad,
>>>
>>> thanks for your support. 
>>>
>>>>
>>>>    - Does the server recover after some time, or you need to restart 
>>>>    GoCD or take some other action to fix it?
>>>>
>>>> No it does not recover, I have to restart GoCD. 
>>>
>>>>
>>>>    - How are you running GoCD? i.e in which environment? Container? 
>>>>    Standard server?
>>>>
>>>> Standard Server on a Ubuntu 18.04.6 LTS virtual machine
>>>
>>>>
>>>>    - Is your DB file on some kind of network mount or something like 
>>>>    that?
>>>>
>>>> No, it isn't. 
>>>
>>>>
>>>>    - Is there a way to verify there aren't multiple processes/GoCD 
>>>>    Instances trying to access the file?
>>>>       - when it happens, are you able to use OS-level commands such as 
>>>>       lsof to see if other/multiple processes have handles on the DB file 
>>>>       (depends on whether storage is local)
>>>>    
>>>> Currently this happens only once in a while (last time there were 6 
>>> days between the database issues). lsof is a good idea! I'll try that the 
>>> next time it happens.  
>>>
>>>>
>>>>    - Would be good to confirm you don't see GoCD crashing or getting 
>>>>    auto-restarted in your logs to rule out GoCD itself having a different 
>>>>    problem, and then this problem is being caused by a zombie GoCD process 
>>>> or 
>>>>    some kind of stale lock which takes time to expire.
>>>>    
>>>> Actually, we found out with yesterdays go-server.log that the root 
>>> cause seems to be Out of Memory of Java:
>>> 2022-05-18 10:24:18,741 WARN [105@MessageListener for WorkFinder] 
>>> BasicDataSource:58 - An internal object pool swallowed an Exception. 
>>> org.h2.jdbc.JdbcSQLNonTransientConnectionException: The database has been 
>>> closed; SQL statement: ROLLBACK [90098-200]
>>> ...
>>> 2022-05-18 10:24:18,742 WARN [105@MessageListener for WorkFinder] 
>>> BasicDataSource:58 - An internal object pool swallowed an Exception. 
>>> org.h2.jdbc.JdbcSQLNonTransientConnectionException: Out of memory. 
>>> [90108-200]
>>> ...
>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>
>>> Actually we haven't touched heap size of GoCD up to now. Therefore, we 
>>> increased it now in wrapper-properties.conf and hope that the error will be 
>>> gone. I hope that does not only defer the error to some days later.
>>>
>>>>
>>>>    - Do you have any overrides to DB configuration, e.g a custom 
>>>>    *config/db.properties* file?
>>>>
>>>> No.
>>>  
>>>
>>>> To answer your question on the trace files, I think you get two files 
>>>> when the main trace file reaches an H2-configured maximum size. I ask the 
>>>> above question on DB properties as I think GoCD sets that to 16MB by 
>>>> default, whereas yours seems to have got to 64MB which seems curious.
>>>>
>>>  
>>> Thanks, that explains a lot. You're right, the "old" file contains 
>>> timestamps from 3:16 to 6:38 and the new one from 6:38 to 8:16.
>>>
>>>
>>>> There is a way to change the locking approach H2 uses 
>>>> <https://www.h2database.com/html/advanced.html#file_locking_protocols> 
>>>> (back to the older ;FILE_LOCK=FS - which creates the stale 
>>>> cruise.lock.db you have in your screenshot) if the issue is with the 
>>>> filesystem, however I imagine you'd want to rule out multiple processes or 
>>>> some other issue first.
>>>>
>>>  
>>> Thanks for the hint, I'll keep that in mind as "last resort". 
>>>
>>> Thank you for your support. I think we'll wait now if the error occurs 
>>> again. 
>>>
>>> Julia
>>>
>>> -- 
>>>
>> You received this message because you are subscribed to the Google Groups 
>>> "go-cd" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>>
>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/go-cd/8c61c76e-bb7b-474c-bfa5-1f623794fce4n%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/go-cd/8c61c76e-bb7b-474c-bfa5-1f623794fce4n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"go-cd" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/go-cd/001fafdf-ac59-440d-a802-73cde8ee6ea4n%40googlegroups.com.

Reply via email to