When the issue happened (10:24), the wrapper log only shows:
INFO   | jvm 3    | 2022/05/18 10:23:44 | May 18, 2022 10:23:44 AM 
jakarta.mail.Session loadResource
INFO   | jvm 3    | 2022/05/18 10:23:44 | WARNING: expected resource not 
found: /META-INF/javamail.default.address.map
INFO   | jvm 3    | 2022/05/18 10:24:06 | May 18, 2022 10:24:05 AM 
jakarta.mail.Session loadResource
INFO   | jvm 3    | 2022/05/18 10:24:06 | WARNING: expected resource not 
found: /META-INF/javamail.default.address.map
INFO   | jvm 3    | 2022/05/18 10:24:18 | [70452.759s][warning][gc,alloc] 
qtp1718515850-41: Retried waiting for GCLocker too often allocating 256 
words
INFO   | jvm 3    | 2022/05/18 10:24:18 | [70452.759s][warning][gc,alloc] 
105@MessageListener for WorkFinder: Retried waiting for GCLocker too often 
allocating 386 words
INFO   | jvm 3    | 2022/05/18 10:29:42 | May 18, 2022 10:29:42 AM 
jakarta.mail.Session loadResource
INFO   | jvm 3    | 2022/05/18 10:29:42 | WARNING: expected resource not 
found: /META-INF/javamail.default.address.map

A closer look at the log showed that an OOM situation had already happened 
one day earlier than expected. On 05/17 at around 11:30 the wrapper log 
shows the out-of-memory error messages:

2022-05-17 11:30:39,871 ERROR [104@MessageListener for WorkFinder] 
JMSMessageListenerAdapter:87 - Exception thrown in message handling by 
listener com.thoughtworks.go.server.messaging.scheduling.WorkFinder@4cf95d4c
org.springframework.transaction.TransactionSystemException: Could not roll 
back JDBC transaction; nested exception is 
org.h2.jdbc.JdbcSQLNonTransientConnectionException: The database has been 
closed [90098-200]
    at 
org.springframework.jdbc.datasource.DataSourceTransactionManager.doRollback(DataSourceTransactionManager.java:329)
    at 
org.springframework.transaction.support.AbstractPlatformTransactionManager.processRollback(AbstractPlatformTransactionManager.java:857)
    at 
org.springframework.transaction.support.AbstractPlatformTransactionManager.rollback(AbstractPlatformTransactionManager.java:834)
    at 
org.springframework.transaction.support.TransactionTemplate.rollbackOnException(TransactionTemplate.java:164)
    at 
org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:137)
    at 
com.thoughtworks.go.server.transaction.TransactionTemplate.execute(TransactionTemplate.java:28)
    at 
com.thoughtworks.go.server.service.ScheduleService.rescheduleAbandonedBuildIfNecessary(ScheduleService.java:640)
    at 
com.thoughtworks.go.server.service.BuildAssignmentService.assignWorkToAgent(BuildAssignmentService.java:183)
    at 
com.thoughtworks.go.server.service.BuildAssignmentService.assignWorkToAgent(BuildAssignmentService.java:165)
    at 
com.thoughtworks.go.server.messaging.scheduling.WorkFinder.onMessage(WorkFinder.java:60)
    at 
com.thoughtworks.go.server.messaging.scheduling.WorkFinder.onMessage(WorkFinder.java:32)
    at 
com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:83)
    at 
com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:63)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.h2.jdbc.JdbcSQLNonTransientConnectionException: The database 
has been closed [90098-200]
    at org.h2.message.DbException.getJdbcSQLException(DbException.java:622)
    at org.h2.message.DbException.getJdbcSQLException(DbException.java:429)
    at org.h2.message.DbException.get(DbException.java:194)
    at org.h2.engine.Session.getTransaction(Session.java:1792)
    at org.h2.engine.Session.getStatementSavepoint(Session.java:1804)
    at org.h2.engine.Session.setSavepoint(Session.java:915)
    at org.h2.command.Command.executeUpdate(Command.java:244)
    at org.h2.jdbc.JdbcConnection.rollbackInternal(JdbcConnection.java:1530)
    at org.h2.jdbc.JdbcConnection.rollback(JdbcConnection.java:561)
    at 
org.apache.commons.dbcp2.DelegatingConnection.rollback(DelegatingConnection.java:781)
    at 
org.apache.commons.dbcp2.DelegatingConnection.rollback(DelegatingConnection.java:781)
    at 
org.springframework.jdbc.datasource.DataSourceTransactionManager.doRollback(DataSourceTransactionManager.java:326)
    ... 13 common frames omitted
2022-05-17 11:30:54,167 INFO  [WrapperJarAppMain] Jetty9Server:199 - 
Configuring Jetty using /etc/go/jetty.xml
2022-05-17 11:30:54,236 WARN  [WrapperJarAppMain] Server:357 - 
ErrorPageMapper not supported for Server level Error Handling
2022-05-17 11:30:54,372 WARN  [WrapperJarAppMain] AbstractHandler:96 - No 
Server set for ResourceHandler@51be8b61{STOPPED}
2022-05-17 11:30:58,763 WARN  [WrapperJarAppMain] ConnectionManager:117 - 
The file /etc/go/db.properties specified by `go.db.config` does not exist.
2022-05-17 11:30:59,154 INFO  [WrapperJarAppMain] DatabaseMigrator:40 - 
Upgrading database, this might take a while depending on the size of the 
database.
2022-05-17 11:30:59,154 INFO  [WrapperJarAppMain] DatabaseMigrator:49 - 
************************************************************************
2022-05-17 11:30:59,155 INFO  [WrapperJarAppMain] DatabaseMigrator:49 - 
WARNING: Shutting down your server at this point will lead to a database 
corruption. Please wait until the database upgrade completes.
2022-05-17 11:30:59,155 INFO  [WrapperJarAppMain] DatabaseMigrator:49 - 
************************************************************************
2022-05-17 11:31:01,356 INFO  [WrapperJarAppMain] DatabaseMigrator:57 - 
Database upgrade completed successfully.
2022-05-17 11:31:01,357 INFO  [WrapperJarAppMain] DataMigrationRunner:34 - 
Running data migrations...
2022-05-17 11:31:01,388 INFO  [WrapperJarAppMain] DataMigrationRunner:49 - 
Data migration took 3 ms
2022-05-17 11:31:01,391 INFO  [WrapperJarAppMain] DataMigrationRunner:49 - 
Data migration took 0 ms
2022-05-17 11:31:01,391 INFO  [WrapperJarAppMain] DataMigrationRunner:39 - 
Data migrations completed.
2022-05-17 11:31:02,206 WARN  [WrapperJarAppMain] ConfigurationFactory:136 
- No configuration found. Configuring ehcache from ehcache-failsafe.xml 
 found in the classpath: 
jar:file:/var/lib/go-server/work/jetty-0_0_0_0-8153-cruise_war-_go-any-/webapp/WEB-INF/lib/ehcache-2.10.9.2.jar!/ehcache-failsafe.xml
2022-05-17 11:31:02,383 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
Could not find configuration [com.thoughtworks.go.domain.AccessToken]; 
using defaults.
2022-05-17 11:31:02,436 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
Could not find configuration 
[com.thoughtworks.go.domain.NotificationFilter]; using defaults.
2022-05-17 11:31:02,538 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
Could not find configuration [com.thoughtworks.go.domain.User]; using 
defaults.
2022-05-17 11:31:02,560 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
Could not find configuration [com.thoughtworks.go.domain.Plugin]; using 
defaults.
2022-05-17 11:31:02,570 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
Could not find configuration 
[com.thoughtworks.go.domain.EnvironmentVariable]; using defaults.
2022-05-17 11:31:02,705 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
Could not find configuration 
[com.thoughtworks.go.domain.User.notificationFilters]; using defaults.
2022-05-17 11:31:02,789 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
Could not find configuration [org.hibernate.cache.UpdateTimestampsCache]; 
using defaults.
2022-05-17 11:31:02,826 WARN  [WrapperJarAppMain] EhCacheProvider:93 - 
Could not find configuration [org.hibernate.cache.StandardQueryCache]; 
using defaults.
2022-05-17 11:31:04,815 WARN  [WrapperJarAppMain] BrokerService:2163 - 
Temporary Store limit is 51200 mb (current store usage is 0 mb). The data 
directory: /var/lib/go-server only has 45488 mb of usable space. - 
resetting to maximum available disk space: 45488 mb
2022-05-17 11:31:04,961 INFO  [WrapperJarAppMain] ConnectionManager:98 - 
Done loading query extensions, found 
com.thoughtworks.go.server.database.h2.H2QueryExtensions@6a25c389
2022-05-17 11:31:08,045 INFO  [WrapperJarAppMain] GoConfigMigration:93 - 
Upgrading config file from version 139 to version 139
2022-05-17 11:31:08,097 INFO  [WrapperJarAppMain] GoConfigMigration:101 - 
Finished upgrading config file
2022-05-17 11:31:08,097 INFO  [WrapperJarAppMain] GoConfigMigrator:106 - 
[Config Save] Starting Config Save post upgrade using 
FullConfigSaveNormalFlow
2022-05-17 11:31:08,183 INFO  [Thread-76] DefaultPluginJarChangeListener:67 
- Plugin load starting: 
/var/lib/go-server/plugins/bundled/gocd-ldap-authentication-plugin.jar

Afterwards the server seems to run as usual, until the OOM and "locked" 
issue occurs 23 hours later. So I can imagine that you are right and it is 
doing some kind of "unclean restart". I suppose GoCD itself is innocent, 
since you can never fully plan how to behave when an out-of-memory error 
occurs; that is unpredictable. I assume that raising the heap size will 
solve the issue.
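
For anyone else hitting this: the change we made is roughly of the following 
shape (illustrative only; the file location, the property index and the heap 
value depend on your installation and on what is already defined in 
wrapper.conf):

  # wrapper-properties.conf
  # (e.g. /usr/share/go-server/wrapper-config/wrapper-properties.conf)
  # Pass a larger maximum heap to the server JVM via the Tanuki wrapper's
  # numbered "additional" JVM arguments; pick an index that is not already
  # used in wrapper.conf.
  wrapper.java.additional.100=-Xmx4g

After restarting the server, the new -Xmx should show up in the process 
arguments, e.g. via ps -ef | grep go-server.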

Thanks again for all the support, this is really a great community.
Chad Wilson wrote on Friday, May 20, 2022 at 10:38:37 UTC+2:

> No problem!
>
> Perhaps what is happening is that after the OOM has occurred, it is being 
> detected by the "wrapper" process shipped with GoCD, and the Java process 
> is being restarted. If so, you should be able to see that auto-restart 
> happening in the logs, including the go-server-wrapper.log.
>
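> For example, something along these lines (the log path and the exact 
> wrapper messages are just a guess; adjust them to your installation):
>
>   grep -E "Launching a JVM|JVM exited|OutOfMemoryError" \
>     /var/log/go-server/go-server-wrapper.log
>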
> Normally I'd expect that to ensure the file lock is cleanly released 
> before the new process tries to acquire it, but perhaps it is not shutting 
> down cleanly, or there is something unusual about the file system.
>
> If it keeps happening, in addition to lsof, perhaps lslocks (or looking 
> at /proc/locks) is of use to see which PIDs have locked which files more 
> explicitly.
>
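> For example (the DB directory below is a guess; GoCD's H2 files normally 
> sit under the server's db/h2db directory):
>
>   lsof +D /var/lib/go-server/db/h2db
>   lslocks | grep -i cruise
>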
> In any case, fixing the OOMs is probably a good idea regardless of this, 
> so it seems sensible to try with a larger heap to address that issue first.
>
> -Chad
>
>
> On Fri, May 20, 2022 at 3:39 PM AquilaNiger <[email protected]> wrote:
>
>> Hi Chad,
>>
>> thanks for your support. 
>>
>>>
>>>    - Does the server recover after some time, or do you need to restart 
>>>    GoCD or take some other action to fix it?
>>>
>> No, it does not recover; I have to restart GoCD.
>>
>>>
>>>    - How are you running GoCD? i.e. in which environment? Container? 
>>>    Standard server?
>>>
>> A standard server on an Ubuntu 18.04.6 LTS virtual machine.
>>
>>>
>>>    - Is your DB file on some kind of network mount or something like 
>>>    that?
>>>
>> No, it isn't.
>>
>>>
>>>    - Is there a way to verify there aren't multiple processes/GoCD 
>>>    instances trying to access the file?
>>>       - when it happens, are you able to use OS-level commands such as 
>>>       lsof to see if other/multiple processes have handles on the DB file 
>>>       (depends on whether storage is local)
>>>    
>> Currently this happens only once in a while (the last time there were 6 days 
>> between the database issues). lsof is a good idea! I'll try that the next 
>> time it happens.
>>
>>>
>>>    - Would be good to confirm you don't see GoCD crashing or getting 
>>>    auto-restarted in your logs, to rule out GoCD itself having a different 
>>>    problem and this problem then being caused by a zombie GoCD process or 
>>>    some kind of stale lock which takes time to expire.
>>>    
>> Actually, we found out from yesterday's go-server.log that the root cause 
>> seems to be Java running out of memory:
>> 2022-05-18 10:24:18,741 WARN [105@MessageListener for WorkFinder] 
>> BasicDataSource:58 - An internal object pool swallowed an Exception. 
>> org.h2.jdbc.JdbcSQLNonTransientConnectionException: The database has been 
>> closed; SQL statement: ROLLBACK [90098-200]
>> ...
>> 2022-05-18 10:24:18,742 WARN [105@MessageListener for WorkFinder] 
>> BasicDataSource:58 - An internal object pool swallowed an Exception. 
>> org.h2.jdbc.JdbcSQLNonTransientConnectionException: Out of memory. 
>> [90108-200]
>> ...
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>
>> Actually, we hadn't touched GoCD's heap size until now. We have therefore 
>> increased it in wrapper-properties.conf and hope that the error will be 
>> gone, and not just deferred by a few days.
>>
>>>
>>>    - Do you have any overrides to DB configuration, e.g a custom 
>>>    *config/db.properties* file?
>>>
>> No.
>>  
>>
>>> To answer your question on the trace files, I think you get two files 
>>> when the main trace file reaches an H2-configured maximum size. I ask the 
>>> above question on DB properties as I think GoCD sets that to 16MB by 
>>> default, whereas yours seems to have reached 64MB, which is curious.
>>>
>>  
>> Thanks, that explains a lot. You're right, the "old" file contains 
>> timestamps from 3:16 to 6:38 and the new one from 6:38 to 8:16.
>>
>>
>>> There is a way to change the locking approach H2 uses 
>>> <https://www.h2database.com/html/advanced.html#file_locking_protocols> 
>>> (back to the older ;FILE_LOCK=FS - which creates the stale 
>>> cruise.lock.db you have in your screenshot) if the issue is with the 
>>> filesystem, however I imagine you'd want to rule out multiple processes or 
>>> some other issue first.
>>>
>>  
>> Thanks for the hint, I'll keep that in mind as a "last resort".
>>
>> Thank you for your support. I think we'll now wait and see whether the 
>> error occurs again.
>>
>> Julia
>>
