Having worked with Julia on this issue, I can share some findings: We tracked this down to git processes intermittently hanging when fetching material updates on the GoCD server:
- in the go/api/support endpoint, we saw MaterialUpdateListener threads hanging (the issue was affecting more than one pipeline) - ps aux | grep git on the server showed that a few git processes have been running for hours A restart fixes the issue temporarily because there's an in-memory data structure tracking material updates that is only cleared (for a particular material) when its corresponding MaterialUpdateListener thread has completed its run. In our case, the thread kept waiting for the hung processes so the data wasn't cleared. Restarting GoCD server clears this data. We suspected that the hanging git processes was related to some network instability. Per Aravind's suggestion, we wrapped the git binary with strace, and indeed we found that the process keeps waiting on a socket read. Another possible factor is the git version used in the server, which is 1.8.3.1 (the latest available version in the internal repository, but really old). Our agents use the same old version, and we have seen the same hanging issue there as well. A temporary workaround which has worked for us is to wrap the git binary used by GoCD around timeout. We are also looking into upgrading git and understanding the observed network instability. On Monday, May 23, 2016 at 5:57:29 PM UTC-4, [email protected] wrote: > > We are using GoCD version 15.2 with Github Pull Request Builder Plugin > version 1.2.4. > > > An issue we are seeing is that at least once a day, the pipeline that uses > the PR plugin is unable to be triggered despite PRs being created. The only > temporary solution we've found was to restart the Go server. > > In "Pipeline" view, the play buttons are grayed out, while in > "Environment" view, the pipeline is "preparing to schedule". > > Pipeline tab: > > <https://lh3.googleusercontent.com/-_wDL4bMtUg0/V0N5M9e_I_I/AAAAAAAAAfo/7Hmj4y79RRUhwCxNE5o3eaRZYLdAAHtAQCLcB/s1600/Screen%2BShot%2B2016-05-23%2Bat%2B17.41.09.png> > > > <https://lh3.googleusercontent.com/-_wDL4bMtUg0/V0N5M9e_I_I/AAAAAAAAAfo/7Hmj4y79RRUhwCxNE5o3eaRZYLdAAHtAQCLcB/s1600/Screen%2BShot%2B2016-05-23%2Bat%2B17.41.09.png> > Environment tab: > <https://lh3.googleusercontent.com/-_wDL4bMtUg0/V0N5M9e_I_I/AAAAAAAAAfo/7Hmj4y79RRUhwCxNE5o3eaRZYLdAAHtAQCLcB/s1600/Screen%2BShot%2B2016-05-23%2Bat%2B17.41.09.png> > > <https://lh3.googleusercontent.com/-qEzXHWTy8BQ/V0N5Q3-OH3I/AAAAAAAAAfs/ixIx1SPv_IAquaVlCgzz3BUMtCIBDvfgACLcB/s1600/Screen%2BShot%2B2016-05-23%2Bat%2B17.41.04.png> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Here is what we see in the plugin logs: > 2016-05-23 11:03:14,752 WARN [92@MessageListener for > MaterialUpdateListener] GitHubPRBuildPlugin:67 - get latest revisions since: > java.lang.RuntimeException: Exception Occurred: [git, fetch, origin, > +refs/pull/*/head:refs/remotes/origin/pull-request/*] - > /home/go-server/pipelines/flyweight/2d857165-df0a-4f33-b466-0b62b4d54b68 > at com.tw.go.plugin.cmd.Console.runOrBomb(Console.java:33) > at > com.tw.go.plugin.git.GitCmdHelper.runAndGetOutput(GitCmdHelper.java:397) > at > com.tw.go.plugin.git.GitCmdHelper.runOrBomb(GitCmdHelper.java:385) > at com.tw.go.plugin.git.GitCmdHelper.fetch(GitCmdHelper.java:195) > at com.tw.go.plugin.GitHelper.fetchAndReset(GitHelper.java:109) > at > com.tw.go.plugin.GitHelper.fetchAndResetToHead(GitHelper.java:99) > at com.tw.go.plugin.GitHelper.cloneOrFetch(GitHelper.java:41) > at > in.ashwanthkumar.gocd.github.GitHubPRBuildPlugin.handleLatestRevisionSince(GitHubPRBuildPlugin.java:190) > at > in.ashwanthkumar.gocd.github.GitHubPRBuildPlugin.handle(GitHubPRBuildPlugin.java:88) > at > com.thoughtworks.go.plugin.infra.DefaultPluginManager$1.execute(DefaultPluginManager.java:172) > at > com.thoughtworks.go.plugin.infra.DefaultPluginManager$1.execute(DefaultPluginManager.java:167) > at > com.thoughtworks.go.plugin.infra.FelixGoPluginOSGiFramework.executeActionOnTheService(FelixGoPluginOSGiFramework.java:315) > at > com.thoughtworks.go.plugin.infra.FelixGoPluginOSGiFramework.doOn(FelixGoPluginOSGiFramework.java:245) > at > com.thoughtworks.go.plugin.infra.DefaultPluginManager.submitTo(DefaultPluginManager.java:167) > at > com.thoughtworks.go.plugin.access.PluginRequestHelper.submitRequest(PluginRequestHelper.java:32) > at > com.thoughtworks.go.plugin.access.scm.SCMExtension.latestModificationSince(SCMExtension.java:156) > at > com.thoughtworks.go.server.service.materials.PluggableSCMMaterialPoller.modificationsSince(PluggableSCMMaterialPoller.java:77) > at > com.thoughtworks.go.server.service.materials.PluggableSCMMaterialPoller.modificationsSince(PluggableSCMMaterialPoller.java:46) > at > com.thoughtworks.go.server.service.MaterialService.modificationsSince(MaterialService.java:110) > at > com.thoughtworks.go.server.materials.ScmMaterialUpdater.insertLatestOrNewModifications(ScmMaterialUpdater.java:52) > at > com.thoughtworks.go.server.materials.PluggableSCMMaterialUpdater.insertLatestOrNewModifications(PluggableSCMMaterialUpdater.java:61) > at > com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.insertLatestOrNewModifications(MaterialDatabaseUpdater.java:155) > at > com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.updateMaterialWithNewRevisions(MaterialDatabaseUpdater.java:147) > at > com.thoughtworks.go.server.materials.MaterialDatabaseUpdater$2.doInTransaction(MaterialDatabaseUpdater.java:110) > at > com.thoughtworks.go.server.transaction.TransactionCallback.doWithExceptionHandling(TransactionCallback.java:24) > at > com.thoughtworks.go.server.transaction.TransactionTemplate$3.doInTransaction(TransactionTemplate.java:53) > at > org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130) > at > com.thoughtworks.go.server.transaction.TransactionTemplate.executeWithExceptionHandling(TransactionTemplate.java:49) > at > com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.updateMaterial(MaterialDatabaseUpdater.java:107) > at > com.thoughtworks.go.server.materials.MaterialUpdateListener.onMessage(MaterialUpdateListener.java:48) > at > com.thoughtworks.go.server.materials.MaterialUpdateListener.onMessage(MaterialUpdateListener.java:29) > at > com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:65) > at > com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:50) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.commons.exec.ExecuteException: Process exited with > an error: 128 (Exit value: 128) > at > org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404) > at > org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166) > at > org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:153) > at com.tw.go.plugin.cmd.Console.runOrBomb(Console.java:25) > ... 33 more > > Has anyone come across this issue? > > We're also curious about why the restart fixes the issue (although only > temporarily), given that the error was happening when a git command was > executed on the flyweight directory. Any pointers on this perhaps? > -- You received this message because you are subscribed to the Google Groups "go-cd" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
