CCNet 1.3 NAnt 0.85 FxCop 1.35 NUnit 2.2.2 NCover 1.58 We've been noticing for quite some time that we suffer from a build failure every once in a while on our dedicated master build server. Our master builds are very large and take up some resources, so we queue the various codelines that need to run similar builds. Just to give context our master builds run everything from FxCop, NUnit, NCover and WebLint. Usually these builds are very stable, but take up to 45 minutes to complete. That's why we have a dedicated server for just this particular type of build, and queue them so no builds are running at the same time.
It appears to be happening at the late stages of the build when the build should actually be close to completion, but it's been known to happen within a regular build target that should be benign to when we actually merge content into the build log (like XML documents), so it doesn't seem to be related to any particular tool, task or anything else in a consistent manner. However it appears to be manifesting itself as a potential NAnt failure in the CCNet log: 2008-10-14 06:12:49,180 [Trunk Master:DEBUG] FxCop Target Path is 'E:\_Build_Trunk_Master\online\buildtools\nant\bin' 2008-10-14 06:15:00,071 [2008.4.0 Master:INFO] Project: '2008.4.0 Master' is added to queue: 'Master' in position 1. 2008-10-14 06:27:25,504 [Trunk Master:DEBUG] 2008-10-14 06:27:25,535 [Trunk Master:DEBUG] Unhandled Exception: System.IndexOutOfRangeException: Probable I/O race condition detected while copying memory. The I/O package is not thread safe by default. In multithreaded applications, a stream must be accessed in a thread-safe way, such as a thread-safe wrapper returned by TextReader's or TextWriter's Synchronized methods. This also applies to classes like StreamWriter and StreamReader. 2008-10-14 06:27:25,535 [Trunk Master:DEBUG] at System.Buffer.InternalBlockCopy(Array src, Int32 srcOffset, Array dst, Int32 dstOffset, Int32 count) 2008-10-14 06:27:25,535 [Trunk Master:DEBUG] at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count) 2008-10-14 06:27:25,535 [Trunk Master:DEBUG] at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder) 2008-10-14 06:27:25,535 [Trunk Master:DEBUG] at System.IO.StreamWriter.Flush() 2008-10-14 06:27:25,535 [Trunk Master:DEBUG] at System.IO.TextWriter.SyncTextWriter.Flush() 2008-10-14 06:27:25,535 [Trunk Master:DEBUG] at NAnt.Core.Tasks.ExternalProgramBase.StreamReaderThread_Output() in E:\_altDepot\trunk\tools\nant\src\NAnt.Core\Tasks\ExternalProgramBase.cs:line 461 2008-10-14 06:27:25,535 [Trunk Master:DEBUG] at System.Threading.ThreadHelper.ThreadStart_Context(Object state) 2008-10-14 06:27:25,535 [Trunk Master:DEBUG] at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state) 2008-10-14 06:27:25,535 [Trunk Master:DEBUG] at System.Threading.ThreadHelper.ThreadStart() 2008-10-14 07:00:00,041 [Trunk OpsDev Master:INFO] Project: 'Trunk OpsDev Master' is added to queue: 'Master' in position 2. 2008-10-14 08:31:45,502 [Trunk Master:WARN] Process timed out: E:\\_Build_Trunk_Master\\online\\buildtools\\nant\\bin\\nant.exe -nologo -logger:NAnt.Core.XmlLogger -D:clean=true -D:target-environment=dit -D:skip-oblix-plugins=false -D:skip-asp=false -D:skip-vb-build=false -D:skip-hermes=false -D:skip-dotnet20=false -D:skip-artifacts=false -D:artifacts-output-dir=E:\\artifacts\\_Build_Trunk_Master -D:fxcop-failonerror=true -D:nunit-output-dir=E:\\nunit-output\\_Build_Trunk_Master -D:ndoc-output-dir=E:\\ndoc-output\\_Build_Trunk_Master -D:skip-fxcop=false -D:fxcop-rule-file=default.FxCop -D:fxcop-output-dir=E:\\fxcop-output\\_Build_Trunk_Master -D:skip-gac-install=false -D:skip-versioning-check=false -D:build-assemblyinfo=true -D:report-possibly-unused-props=true -D:skip-releasenote=false -D:releasenote-output-dir=E:\\release-notes\\trunk\\ -D:releasenote-src-dir=E:\\_Build_Trunk_Master\\online\\release-notes\\ -D:CCNetArtifactDirectory="D:\Program Files\CruiseControl.NET\server\Trunk Master\Artifacts" -D:CCNetBuildCondition=IfModificationExists -D:CCNetBuildDate=2008-10-14 -D:CCNetBuildTime=05:45:00 -D:CCNetIntegrationStatus=Unknown -D:CCNetLabel=0.0.0.2519 -D:CCNetLastIntegrationStatus=Success -D:CCNetNumericLabel=2519 -D:CCNetProject="Trunk Master" -D:CCNetProjectUrl=http://10.144.180.211/ccnet/server/Master+Build+Server/project/Trunk+Master/ViewProjectReport.aspx -D:CCNetRequestSource=ScheduleTrigger -D:CCNetWorkingDirectory="D:\Program Files\CruiseControl.NET\server\Trunk Master\WorkingDirectory" build-all-msi. Process id: 4368. This process will now be killed. 2008-10-14 08:31:45,799 [Trunk Master:WARN] The process has been killed: 4368 2008-10-14 08:32:02,123 [Trunk Master:INFO] Task execution failed There's a comple of things I've been noticing when this happens: 1. This particular case the build failed about 75% of the way through the build in the code compile portion. It was started at 5:45am. 2. The time delay between when the last message from the Trunk Master build (a simple debug message from FxCop I've already checked out) and when the exception occurrs is pretty lengthy: about 15 minutes. There aren't any messages related to that build logged in that time. 2. There appears to be an event when the queue is being managed (the next build is checked for queue and priority) that happens just before this exception takes place during this 15 minutes. It happens to be in the same queue and the same priority as the build currently running. 3. Once the exception happens the build essentially "dies" - that is to say it stays in a defunct state until it times out - roughly 33 minutes later after the exception happens. This obviously results in a failed build. Due to the variance in when this happens sometimes the build log is written and sometimes not. it all depends on when the exception took place. In this particular case it happened relatively near the end of project compilation so the build log was missing. In other cases all of the code was built and we were actually building MSIs in a second part of the build. Other times all of this is done and we were merging content into the build log when the failure happened. In all cases it's impossible to interpret what happened via the web dashboard. Has anyone else seen this??? We've been chasing this one around for a while, and have gotten used to seeing it so much we're forced to just quantify when it happens in the build, and what the build log data looks like when it dies, but nothing else. We have experienced threading issues with NAnt 0.85 in the past, and have been forced to address them using Reflector. Any similar experiences would be greatly appreciated. Thanks, Corey
