CCNet 1.3
NAnt 0.85
FxCop 1.35
NUnit 2.2.2
NCover 1.58

We've been noticing for quite some time that we suffer from a build
failure every once in a while on our dedicated master build server.
Our master builds are very large and take up some resources, so we
queue the various codelines that need to run similar builds.  Just to
give context our master builds run everything from FxCop, NUnit,
NCover and WebLint.  Usually these builds are very stable, but take up
to 45 minutes to complete.  That's why we have a dedicated server for
just this particular type of build, and queue them so no builds are
running at the same time.

It appears to be happening at the late stages of the build when the
build should actually be close to completion, but it's been known to
happen within a regular build target that should be benign to when we
actually merge content into the build log (like XML documents), so it
doesn't seem to be related to any particular tool, task or anything
else in a consistent manner.  However it appears to be manifesting
itself as a potential NAnt failure in the CCNet log:

2008-10-14 06:12:49,180 [Trunk Master:DEBUG] FxCop Target Path is
'E:\_Build_Trunk_Master\online\buildtools\nant\bin'
2008-10-14 06:15:00,071 [2008.4.0 Master:INFO] Project: '2008.4.0
Master' is added to queue: 'Master' in position 1.
2008-10-14 06:27:25,504 [Trunk Master:DEBUG]
2008-10-14 06:27:25,535 [Trunk Master:DEBUG] Unhandled Exception:
System.IndexOutOfRangeException: Probable I/O race condition detected
while copying memory. The I/O package is not thread safe by default.
In multithreaded applications, a stream must be accessed in a
thread-safe way, such as a thread-safe wrapper returned by
TextReader's or TextWriter's Synchronized methods. This also applies
to classes like StreamWriter and StreamReader.
2008-10-14 06:27:25,535 [Trunk Master:DEBUG]    at
System.Buffer.InternalBlockCopy(Array src, Int32 srcOffset, Array dst,
Int32 dstOffset, Int32 count)
2008-10-14 06:27:25,535 [Trunk Master:DEBUG]    at
System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
2008-10-14 06:27:25,535 [Trunk Master:DEBUG]    at
System.IO.StreamWriter.Flush(Boolean flushStream, Boolean
flushEncoder)
2008-10-14 06:27:25,535 [Trunk Master:DEBUG]    at
System.IO.StreamWriter.Flush()
2008-10-14 06:27:25,535 [Trunk Master:DEBUG]    at
System.IO.TextWriter.SyncTextWriter.Flush()
2008-10-14 06:27:25,535 [Trunk Master:DEBUG]    at
NAnt.Core.Tasks.ExternalProgramBase.StreamReaderThread_Output() in
E:\_altDepot\trunk\tools\nant\src\NAnt.Core\Tasks\ExternalProgramBase.cs:line
461
2008-10-14 06:27:25,535 [Trunk Master:DEBUG]    at
System.Threading.ThreadHelper.ThreadStart_Context(Object state)
2008-10-14 06:27:25,535 [Trunk Master:DEBUG]    at
System.Threading.ExecutionContext.Run(ExecutionContext
executionContext, ContextCallback callback, Object state)
2008-10-14 06:27:25,535 [Trunk Master:DEBUG]    at
System.Threading.ThreadHelper.ThreadStart()
2008-10-14 07:00:00,041 [Trunk OpsDev Master:INFO] Project: 'Trunk
OpsDev Master' is added to queue: 'Master' in position 2.
2008-10-14 08:31:45,502 [Trunk Master:WARN] Process timed out:
E:\\_Build_Trunk_Master\\online\\buildtools\\nant\\bin\\nant.exe
-nologo -logger:NAnt.Core.XmlLogger -D:clean=true
-D:target-environment=dit -D:skip-oblix-plugins=false
-D:skip-asp=false -D:skip-vb-build=false -D:skip-hermes=false
-D:skip-dotnet20=false -D:skip-artifacts=false
-D:artifacts-output-dir=E:\\artifacts\\_Build_Trunk_Master
-D:fxcop-failonerror=true
-D:nunit-output-dir=E:\\nunit-output\\_Build_Trunk_Master
-D:ndoc-output-dir=E:\\ndoc-output\\_Build_Trunk_Master
-D:skip-fxcop=false -D:fxcop-rule-file=default.FxCop
-D:fxcop-output-dir=E:\\fxcop-output\\_Build_Trunk_Master
-D:skip-gac-install=false -D:skip-versioning-check=false
-D:build-assemblyinfo=true -D:report-possibly-unused-props=true
-D:skip-releasenote=false
-D:releasenote-output-dir=E:\\release-notes\\trunk\\
-D:releasenote-src-dir=E:\\_Build_Trunk_Master\\online\\release-notes\\
-D:CCNetArtifactDirectory="D:\Program
Files\CruiseControl.NET\server\Trunk Master\Artifacts"
-D:CCNetBuildCondition=IfModificationExists
-D:CCNetBuildDate=2008-10-14 -D:CCNetBuildTime=05:45:00
-D:CCNetIntegrationStatus=Unknown -D:CCNetLabel=0.0.0.2519
-D:CCNetLastIntegrationStatus=Success -D:CCNetNumericLabel=2519
-D:CCNetProject="Trunk Master"
-D:CCNetProjectUrl=http://10.144.180.211/ccnet/server/Master+Build+Server/project/Trunk+Master/ViewProjectReport.aspx
-D:CCNetRequestSource=ScheduleTrigger
-D:CCNetWorkingDirectory="D:\Program
Files\CruiseControl.NET\server\Trunk Master\WorkingDirectory"
build-all-msi.  Process id: 4368.  This process will now be killed.
2008-10-14 08:31:45,799 [Trunk Master:WARN] The process has been killed: 4368
2008-10-14 08:32:02,123 [Trunk Master:INFO] Task execution failed

There's a comple of things I've been noticing when this happens:
1.  This particular case the build failed about 75% of the way through
the build in the code compile portion.  It was started at 5:45am.
2.  The time delay between when the last message from the Trunk Master
build (a simple debug message from FxCop I've already checked out) and
when the exception occurrs is pretty lengthy: about 15 minutes.  There
aren't any messages related to that build logged in that time.
2.  There appears to be an event when the queue is being managed (the
next build is checked for queue and priority) that happens just before
this exception takes place during this 15 minutes.  It happens to be
in the same queue and the same priority as the build currently
running.
3.  Once the exception happens the build essentially "dies" - that is
to say it stays in a defunct state until it times out - roughly 33
minutes later after the exception happens.  This obviously results in
a failed build.

Due to the variance in when this happens sometimes the build log is
written and sometimes not.  it all depends on when the exception took
place.  In this particular case it happened relatively near the end of
project compilation so the build log was missing.  In other cases all
of the code was built and we were actually building MSIs in a second
part of the build.  Other times all of this is done and we were
merging content into the build log when the failure happened.  In all
cases it's impossible to interpret what happened via the web
dashboard.

Has anyone else seen this???  We've been chasing this one around for a
while, and have gotten used to seeing it so much we're forced to just
quantify when it happens in the build, and what the build log data
looks like when it dies, but nothing else.  We have experienced
threading issues with NAnt 0.85 in the past, and have been forced to
address them using Reflector.  Any similar experiences would be
greatly appreciated.

Thanks,

Corey

Reply via email to