All:

As I committed to in a prior thread (
http://groups.google.com/group/nhibernate-development/browse_thread/thread/729819b625001217
)
I have now completed a preliminary review of the two possible approaches to
auto-generating API docs for NHibernate from the XML code comments as part
of our build process.

Recall that one suggestion (offered by Will) was to investigate James
Gregory's new alpha build of 'Docu' ( http://docu.jagregory.com/ ), the
light-weight code-comment-compiler he wrote for generating help content for
the Fluent NHibernate project and offered as OSS to anyone wanting to use it
for any other project.  I countered that we should also consider the more
robust DocProject ( http://www.codeplex.com/DocProject ) app that shields
the developer from having to interact with the incredible complexity that is
the SandCastle + MSHelp compiler infrastructure from MS.  What follows are
the results of my doing just that.

Test platform:
Dell D830 laptop, Intel Core 2 Duo, 32-bit WinXP/SP3, 4GB RAM
Visual Studio Pro 2008 SP1
NHibernate 1.2.1 GA release (binaries and XML comment files, no source
needed)

Test platform notes: I used the 1.2.1 GA release of NH just because it's what
I happened to grab off my hard drive first; I have no reason to believe that
the results of any of my tests would be materially affected by running them
against any subsequent build/release/version of NH, so I don't think this has
any impact on the tests or their results.

***Docu Testing and Observations***

Docu works by simply firing off a command-line and passing it the path to
your binary (nhibernate.dll in this case).  It then constructs pure HTML
output that can be loaded/viewed in a browser without needing to be hosted
on a webserver (though the content could of course be posted to a web server
for others to view as desired).
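For reference, the entire invocation is a one-liner along these lines (I'm
writing the syntax from memory of the alpha build, so treat the exact
arguments as approximate); Docu picks up the matching .xml comment file
sitting alongside the assembly and writes the HTML to an output folder:

  docu.exe NHibernate.dll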

From the get-go, I had a number of issues (unhandled null-reference
exceptions) thrown by the Docu EXE itself when operating on the
NHibernate.dll and its XML code comments.  I eventually grabbed the latest
code from Docu's hosted location on GitHub and built it myself in VS.  The
latest code had the same unhandled exceptions, but at least with the code in
hand I could troubleshoot the issue(s) myself :D

My 'fixes' to Docu probably aren't worth committing back to that
project since most of them basically check for null instances of variables
at critical points and return from the methods if nulls are passed to them
(probably NOT the desired behavior, but certainly enough for me to get Docu
to successfully produce output from the nhibernate.dll assembly without
throwing exceptions).  I have no idea if these exceptions are due to
any strange (unexpected?) syntax we are using in the code comments within
the NHibernate codebase or are simply the result of Docu being in
early alpha and not properly handling otherwise legitimate code comment
syntax, but nonetheless we need to be aware that as it exists RIGHT NOW
TODAY, Docu and the NHibernate project's XML code comments are fundamentally
incompatible with each other without changes to at least one or
the other :(
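To give a flavor of what I mean by 'fixes' (this is an illustrative sketch of
the pattern only, with made-up type and member names -- it is not the actual
Docu source), they amount to little more than guard clauses of this sort:

  // Illustrative guard-clause pattern only; the type and method names
  // here are hypothetical and do not come from the real Docu codebase.
  private void ResolveReferences(DocumentationMember member)
  {
      if (member == null)
          return;  // silently skip instead of throwing a
                   // NullReferenceException -- almost certainly NOT what
                   // Docu ultimately wants, but enough to let the run complete

      // ...normal processing of the member continues here...
  }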

Once I tweaked the Docu source code to run successfully against
the NHibernate.dll without throwing null-reference exceptions, I was able to
produce the API reference docs that I have posted on my server for
download by anyone interested at the following URL:
http://unhandled-exceptions.com/downloads/NHibernate_121_Docu_Test.zip.  The
good news is that this documentation is light-weight (pure HTML) and the ZIP
file is barely 2MB in size for the entire help collection.  To view the
results, unzip it somewhere and just click on the index.htm to load the
'site' in your browser of choice.

The generation of this output by Docu is *NOT* speedy; Docu took
approximately 15+ minutes to produce its output, and most of that time was
spent with my dual-core processor locked at 50% utilization with near-zero
disk activity, suggesting that Docu is processor-bound in its performance and
expects and uses just a single core to do its work (and that throwing
more hardware at it isn't likely to help much unless/until Docu becomes
multi-threaded).  Significant disk activity only occurred briefly at the end
of the 15 minutes when the final output was rendered to the files included
in the aforementioned ZIP file, suggesting that the compilation isn't disk
I/O-bound at all.

This suggests that even though running Docu is as simple as passing it a
single-argument command line, it's largely infeasible to invoke it as
part of *every* build sequence while someone is working on the NH codebase,
since the post-build documentation compilation step would take a
prohibitively long time.  It might be reasonable to set up Docu to run
remotely on some dedicated CI build server (e.g., the CodeBetter TeamCity
installation, etc.) so that it happened post-checkin, but due to the long
runtime of the doc-compilation process, it's nearly certain that this
process would always have to run out-of-band as a developer worked on the
project and made their check-ins.

Since all that's needed is a single command-line invocation, integrating
Docu into the CI server's build process would be trivial, and Docu's lack of
dependency on any other infrastructure (e.g., SandCastle, help compilers,
etc.) makes it trivial for anyone to run the thing themselves were they to
check out the source to their own PC (although as mentioned, they would have
to wait the 15+ minutes for the process to run to completion).
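As a rough illustration of what that CI integration might look like (the
target name, tool path, and assembly path below are all placeholders I've
made up), a NAnt exec task would be about all it takes:

  <!-- hypothetical NAnt target; all paths/names are placeholders -->
  <target name="generate-api-docs">
    <exec program="tools\docu\docu.exe" workingdir="build">
      <arg value="NHibernate.dll" />
    </exec>
  </target>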



***DocProject Testing and Observations***

DocProject is a significantly more complex and significantly more
feature-rich XML code-comment compilation solution than Docu.  This is both a
positive (better, more useful API reference compilation) and a negative
(significant complexity and dependencies on other tools, etc.).

DocProject works by automating the MS SandCastle infrastructure and,
optionally, the MS Help compiler v1.x and/or v2.x to produce its output.
As such, these dependencies have to be present (and properly installed) in
order for DocProject to function properly.  The good news is that once this
is accomplished, DocProject is capable of producing compiled help as a
single .CHM file, an .HxS Visual-Studio-integrated help file that can be
installed right into the VS help subsystem and accessed via F1 from within
Visual Studio, and a complete ASP.NET web site that can be deployed to a
server for wider access to the content.  As DocProject installs and is
controlled as a new 'project type' in Visual Studio, you select which of
these output targets you are interested in compiling to as part of the
New Project wizard.

I had little trouble getting DocProject up and running on my system; after
installation of the SandCastle infrastructure and the requisite MS help
compilers, the DocProject installer capably interrogates the registry to
discover the paths to these items and wires itself up to them just fine.
Once installed, I did not need to interact with the underlying components at
all and could control/configure the behavior of the compiled help output
solely from within Visual Studio by editing the DocProject settings for the
custom VS project type.  This makes for a familiar UI (build property pages,
etc.) for configuring the output of the system.

Performance of the DocProject system in generating the help output was no
better (or worse!) than that of Docu, taking about the same 15+ minutes to
produce its output.  This suggests that performance/speed isn't a factor in
determining which of these directions to pursue.  There doesn't seem to be a
significant change in the compilation time based on what output targets you
select (e.g., CHM, ASP.NET web site, etc.) so I am strongly guessing that
the vast bulk of the 15+ minutes is spent in processing the comments rather
than spitting them out to actual help artifacts.  Since this is about the
same 15+ minutes that Docu took, I'm going to conclude that there is little
that could be done to reduce this processing time significantly.

The results of my running the nhibernate.dll and its related comments
through the DocProject process are posted for download by anyone interested
at the following URL:
http://unhandled-exceptions.com/downloads/NHibernate_121_DocProject_Test.zip.
Because the DocProject output is a complete ASP.NET web site including
graphics, icons, etc. instead of just standard HTML, and because this
download also contains the complete CHM file, this download is over 90 MB in
size.  Since it's an ASP.NET web site, to view this content you will need to
unzip it somewhere and then point an IIS virtual directory to it in order to
view/consume it.  Once you do this, the website also contains a link (in the
upper right) which leads to the compiled 15 MB .chm file if you are
interested in seeing that content as well (but it looks almost 100% identical
to the ASP.NET content, so there's not much need for that).

DocProject is (ultimately) invoked from MSBuild, and so it too could be wired
up as a post-build event or a CI task that automatically happened out-of-band
when code is checked into the repository (just as with Docu), but since it's
MSBuild, this task integration would probably be more complex than the
simpler command-line invocation that Docu provides.  Also, since DocProject
is dependent on SandCastle and the MS help compilers to do its work, these
dependencies would of course need to be installed/configured on whatever CI
platform invoked the API reference compilation step.
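For comparison's sake, the CI-side step for DocProject would boil down to an
MSBuild call something like the following (the project name/path is a
placeholder; a DocProject project is ultimately just another project file
that MSBuild knows how to build):

  rem hypothetical project name; illustration only
  msbuild.exe NHibernate.ApiDocs\NHibernate.ApiDocs.csproj /p:Configuration=Release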




***SUMMARY OF COMPARISON***

Docu
---------
Pros:

   - simple to configure/invoke
   - no external dependencies on other tools
   - light-weight output (small output size, 2MB +/-)
   - final output can be viewed in browser w/out a web server (e.g., just
   HTML files)
   - web output can be posted to a non-IIS/ASP.NET web server for public
   access

Cons:

   - early alpha tool
   - presently throws exceptions and crashes when pointed at the NH project
   :(
   - no search capability in the output (beyond CTRL+F on page-by-page
   basis); intended usage pattern seems to be BROWSE, not SEARCH
   - no single-file output target (e.g., CHM)
   - no integration of output with Visual Studio Help system
   - takes 15+ minutes to run


DocProject
----------------
Pros:

   - output looks/feels like rest of Microsoft (MSDN) help and offers
   familiar navigation of content
   - offers single-file output target (CHM)
   - output is searchable in its entirety at once (vs. page-at-a-time)
   - index automatically built and integrated into output
   - Visual Studio integrated help can be an output target
   - configuration is performed in a familiar environment (Visual Studio)

Cons:

   - external dependency on MS tools (sandcastle, help compilers, etc.)
   - significantly larger website output (90+ MB)
   - web content needs IIS/ASP.NET to host it for public access
   - more complex process of integrating it into build scripts
   - takes 15+ minutes to run


***RECOMMENDATION***

IMO the DocProject approach is the more robust of the two options, offering
a more familiar presentation of content to the end-user and a richer
experience in interacting with the content (e.g., integrated search, indexed
keywords, etc.).  If we are going to bother to do this, I think it would be
most valuable to do it in a way that makes the resulting content the most
approachable and the most usable by as many people as possible, and IMO
that's the output provided by the DocProject approach.  It offers the
web-based content that should be posted to the internet as well as the CHM
file for those wanting offline reference to the content.  For the
adventuresome, there is even the VS-integrated content making the NHibernate
API reference a full-fledged participant in the VS help system (supporting
valuable learning scenarios such as placing your cursor on an NH
class/method and being able to jump to help on it via a simple F1 keystroke
from inside Visual Studio -- followed, sadly, by the interminable 10-minute
wait for the VS help system to spool up and load, of course!).

The biggest challenge to the DocProject approach IMO is the dependency on
SandCastle, the MS help compilers, etc.  If the DocProject help-generator VS
project were added directly to the NHibernate trunk solution, then anyone
interested in building NH would need to either unload the DocProject VS
project from the solution or else get all of those dependencies installed
just to build/compile NH, and IMO that's too high a burden to place on
anyone who just wants to check out the core NH project and build/compile it
for themselves.

One of the important things to understand about *either* of these help
compilation tools is that neither of them actually requires access to *any*
of the NH source code directly -- instead they simply require access to the
compiled binaries and the XML code-comment files extracted from the source
code by the C# compiler at build-time.  This means that I think the
best way to accomplish the creation of a rich API reference for NH is to
create a separate parallel solution (NH_API_Reference?) that is *not* part
of the main NH solution but contains (relative) path-pointers to the
location of the compiled NH binaries from the actual NH solution itself.
This way, the 'API Reference Project' can be completely separate and
distinct from the actual NH source trunk.
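(For anyone unfamiliar with where those XML code-comment files come from:
they are simply the output of the C# compiler's /doc switch, which in
MSBuild/csproj terms is the DocumentationFile property -- the path below is
just an example:)

  <!-- standard MSBuild setting in the NHibernate .csproj; the exact
       path shown is illustrative -->
  <PropertyGroup>
    <DocumentationFile>bin\Release\NHibernate.xml</DocumentationFile>
  </PropertyGroup>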

This would support the following scenarios:

1) if you want to build just NH, you get that trunk and build it; the sln,
NAnt scripts, etc. make no reference to the DocProject stuff at all and
nobody is affected (nobody needs SandCastle, the MS Help compilers, etc. to
build the NH trunk, just as is the case today)

2) if you want to build the API ref docs, you check out BOTH the NH trunk
and the API_REF trunk, build the NH trunk, and then build the API_REF trunk,
which points to the bin output folder from the NH source trunk to get the
binaries and the XML it needs to process; this scenario would (of course)
require you to have installed SandCastle, the MS Help compilers, etc. in
order to perform the compilation of the API reference docs, but only such
people would be affected (see the rough sketch just below)
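In practice, scenario 2 would look something like the following sequence
(repository URLs, solution names, and the NAnt target are all placeholders
here, and this assumes SandCastle and the MS help compilers are already
installed on the machine):

  rem placeholder names/URLs throughout; illustrative sequence only
  svn checkout http://.../nhibernate/trunk NHibernate
  svn checkout http://.../NH_API_Reference/trunk NH_API_Reference
  cd NHibernate
  nant build
  cd ..\NH_API_Reference
  msbuild NH_API_Reference.sln /p:Configuration=Release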

It seems to me that this would support the needs of everyone in a way that
would have the least negative impact on the 'real' NH source trunk and yet
still permit us to construct the most robust API reference content for any
NH adopter.

Sorry to all for the (ridiculous) length of this thing, but as this is
hardly the kind of decision I think I should (could!) make on my own, I
wanted to try to summarize as much of my findings as I could so that
everyone can understand the factors that will play into our decision and
help form the basis for any discussion anyone wants to have about how best
to proceed.

Thoughts (as always) welcome; I'm sure I'm overlooking several pros and cons
for either solution so am hoping a discussion here about this will surface
some of my oversights.

-- 
Steve Bohlen
[email protected]
http://blog.unhandled-exceptions.com
http://twitter.com/sbohlen
