The next chapter in this saga... (apologies for the long post). If you
won't be writing docbooks or your docbooks won't be cross-referenced to
any other docbooks in the uima bookshelf, then you can skip reading
this, unless you want to be entertained :-) .

This is all about olinks. Olinks allow cross-referencing and
hyperlinking among documents, using extra saved information about the
target document being linked to (as contrasted with plain href style
links, which only have the link url). For instance, in PDFs, there's
extra info enabling the referring doc to say "page 123 in document abc".
For PDF and HTML, it allows the referring text to include a hyperlink
with the text being the target document's title, and maybe its number (if
it has numbered items - such as our chapter / section numbers in the main
UIMA documentation). So you can get a link that looks like this:

see Section 1.5.1, “Annotator Methods”
<http://uima.apache.org/downloads/releaseDocs/2.3.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aae.contract_for_annotator_methods>
for ...

where the 1.5.1 was generated by docbook processing, and the "Annotator
Methods" was the title of that section.

To make olinks work, each time a docbook is processed, an extra database
of info for that docbook is created, containing just the info needed for
this. This database, together with some other data about how the
multiple interlinking docbooks are arranged, is needed when processing a
docbook, to resolve its olinks.
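
As a rough illustration of what such a database provides (a sketch only,
not the real docbook target-database format), the Java below maps a
target id to the generated number and title, and builds the link text
from them. The class, field, and method names are hypothetical; the id
and title are taken from the example link above.

```java
import java.util.HashMap;
import java.util.Map;

public class OlinkSketch {
    // Hypothetical record of what is saved about one linkable element.
    static final class Target {
        final String label;   // e.g. "Section"
        final String number;  // generated by docbook processing, e.g. "1.5.1"
        final String title;   // the title of the target element

        Target(String label, String number, String title) {
            this.label = label;
            this.number = number;
            this.title = title;
        }
    }

    // Build the text of a cross-reference from the saved target info.
    static String linkText(Map<String, Target> db, String targetId) {
        Target t = db.get(targetId);
        if (t == null) {
            // Target book not processed yet, so its info is not in the database.
            return "[unresolved olink: " + targetId + "]";
        }
        return t.label + " " + t.number + ", \u201C" + t.title + "\u201D";
    }

    public static void main(String[] args) {
        Map<String, Target> db = new HashMap<>();
        db.put("ugr.tug.aae.contract_for_annotator_methods",
               new Target("Section", "1.5.1", "Annotator Methods"));
        System.out.println(
            linkText(db, "ugr.tug.aae.contract_for_annotator_methods"));
    }
}
```

The point is only that the referring book needs the target's saved
number and title at processing time; a plain href link carries neither.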

So - where to store this information? We previously had stored this in
SVN. This was unsatisfactory because it caused interdependencies among
checked-out projects, where one project (having these databases) had to
be checked out into a specific, fixed directory layout with respect to
other (using) projects. The Maven way to get around this is to put these
things into the maven repository.

Since there's one database per docbook, I thought it could best be stored
as an additional maven attached file for the project. Then you could
"depend" on the project, and download that artifact. This would place a
burden on docbook users - they would need to specify additional POM info
to get these things downloaded.

So I tried that, and it worked fine for individual book processing. Then
I tried using an aggregator POM specifying the 4 main UIMA docbooks (now
moved to separate projects). Since these all refer (that is, olink) to
each other, this violated the maven principle of no circular dependency
relationships. These really are circular relationships, but they resolve
when you run docbook multiple times :-) .
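
To make the problem concrete, here is a minimal sketch (not Maven's own
dependency resolution) that models "book X needs the olink data of book
Y" as a graph and checks it for a cycle with a depth-first search. The
book names bookA and bookB are hypothetical placeholders.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OlinkCycleSketch {
    // True if the "depends on" graph contains a cycle.
    static boolean hasCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String node : deps.keySet()) {
            if (visit(node, deps, done, inProgress)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String node, Map<String, List<String>> deps,
                                 Set<String> done, Set<String> inProgress) {
        if (done.contains(node)) {
            return false;               // already fully explored
        }
        if (!inProgress.add(node)) {
            return true;                // reached a node still on the current path
        }
        List<String> next = deps.getOrDefault(node, Collections.<String>emptyList());
        for (String n : next) {
            if (visit(n, deps, done, inProgress)) {
                return true;
            }
        }
        inProgress.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        // Two books that olink to each other: each depends on the other's data.
        Map<String, List<String>> perBook = new HashMap<>();
        perBook.put("bookA", Arrays.asList("bookB"));
        perBook.put("bookB", Arrays.asList("bookA"));
        System.out.println(hasCycle(perBook));

        // Every book depends only on one shared olink artifact.
        Map<String, List<String>> shared = new HashMap<>();
        shared.put("bookA", Arrays.asList("uima-docbook-olink-dbs"));
        shared.put("bookB", Arrays.asList("uima-docbook-olink-dbs"));
        System.out.println(hasCycle(shared));
    }
}
```

The second graph in main shows why moving all the olink data into one
shared artifact removes the cycle: the books no longer depend on each
other, only on that artifact.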

To fix this, I went to a scheme where there is just one additional
project (I'm calling it uima-docbook-olink-dbs) that will have just one
attached artifact, a zip file of all the needed docbook olink data for
all the docbooks in UIMA. (This could be a large set - besides the 4
main books, we have one for uima-as, and there are other books for many
of the sandbox projects, and one for some special tooling - like the
PearPackagingMavenPlugin).

This project is at version 1-SNAPSHOT, and I think it will stay there.
This is because it's always being updated in part by each docbook
processing run, and we currently don't have a concept of needing any but
the latest versions of things. Note that releases will capture the
result of using the then current (at the time of the build) version of
these databases. I could imagine some fancy use cases that might not be
well supported - such as working on several versions at once, but I'll
let those use cases materialize first before trying to address them :-) .

Here's how this set of olink data will be used.

1) New users start by checking out and running a build which invokes our
docbook processing. This uses the dependency:unpack goal to find this
artifact in the maven snapshot repo (in the Apache infrastructure's
version of Nexus), where it lives - it will have the latest "deployed"
(that is, uploaded) set of olink data, for all docbooks that are using
olink.

2) The dependency:unpack will first download this zip to the local repo
if it isn't already there. If it is already there, it will check to see
if the snapshot in the repository is newer, and if so, will download
that. It then unpacks that to a spot where all projects being built on
this workstation for this user can find it.

3) The rest of the docbook build uses this olink data, and also, as a
side-effect of running on a particular document, adds or updates the
existing olink data for the current document being processed.
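
The check in step 2 can be pictured with the small sketch below. It is
only an illustration of the "download if missing or if the repository
copy is newer" decision, not Maven's actual snapshot update logic; the
method name and the timestamps are hypothetical.

```java
import java.time.Instant;

public class SnapshotFreshnessSketch {
    // localCopy is null when the zip is not yet in the local repo.
    static boolean shouldDownload(Instant localCopy, Instant remoteSnapshot) {
        if (localCopy == null) {
            return true;                         // nothing cached yet
        }
        return remoteSnapshot.isAfter(localCopy); // someone deployed a newer one
    }

    public static void main(String[] args) {
        // Made-up timestamps, for illustration only.
        Instant local = Instant.parse("2010-04-26T12:00:00Z");
        Instant remote = Instant.parse("2010-04-27T09:00:00Z");
        System.out.println(shouldDownload(null, remote));
        System.out.println(shouldDownload(local, remote));
        System.out.println(shouldDownload(remote, local));
    }
}
```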

In thinking about where to store the unzipped form of the olink
databases, I hit upon the idea of storing it in the local .m2 repo, in
the uima-docbook-olink-dbs project, but as an additional directory
(called docbook-olink) which is *not* attached - so it won't be uploaded.

This has a couple of nice side effects. Once installed and unzipped,
unless someone else "deploys" an update of this data to the snapshot
repo, the download and unpack steps can largely be skipped. And,
whenever someone doing some docbook builds is happy with their results,
they've as a side effect been creating additional or updated olink info
for one or more books, and to make these available to others, they just
need to "deploy" these back to the snapshot repository. (Note that the
deploy step runs a POM which first gets any updates made by someone else
for other docbooks that might have happened in the meanwhile, so what's
uploaded is the latest version of all docbooks, except for collisions
where two people have checked out and are processing the very same
docbook - in which case the last one wins...)
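
The merge-then-deploy behavior, including the "last one wins" collision,
can be sketched as follows. This is an illustration under simplifying
assumptions: each docbook's olink data is reduced to a single string,
and the book and user names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class OlinkMergeSketch {
    // Start from what is currently in the repository, then overlay the
    // olink data produced by the local docbook builds.
    static Map<String, String> mergeForDeploy(Map<String, String> fromRepo,
                                              Map<String, String> locallyBuilt) {
        Map<String, String> merged = new TreeMap<>(fromRepo);
        merged.putAll(locallyBuilt);   // local results replace same-named entries
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> repo = new HashMap<>();
        repo.put("bookA", "original");
        repo.put("bookB", "original");

        // Two people process different books: both updates survive.
        Map<String, String> alice = new HashMap<>();
        alice.put("bookA", "updated by alice");
        Map<String, String> afterAlice = mergeForDeploy(repo, alice);

        Map<String, String> bob = new HashMap<>();
        bob.put("bookB", "updated by bob");
        System.out.println(mergeForDeploy(afterAlice, bob));

        // Two people process the very same book: the later deploy wins.
        Map<String, String> carol = new HashMap<>();
        carol.put("bookA", "updated by carol");
        System.out.println(mergeForDeploy(afterAlice, carol));
    }
}
```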

Testing revealed that this seems to work, with one exception - when I
ran the deploy from within m2eclipse, it nicely uploaded the POM, but
gave a message about Failed to Upload [400]. After much googling that
didn't identify the issue, I tried this from the command line, and it
worked. A few more tries isolated this to an apparent issue in the
"built-in" version of maven that m2eclipse 0.0.10 uses, which is a
version of maven 3.0-alpha-6. I found that maven 2.2.1 and 3.0-beta-1
both work, even when run from m2eclipse. So if you are using m2eclipse,
I recommend you:

1) use the maven preferences to install a link to 3.0-beta-1, and set it
as the default

2) if previous use of m2eclipse created any run configurations, manually
update each one of those - there's a menu pull-down at the bottom of the
main run configuration page for each one, labeled "Maven Runtime", where
you can switch this.

Next steps will be verifying that overwrite-if-newer works when
dependency:unpack is used on individual unpacked files, then I'll
probably go and check a bunch of this in :-)

I've started writing a new web page for our site describing how to do
docbooks, the uima bookshelf concept, etc., which I'll need to update...

-Marshall




On 4/26/2010 11:17 PM, Marshall Schor wrote:
> Docbook story:  Most of the afternoon was spent tracking down a bug,
> which turned out to be formerly hidden by Maven 2.2.1, but which Maven
> 3.0 exposed (I'm trying Maven 3.0 beta 1 - it seems to run faster/better
> :-) ).  The symptom was a report that the "catalog file" could not be found.
>
> The bug is that if you ask in a plugin to load a resource at the top
> level, using the string "/xxx.xml" for instance, it fails.  This is
> because that leading "/" makes the Java classloader.getResource(aString)
> fail.  To fix, just drop the leading "/". 
>
> I've reported this along with the fix to the docbkx project - they use
> this to load the "catalog.xml" file that comes with docbook 4.x and 5.0
> distributions. 
>
> So, now, after all that, I'm starting to get docbook building again,
> this time with fully factored parent plugins.  The olink stuff I'm going
> to try to do by using maven "attachments", and going for a strategy of
> only 1 docbook per project (I've split the uima-docbooks project, which
> held 4 docbooks, into 4 projects, each holding one docbook). 
>
> This aligns the approach with the way Sandbox projects are doing
> documentation - they have the project produce the 1 main artifact (a
> jar), and now it will also produce (when I'm finished :-) ) an
> additional "attached" artifact - the olink data for the pdf and html
> versions. 
>
> This will allow other docbooks which want to hyperlink to a reference in
> the first docbook to be able to do so. (OLinking is like normal
> hyperlinking, except that information about the target is known, so for
> PDFs, the link includes the "book" + page number in the book, and it
> includes locating the other book via a relative directory path.)
>
> It looks like I'll be able to put all the gorp (that's a technical term
> :-) ) for docbook formatting, like boiler plate, title pages, things to
> enable xInclude, fonts, css stuff, customization xsl layers, etc. into a
> shared "resource bundle" that projects will be able to fetch (from their
> local .m2 repository, or from the big repo in the sky).
>
> -Marshall
>
> On 4/22/2010 4:03 PM, Marshall Schor wrote:
>> progress -
>>
>> the uimaj/branches/mavenAlign branch should now build all of the Java
>> components.  There are 2 new aggregate (only) POMs for this, to build in
>> batch, called aggregate-pom-uimaj and aggregate-pom-uimaj-eclipse-plugins.
>>
>> More checking to do to verify the build is ok.
>>
>> Next to tackle: docbooks, then the assemblies.
>>
>> -Marshall
>>
>> On 4/19/2010 5:16 PM, Marshall Schor wrote:
>>> Progress - created a common eclipse-plugin parent pom, and got the
>>> ep-configurator eclipse project to build.
>>>
>>> I noticed as a side effect of checking things that our 2.3.0 build for
>>> these artifacts is missing the License, Notice, etc. in the Jar
>>> manifest.  The new structure of parent poms corrects this in a uniform
>>> way :-)
>>>
>>> -Marshall
>>>
>>> On 4/19/2010 10:42 AM, Marshall Schor wrote:
>>>> Progress -
>>>>
>>>> To handle the many Jars that need the extra bit in their Notice file(s),
>>>> I made a version of the remote-resource "bundle" that includes a
>>>> placeholder for additional text following the standard NOTICE boiler plate.
>>>>
>>>> Then I made a version of the parent pom for uimaj (uimaj-ibm-notice)
>>>> which uses this extra remote resource, and sets the additional text to
>>>> the required boilerplate for those jars which were originally coming
>>>> from IBM. 
>>>>
>>>> Now, JVinci has the right notice file...
>>>>
>>>> Next problems I'm working on for JVinci: The implementation url is
>>>> incorrect (it's for the parent-pom), and the project title META-INF
>>>> which we used to have, is missing.
>>>>
>>>> -Marshall
>>>>
>>>> On 4/15/2010 5:17 PM, Marshall Schor wrote:
>>>>> Progress -
>>>>>
>>>>> I made a new top-level node in the uima tree called "build" - for
>>>>> artifacts that we won't normally be including in assemblies, but which
>>>>> are instead build things.
>>>>>
>>>>> In there, I put a folder called "parent-poms" - the intent is to keep
>>>>> these organized in one place.
>>>>>
>>>>> I made a top level pom for the whole project, which inherits from the
>>>>> common Apache pom version 7.  The common Apache pom connects the deploy
>>>>> / release process with the Nexus repository.
>>>>>
>>>>> I also made a top level pom for just the main UIMA Java SDK -
>>>>> corresponding sort of to the former uimaj pom, except it doesn't have
>>>>> any aggregation stuff.
>>>>>
>>>>> BTW, in fiddling with the poms, I'm following the recommended ordering
>>>>> for elements in the POM, listed here:
>>>>> http://maven.apache.org/developers/conventions/code.html  (scroll 3/4 of
>>>>> the way toward the bottom)
>>>>>
>>>>> After fiddling with my .m2/settings.xml files per the instructions on
>>>>> migrating to Nexus, both install and deploy worked (deploy was for a
>>>>> SNAPSHOT - no real releases :-) ).
>>>>>
>>>>> You can see the deployed artifacts on repository.apache.org in the
>>>>> Snapshots area.
>>>>>
>>>>> I'm now trying to see how to set up projects whose poms inherit from
>>>>> uimaj.  First trying jVinci.  I'm comparing what gets built to what was
>>>>> built for 2.3.0-incubating.
>>>>> One difference - a bunch of our components have slightly different
>>>>> Notices needed, so I'll fix that.
>>>>>
>>>>> Another thing to fix: thinking about when to run RAT.  Some projects put
>>>>> it into a profile - so you can run it when you want to.  It could also
>>>>> be in the apache-release profile - so it's always run when doing a
>>>>> release candidate.  Unless there's a better idea, I'll add this.
>>>>>
>>>>> -Marshall
