As some of you are aware, the Apache Infrastructure team has mandated that 
all projects move to the new svnpubsub process for publishing their websites 
by the end of the year.  Camel, like all Confluence-based sites, is
affected by this mandate.  We currently use a multi-step rsync process:
Confluence exports the space to HTML, which gets rsynced to an area on
people.apache.org once an hour.  A cron job in someone's crontab on
people.apache.org then rsyncs it to the appropriate place
(/www/camel.apache.org), also about once an hour.  Then another rsync
syncs from there to the live sites.  This not only causes long delays in
publishing (it can be a few hours between a change and it going live), it
also involves a LOT of disk IO to sync things all over the place.  The
svnpubsub process is a lot faster: site changes are committed to svn, and
anything that needs them (the live site) can listen for the changes and
update immediately.

Anyway, over the last couple of months, a bit of work has been done to
help various projects transition to svnpubsub.  Joe
Schaefer has been working with the Maven folks so the "mvn site:deploy" 
stuff can deploy via svnpubsub.   Obviously the CMS uses it heavily. 

We also now have a "solution" for Confluence-based sites based on work I've
done for CXF's site.    Confluence has a SOAP interface for retrieving 
information and rendering pages.    Well, if I see a SOAP interface (even a 
crappy one like Confluence's)......  ;-)

Seriously, I have a program now that can render an entire Confluence space
using the SOAP APIs and Velocity (which is what the current AutoExport
stuff uses, so migration is easy).  However, it does more than that: it also
records the modified times, checks the RSS feed for changes first, tracks
{children} and {include} tags, etc.  Thus, if you change a page that is
"included" in another (think about the "Book in One Page" page), the pages
that include it will also get re-rendered.  If you add/delete a page, any
page that uses the {children} tag to generate a table of contents will
automatically re-render.
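
To make the dependency tracking a bit more concrete, here is a rough Python
sketch.  It is NOT the actual code; the function name, the "includes" map,
and the "children_users" set are all made up for illustration.  It assumes
we have already recorded which pages {include} which others and which
pages use {children}:

```python
from collections import deque

def pages_to_render(changed, includes, children_users, structure_changed=False):
    """Return the set of pages that need re-rendering.

    changed           -- pages edited since the last run
    includes          -- hypothetical map: page -> pages it pulls in via {include}
    children_users    -- hypothetical set of pages that use the {children} tag
    structure_changed -- True if any page was added or deleted
    """
    # Invert the include map: included page -> pages that include it.
    included_by = {}
    for page, targets in includes.items():
        for target in targets:
            included_by.setdefault(target, set()).add(page)

    dirty = set(changed)
    queue = deque(changed)
    # Walk upwards so that includes of includes are picked up as well.
    while queue:
        page = queue.popleft()
        for parent in included_by.get(page, ()):
            if parent not in dirty:
                dirty.add(parent)
                queue.append(parent)

    # Adding/deleting a page can change any generated table of contents.
    if structure_changed:
        dirty |= set(children_users)
    return dirty
```

With includes = {"Book In One Page": ["Components"]}, a change to
"Components" marks both "Components" and "Book In One Page" for rendering.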

It ALSO cleans up the resulting HTML via tagsoup and some custom cleanup
code.  The Confluence-generated HTML is awful, with invalid attributes, bad
links, etc.  Those are now "mostly" cleaned up.
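
As an illustration of the "invalid attributes" part of the cleanup only,
here is a small Python sketch that drops any attribute not on an allow-list.
It uses Python's standard html.parser rather than tagsoup, and the
allow-list and class name are assumptions made up for this example, not
what the real cleanup code does.  It also ignores comments and doctypes and
would mangle inline scripts, so treat it as a sketch:

```python
from html import escape
from html.parser import HTMLParser

# Assumption for this sketch: a small allow-list of attributes to keep.
ALLOWED_ATTRS = {"href", "src", "alt", "title", "class", "id"}

class AttributeFilter(HTMLParser):
    """Re-emit HTML, dropping attributes that are not in ALLOWED_ATTRS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _open_tag(self, tag, attrs, close=""):
        # Keep only allowed attributes; valueless attributes become name="".
        kept = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in ALLOWED_ATTRS
        )
        self.out.append("<%s%s%s>" % (tag, kept, close))

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._open_tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape text.
        self.out.append(escape(data, quote=False))

def clean(html_text):
    parser = AttributeFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, clean('<a href="x.html" bogus="1">A &amp; B</a>') keeps the
href and drops the made-up "bogus" attribute.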

I've uploaded a "build" of the site to:
http://people.apache.org/~dkulp/camel/
so you can see that the result is pretty much identical to the live site.   
A couple of things are actually better, such as the image links for the blog
entries.  Also, the new page actually validates with the W3C validator:

http://validator.w3.org/check?uri=http%3A%2F%2Fpeople.apache.org%2F~dkulp%2Fcamel%2F


To run this, a buildbot build will be set up to run the process once an hour
and generate new HTML if it detects a change (via the RSS feed).  Once it
has run, you get a commit message on the commits list and the changes are
live immediately.  Thus, changes now take "at most" one hour to reach the
live site.  However, any committer can check out the stuff and run it
manually if they need/want things live immediately.
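
The "check the RSS feed first" step could look roughly like the Python
sketch below.  This is not the actual code: it assumes a plain RSS 2.0
feed with <title> and <pubDate> per item (the real Confluence feed may be
shaped differently), and the function name, the "last_seen" map, and the
sample feed with its dates are all made up for illustration:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def changed_pages(feed_xml, last_seen):
    """Return titles whose feed timestamp is newer than the recorded one.

    feed_xml  -- RSS 2.0 text (assumed shape: <item><title/><pubDate/></item>)
    last_seen -- hypothetical map: page title -> datetime of the last render
    """
    changed = []
    for item in ET.fromstring(feed_xml).iter("item"):
        title = item.findtext("title")
        modified = parsedate_to_datetime(item.findtext("pubDate"))
        previous = last_seen.get(title)
        # Pages with no record and pages modified since the last render count.
        if previous is None or modified > previous:
            changed.append(title)
    return changed

# Made-up sample data, just to show the call.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Index</title><pubDate>Mon, 05 Dec 2011 10:00:00 +0000</pubDate></item>
<item><title>Download</title><pubDate>Mon, 05 Dec 2011 08:00:00 +0000</pubDate></item>
</channel></rss>"""

last_render = datetime(2011, 12, 5, 9, 0, tzinfo=timezone.utc)
print(changed_pages(SAMPLE_FEED, {"Index": last_render, "Download": last_render}))
# prints ['Index']
```

If the list comes back empty, the hourly run has nothing to render and can
exit without committing anything.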


For CXF, this new process is now "live" (since Monday).  I've filed a
ticket with INFRA to start the process for Camel.  (It requires a content
area in the web svn repo for the live content, then a buildbot build, then
some configs to make it all live.)  It's definitely still a "work in
progress", but it's a good start.  For example, it doesn't track the
blog/news entries, so currently if you add a blog entry (for a release),
you would need to manually trigger an Index page update.  However, the code
is there, so it's something we can add/enhance.  I also want to update it
to render pages in parallel if possible to make it a bit quicker.
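
For the parallel rendering idea, a minimal Python sketch could use a thread
pool, since most of the time is presumably spent waiting on the remote
calls.  Again, this is not the actual code: render_page is a hypothetical
placeholder for the real work (fetch via SOAP, apply the template, clean
up), and whether the Confluence server copes well with several requests at
once is an open question:

```python
from concurrent.futures import ThreadPoolExecutor

def render_page(title):
    # Hypothetical placeholder for the real per-page rendering work.
    return "<html><body>%s</body></html>" % title

def render_all(titles, workers=4):
    """Render pages concurrently and return {title: html}."""
    titles = list(titles)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input titles.
        return dict(zip(titles, pool.map(render_page, titles)))
```

Keeping the worker count small (4 here is an arbitrary choice) limits the
load put on the Confluence server.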

For Camel, the main "pom" and scripts are in:
http://svn.apache.org/repos/asf/camel/website/
and there is a README there.  The code itself is pulled in via an
svn:externals pointing to the area in CXF, so I only need to update the
code in one place:
http://svn.apache.org/repos/asf/cxf/web/
I want to avoid "forking" the code as I *DO* know that if/when they move to
Confluence 4.x (currently on 3.4.x), it will need some updating as the SOAP
APIs change a bit.


Anyway, I'm hoping to have Camel flipped over by the end of the week or 
early next week.   

BTW:  if you are interested, it has to render 644 pages for the full Camel
website.  That takes about 15 minutes right now, but like I said, once
it's all set up, it can do incremental updates, which is MUCH quicker.  644
pages is quite a bit.


-- 
Daniel Kulp
dk...@apache.org - http://dankulp.com/blog
Talend Community Coder - http://coders.talend.com
