On 7/31/07, Frank Peters <[EMAIL PROTECTED]> wrote:
>
> If we want to move to the wiki as the main collaborative
> platform we need mediawiki->ODF and mediawiki->PDF conversion
> processes.


I can't say the required extension to Mediawiki doesn't exist, but I went
through all of the Mediawiki extensions about 6 months back, and while
there is at least one for writing to PDF on a per-page basis, you probably
would not be happy with its output. But you might look the extensions over
to see if what you need exists, starting here:
<http://www.mediawiki.org/wiki/MediaWiki_extensions>.

The problem is that wikis are really designed to produce variable-length,
cross-linked web pages, not a dead-tree, book-like document with niceties
like pages of equal length and page numbering. I suspect the *best* you'll
be able to do is to somehow concatenate separate wiki pages into a single
wiki page, fix the absolute URL artifacts, introduce a conversion to ODT at
an appropriate point in the process, change absolute URLs to relative URIs,
and then do a lot of manual link checking to make sure all your links have
been properly preserved/restored. Once those kinds of problems are
overcome, the guide can be formatted and exported to PDF from OOo.
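For what it's worth, the URL clean-up part is scriptable. Here is a rough
Python sketch of the idea; the base URL, page names, and file names are
placeholders I made up, and a real run would need adjusting to your wiki's
actual link markup:

# Rough sketch: concatenate locally saved wiki pages and turn absolute
# links between them into in-document fragment links. The base URL and
# page names are placeholders, not real addresses.
import re

BASE = "http://wiki.example.org/wiki/"                 # assumed wiki base URL
PAGES = ["Getting_Started", "Installing", "Printing"]  # assumed page order

def rewrite_links(html, known_pages):
    # Replace href="<BASE>Page" with href="#Page" when Page is part of
    # the concatenated set; leave all other links alone.
    def repl(match):
        target = match.group(1)
        if target in known_pages:
            return 'href="#%s"' % target
        return match.group(0)
    return re.sub(r'href="%s([^"#]+)"' % re.escape(BASE), repl, html)

parts = []
for name in PAGES:
    with open(name + ".html", encoding="utf-8") as f:  # pages saved locally
        body = rewrite_links(f.read(), set(PAGES))
    # Give each page an anchor so the rewritten links have a target.
    parts.append('<a name="%s"></a>\n%s' % (name, body))

with open("guide_concatenated.html", "w", encoding="utf-8") as out:
    out.write("\n".join(parts))

That only rewrites links, of course; the manual link checking I mentioned
would still be needed afterward.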

The problem really is that the decision was made to use Mediawiki before a
thorough needs assessment was conducted. You truly are trying to drive a
square peg into a round hole here. There are apps specifically designed to
do what you want to do. I've mentioned it before, but I would seriously
check out Daisy, <http://cocoondev.org/daisy/features.html>, which gets you
all of the output formats supported by Apache Cocoon, which Daisy is built
atop. <http://cocoon.apache.org/2.1/features.html>. (The reference to OOo's
format is actually ODF; I checked, although it is not enabled by default.)

   - XML <http://www.w3.org/XML/>
   - HTML <http://www.w3.org/MarkUp/>
   - XHTML <http://www.w3.org/XHTML/>
   - PDF <http://www.adobe.com/products/acrobat/adobepdf.html>
   - OpenOffice.org/StarOffice <http://www.openoffice.org/>
   - MS Excel
   - RTF <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnrtfspec/html/rtfspec.asp>
   - Postscript
   - Charts (see external project Fins <http://www.cocoondev.org/projects/fins.html>)
   - Flash <http://www.macromedia.com/>
   - Plain text
   - Scalable Vector Graphics (SVG) <http://www.w3.org/TR/SVG/>
   - MIDI
   - ZIP archives

 Probably some others. The Cocoon developers aren't noted for keeping their
web site up to date. :-)
Daisy also supports a wide range of multimedia formats. It was designed
from the ground up to avoid the kind of broken-link problems you get when
you try to concatenate pages from most wikis, it is built as a collaborative
platform for creating documentation for a bunch of related products, and it
offers single-editing-point inclusion of other pages and transclusion of
other document parts. Daisy also has strong features for managing
collections of documents, including version control. And it is a
create-in-one-format/write-to-many-formats solution, so you don't have to
deal with the incompatibilities of the output formats.
I am really concerned that the longer you try to make Mediawiki do something
it wasn't designed for, the more difficult it will be to disengage and
switch to software that is actually designed to meet your goals.
See also SiSU, <http://www.jus.uio.no/sisu/>, which writes to all of the
formats you want, including ODT, PDF, HTML, XML, etc. See, e.g., these
examples of different output of Larry Lessig's Free Culture book:
<http://www.jus.uio.no/sisu/sisu_examples/1.html#25>. If you open the
"html, scroll, document in one" link,
<http://www.jus.uio.no/sisu/free_culture.lawrence_lessig/doc.html>, you'll
see that the user can select what format s/he wishes to read/save the book
in. Caveat: I have not checked to see whether or how SiSU handles graphic
images.

See also e.g., xParrot, <http://xparrot.thebird.nl>, another FOSS solution
designed for such purposes that is heavily used in Europe.

"The goal of the xParrot project is to provide a replacement for Word when
writing documents with large groups of people. Other software tools, like
Wiki's, are used for this purpose, but we found they have certain short
comings. In particular with regard to editing (problematic markup for
non-technical people) and navigation (the structure of a Wiki document is
hard to maintain)."

<http://xparrot.thebird.nl/parrotstable/xparrot/portal/doc.xprt/htmlview_light.html>.

I haven't checked whether Mediawiki is in fact one of the multitude of
wiki products whose URIs get broken when pages are converted and
concatenated, but that's only one of the structural problems the xParrot
developers are referring to. I strongly suspect that the closest you will
come is the hacked build of Mediawiki used for the <http://en.wikibooks.org/>
website. I therefore suggest that you experiment with their book on
OpenOffice.org to see what happens to the links when you concatenate pages
<http://en.wikibooks.org/wiki/Using_OpenOffice.org> (entire guide per app on
one page). At minimum, I suspect you're in for some script writing to
convert absolute URLs to relative URLs. And I suspect you're going to have a
management problem in terms of keeping link anchor names and target names
unique in all pages that will need to be concatenated.
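To illustrate, a duplicate-anchor check of that sort is only a few lines of
Python. The file names below are placeholders for locally saved copies of
the pages:

# Sketch of a duplicate-anchor check to run before concatenating pages.
# File names are placeholders; a real run would need the actual page list.
import re
from collections import defaultdict

files = ["Getting_Started.html", "Installing.html", "Printing.html"]

seen = defaultdict(list)   # anchor name -> files that define it
for path in files:
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # Headings and explicit anchors usually carry id= or name= attributes.
    for anchor in re.findall(r'(?:id|name)="([^"]+)"', text):
        seen[anchor].append(path)

for anchor, where in sorted(seen.items()):
    if len(where) > 1:
        print('Anchor "%s" appears in more than one page: %s'
              % (anchor, ", ".join(where)))

Anything the script flags would have to be renamed before the pages are
merged.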

> Ideally, these would hook into some sort of TOC template where
> you can specify the order of wiki pages for the final book. I
> will definitely be looking into finding a solution for this to
> be able to deliver information "offline" (outside of the wiki).


Again, I'm not saying it can't be done. But you're working with a
screwdriver where a hex-head wrench is the right tool.

> Here is what I think we need:
>
> 1) Specify a TOC to define the order of wiki pages in the
>    final output
>    -> this should be fairly easily doable by using the
>    TOC templates we already starting to implement on the wiki


> 2) Export the corresponding pages from the wiki in the
>    correct sequence
>    -> we can use the built-in export feature of mediawiki


I suspect it will be easier to concatenate pages manually into one page on
the wiki, fix the link and structure artifacts there, and then export. That's
what I wind up having to do with Tikiwiki, which actually has a
"structures" feature for ordering pages for, e.g., a slide-show tour of a set
of pages. See <http://doc.tikiwiki.org/tiki-index.php?page=Structure&bl>.
Drupal has something similar, but I don't recall seeing a Mediawiki
extension for that purpose. I suspect if it existed it would be in use on
the WikiBooks website.
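On the ordering question, if your TOC templates end up carrying the
canonical order, pulling that order out by script is straightforward. A
rough Python sketch, assuming the TOC page's wikitext has been saved
locally and assuming it lists chapters as plain [[Page name]] links (both
assumptions on my part):

# Sketch: derive the concatenation order from a TOC page's wikitext.
# Assumes the TOC page is saved as toc_page.wikitext and lists chapters
# as ordinary [[Page name]] or [[Page name|label]] links.
import re

with open("toc_page.wikitext", encoding="utf-8") as f:
    toc = f.read()

order = []
for link in re.findall(r"\[\[([^\]|#]+)", toc):
    name = link.strip()
    if name and name not in order:   # keep the first occurrence only
        order.append(name)

print("Concatenation order:")
for i, name in enumerate(order, 1):
    print("%2d. %s" % (i, name))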

> 3) Convert the output into XML
>    -> that's either ODF-XML or XML-FO for creating PDFs
>    technically, this should be easy using XSLT. However,
>    some effort is required to write up the conversion
>    style sheets and test them


Viewing page source on Wikipedia, it appears that pages are rendered in
XHTML 1.0 Transitional. You might see what happens if you copy and paste
from page source into an OOo XHTML document, then use OOo to generate the
PDF after repairing the URL artifacts, adding page numbering, section page
headers, etc.
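If the XSLT route in item 3 is taken, the mechanics of applying the
stylesheet are the easy part; writing and testing the stylesheet is where
the effort will go. A minimal Python sketch, assuming the lxml library is
available and assuming a stylesheet named mediawiki2fo.xsl (a made-up name)
that maps the exported XML to XSL-FO:

# Sketch of the XSLT step. The stylesheet name is hypothetical; the
# stylesheet itself still has to be written and tested.
from lxml import etree

source = etree.parse("exported_pages.xml")        # output of Special:Export
transform = etree.XSLT(etree.parse("mediawiki2fo.xsl"))
result = transform(source)

with open("guide.fo", "wb") as out:
    out.write(etree.tostring(result, pretty_print=True))

The resulting .fo file could then be handed to Apache FOP for the PDF step
in item 4.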

> 4) Create PDF
>    If we have ODF we can use the OOo export feature,
>    if we have XML-FO, we can use e.g. Apache FO for
>    PDF creation


See <http://www.mediawiki.org/wiki/Extension:Pdf_Export>.


> Issues:
>
> - Mediawiki does not support image export (or does it?)
> - The process should be controlled by a script that
>   accesses the wiki, fetches the pages, converts them
>   and creates PDF (did anyone say "Perl"? ;-)


I don't know, but I suspect not. There is a rather elegant fork of Mediawiki
called GetWiki, <http://getwiki.net/-GetWiki:Overview>, whose major
implementation, Wikinfo, features automagic on-demand import of pages from
Wikipedia. <http://getwiki.net/-Wikinfo>. I looked at a bunch of random
pages there and several said that articles from Wikipedia would be imported
without images.

This seems to be the definitive Mediawiki page on exporting page content as
XML: <http://meta.wikimedia.org/wiki/Help:Export>. Looking at their schema,
I didn't see anything that suggested to me that you could get anything but
text. But notice that they do mention some tools for parsing the output. Of
course, you can also save a Wikimedia page from a browser to local disk and
get the whole shebang in a directory, with the XHTML source and the images
as separate files. But taking a quick look at the source for the Wikipedia
home page after downloading it, it looks like all of the links, including
the <img> links, are absolute URLs pointing back to the Wikipedia web site.
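For the fetching side, Special:Export can be driven by script without any
extension at all. A rough Python sketch, with a placeholder wiki URL and
with the export namespace taken from the 0.3 dump format mentioned on that
help page (check the actual dump; the namespace changes between MediaWiki
versions):

# Sketch: fetch pages through Special:Export and pull out the raw wikitext.
# The wiki URL and page names are placeholders.
import urllib.request
import xml.etree.ElementTree as ET

WIKI = "http://wiki.example.org/index.php"   # assumed wiki location
PAGES = ["Getting_Started", "Installing"]    # assumed page names
NS = "{http://www.mediawiki.org/xml/export-0.3/}"   # assumed dump namespace

# Special:Export takes a newline-separated page list; %0A is an encoded newline.
url = WIKI + "?title=Special:Export&pages=" + "%0A".join(PAGES)
with urllib.request.urlopen(url) as resp:
    dump = resp.read()

root = ET.fromstring(dump)
for page in root.iter(NS + "page"):
    title = page.find(NS + "title").text
    text = page.find(NS + "revision/" + NS + "text").text or ""
    print("%s: %d characters of wikitext" % (title, len(text)))

As the help page suggests, that gets you wikitext only; images would have
to be fetched separately.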

But see <http://www.mediawiki.org/wiki/Extension:XML_Class>:

   - Serves XML/XSL data sources with the proper MIME content type (provided
   that Extension:Article Class Extended
   <http://www.mediawiki.org/wiki/Extension:Article_Class_Extended> is used
   in conjunction with this extension)
   - Enables XInclude directives to be used with Mediawiki articles as
   data sources. The extension resolves MW references into local URLs so
   the client-side browser can easily locate them.

"*MWDumper* is a quick little tool for extracting sets of pages from a
MediaWiki dump file.It can read MediaWiki XML export dumps (version 0.3,
minus uploads), perform optional filtering, and output back to XML or to SQL
statements to add things directly to a database in 1.4 or 1.5 schema. It is
still very much under construction." <http://www.mediawiki.org/wiki/MWDumper
>

> Anything else?


Don't know if they require separate extensions, but:

* footnotes/endnotes
* Intra-document bookmark- or cross-reference-style links to particular text
portions rather than just to a page, i.e., pin-point links that don't depend
on the headings and subheadings linked from the page table of contents.
E.g., there may be a need to link to a particular screen grab rather than to
its closest subheading, or to say, loop through instructions 6-10 *here.*
XHTML <a href and <a name type linking that will float with its accompanying
text.

Also:

* Concordance/subject matter indexing (I doubt that can be done in
Mediawiki, although Daisy generates concordance files).

* Clearly identified and documented workflows for the new process, plus
associated project tracking tools adapted to those workflows.

Some other links you might take a look at.

On grouping and ordering pages:

<http://www.mediawiki.org/wiki/Extension:DynamicPageList>
<http://www.mediawiki.org/wiki/Extension:DynamicPageList2_0.9>
<http://www.mediawiki.org/wiki/Extension:Export_by_category>

On maintaining a single editing point for recurring content and templates,
there seems to be no Mediawiki *system* for managing such content, only a
hodgepodge of extensions that attack the problem piecemeal. You might
consider using a couple of the extensions listed below to create a special
namespace in Mediawiki that all transcluded content must originate from,
and block users' ability to place such content anywhere else.

I know that sounds drastic, but none of the following linked Mediawiki
extensions even discussed what happens when someone deletes content (or the
markup setting off content) that has been transcluded to another page, or
what happens if the surviving transclusion markup gets buried so deeply in
the revision history that it disappears. I'd expect that if there are
problems with the most recent version, Mediawiki wouldn't even branch to an
earlier version. You'd think there might at least be discussion of an error
message being thrown by one of those extensions if a transclusion was
unsuccessful, but I didn't see any sign of it. Short story: I haven't come
up with anything off the top of my head that would allow painless,
systematic, and secure management of transcluded content using Mediawiki.
Right problem; wrong solution.
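That said, the sort of audit I would want before trusting transclusion at
all is at least scriptable. Here is a rough Python sketch that flags
transclusions whose source is not in a reserved namespace; the "Docs:"
namespace name, the file list, and the simple {{...}} matching are all
assumptions on my part:

# Sketch: flag transclusions that do not come from the one namespace
# reserved for shared content. "Docs:" is a made-up namespace name, and
# the naive {{...}} matching ignores parser functions and parameters.
import re

ALLOWED_PREFIX = "Docs:"
files = ["Getting_Started.wikitext", "Installing.wikitext"]   # placeholder list

for path in files:
    with open(path, encoding="utf-8") as f:
        text = f.read()
    for target in re.findall(r"\{\{([^{}|]+)", text):
        target = target.strip()
        if target.startswith("#"):   # parser function, not a transclusion
            continue
        if not target.startswith(ALLOWED_PREFIX):
            print('%s transcludes "%s" from outside %s'
                  % (path, target, ALLOWED_PREFIX))

It would not catch anything if the shared content itself gets mangled, of
course; it only checks where transclusions point.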

<http://meta.wikimedia.org/wiki/Help:Template> (page transclusion; "a wiki
subroutine <http://en.wikipedia.org/wiki/en:subroutine> facility and is
comparable to a #include statement or macro
<http://en.wiktionary.org/wiki/en:macro#Noun> that is expanded at page view
time. Substitution <http://meta.wikimedia.org/wiki/Help:Substitution> allows
templates to be used as a macro facility.")
<http://www.mediawiki.org/wiki/Extension:Labeled_Section_Transclusion>
<http://www.mediawiki.org/wiki/Labeled_Section_Transclusion>
<http://www.mediawiki.org/wiki/Extension:PageSecurity#Optional_transclusions>
<http://www.mediawiki.org/wiki/Extension:PageSecurity/PageSecurity.php>
(note the "protect transclusion" routine)
<http://www.mediawiki.org/wiki/Security_issues_with_authorization_extensions>
(See test inclusion/transclusion section).
<http://en.wikisource.org/wiki/Help:Side_by_side_image_view_for_proofreading#Transclusion>
<http://comments.gmane.org/gmane.org.wikimedia.mediawiki/16997> (interwiki
transclusion)
<http://www.mediawiki.org/wiki/Manual:%24wgEnableScaryTranscluding>
(interwiki transclusion of templates).
<http://www.mediawiki.org/wiki/Extension:IncludeArticle> (first xxx
characters)
<http://www.mediawiki.org/wiki/Extension:ConditionalTemplate> (conditional
transclusion)
<http://www.mediawiki.org/wiki/Help:Templates#Control_template_inclusion>
(conditional inclusion of template)
<http://www.mediawiki.org/wiki/Manual:%24wgNonincludableNamespaces> (exclude
pages in namespace from inclusions)
<http://www.mediawiki.org/wiki/Extension:SecureHTML> (more control over
inclusions and secure portions of page).

Best of luck,

Marbux
