On 7/31/07, Frank Peters <[EMAIL PROTECTED]> wrote:

> If we want to move to the wiki as the main collaborative
> platform we need mediawiki->ODF and mediawiki->PDF conversion
> processes.
I can't say the required extensions to Mediawiki don't exist, but I went through all of the Mediawiki extensions about six months back, and while there is at least one for writing to PDF on a per-page basis, you probably would not be happy with its output. But you might look the extensions over to see if what you need exists, starting here: <http://www.mediawiki.org/wiki/MediaWiki_extensions>.

The problem is that wikis are really designed to produce variable-length, cross-linked web pages, not a dead-tree, book-like document with niceties like pages of equal length and page numbering. I suspect the *best* you'll be able to do is to somehow concatenate separate wiki pages into a single wiki page, fixing absolute URL artifacts, introducing a conversion to ODT at an appropriate point in the process, changing absolute URLs to relative URIs, then doing lots of manual link checking to make sure all your links have been properly preserved/restored. Once those kinds of problems are overcome, the guide can be formatted and exported to PDF from OOo.

The problem really is that the decision was made to use Mediawiki before a thorough needs assessment was conducted. You truly are trying to drive a square peg into a round hole here. There are apps specifically designed to do what you want to do.

I've mentioned it before, but I would seriously check out Daisy, <http://cocoondev.org/daisy/features.html>, which gets you all of the output formats supported by Apache Cocoon, which Daisy is built atop: <http://cocoon.apache.org/2.1/features.html>. (The reference to OOo's format is actually ODF; I checked, although it is not enabled by default.)

- XML <http://www.w3.org/XML/>
- HTML <http://www.w3.org/MarkUp/>
- XHTML <http://www.w3.org/XHTML/>
- PDF <http://www.adobe.com/products/acrobat/adobepdf.html>
- OpenOffice.org/StarOffice <http://www.openoffice.org/>
- MS Excel
- RTF <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnrtfspec/html/rtfspec.asp>
- Postscript
- Charts (see the external project Fins <http://www.cocoondev.org/projects/fins.html>)
- Flash <http://www.macromedia.com/>
- Plain text
- Scalable Vector Graphics (SVG) <http://www.w3.org/TR/SVG/>
- MIDI
- ZIP archives

Probably some others. The Cocoon developers aren't noted for keeping their web site up to date. :-)

Daisy also supports a wide range of multimedia formats. It is also designed from the ground up to avoid the kind of broken-link problems you get when you try to concatenate pages from most wikis, is designed as a collaborative platform for creating documentation for a bunch of related products, and includes single-editing-point inclusion of other pages and transclusion of other document parts. Daisy also has strong features for managing collections of documents, including version control. And it's a create-in-one-format/write-to-many-formats solution, so you don't have to deal with the incompatibilities of the output formats.

I am really concerned that the longer you try to make Mediawiki do something it wasn't designed for, the more difficult it will be to disengage and switch to software that is actually designed to meet your goals.

See also SiSU, <http://www.jus.uio.no/sisu/>, which writes to all of the formats you want, including ODT, PDF, HTML, XML, etc. See, e.g., these examples of different output of Larry Lessig's Free Culture book: <http://www.jus.uio.no/sisu/sisu_examples/1.html#25>.
If you open the "html, scroll, document in one <http://www.jus.uio.no/sisu/free_culture.lawrence_lessig/doc.html>" link, you'll see that the user can select what format s/he wishes to read/save the book in. Caveat: I have not checked to see whether or how SiSU handles graphic images.

See also, e.g., xParrot, <http://xparrot.thebird.nl>, another FOSS solution designed for such purposes that is heavily used in Europe. "The goal of the xParrot project is to provide a replacement for Word when writing documents with large groups of people. Other software tools, like Wiki's, are used for this purpose, but we found they have certain short comings. In particular with regard to editing (problematic markup for non-technical people) and navigation (the structure of a Wiki document is hard to maintain)." <http://xparrot.thebird.nl/parrotstable/xparrot/portal/doc.xprt/htmlview_light.html>

I haven't checked whether Mediawiki is in fact one of the multitude of wiki products whose URIs get broken when pages are converted and concatenated, but that's only one of the structural problems the xParrot developers are referring to.

I strongly suspect that the closest you will come is the hacked build of Mediawiki used for the <http://en.wikibooks.org/> website. I therefore suggest that you experiment with their book on OpenOffice.org to see what happens to the links when you concatenate pages: <http://en.wikibooks.org/wiki/Using_OpenOffice.org> (entire guide per app on one page). At minimum, I suspect you're in for some script-writing to convert absolute URLs to relative URLs (a rough sketch of what such a script might look like follows below). And I suspect you're going to have a management problem in terms of keeping link anchor names and target names unique in all pages that will need to be concatenated.

> Ideally, these would hook into some sort of TOC template where
> you can specify the order of wiki pages for the final book. I
> will definitely be looking into finding a solution for this to
> be able to deliver information "offline" (outside of the wiki).

Again, I'm not saying it can't be done. But you're working with a screwdriver where a hex-head wrench is the right tool.

> Here is what I think we need:
>
> 1) Specify a TOC to define the order of wiki pages in the
>    final output
>    -> this should be fairly easily doable by using the
>       TOC templates we already starting to implement on the wiki
>
> 2) Export the corresponding pages from the wiki in the
>    correct sequence
>    -> we can use the built-in export feature of mediawiki

I suspect it will be easier to concatenate pages manually into one page on the wiki, fix the link and structure artifacts there, then export. That's what I wind up having to do using Tikiwiki, which actually has a "structures" feature for ordering pages for, e.g., a slide-show tour of a set of pages. See <http://doc.tikiwiki.org/tiki-index.php?page=Structure&bl>. Drupal has something similar, but I don't recall seeing a Mediawiki extension for that purpose. I suspect if it existed it would be in use on the WikiBooks website.

> 3) Convert the output into XML
>    -> that's either ODF-XML or XML-FO for creating PDFs
>       technically, this should be easy using XSLT. However,
>       some effort is required to write up the conversion
>       style sheets and test them

Viewing page source on Wikipedia, it appears that pages are rendered in XHTML 1.0 Transitional. You might see what happens if you copy and paste from page source into an OOo XHTML document, then use OOo to generate the PDF after repairing the URL artifacts, adding page numbering, section page headers, etc.
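To make that script-writing point concrete, here is a rough, untested sketch (not a working tool) of what a concatenation-and-link-repair pass might look like, assuming the rendered pages are fetched in TOC order. The wiki base URL, page titles, and output file name are placeholders, not real endpoints:

#!/usr/bin/env python3
# Hypothetical sketch only: fetch rendered wiki pages in TOC order,
# concatenate them into one XHTML file, rewrite absolute links between the
# merged pages into intra-document fragment links, and warn about anchor
# names that collide once everything lives in a single document.
import re
import urllib.request

WIKI_BASE = "http://wiki.services.openoffice.org/wiki/"   # assumed base URL
TOC = ["Documentation/Getting_Started",
       "Documentation/Writer_Guide"]                      # example titles, TOC order

def fetch(page):
    return urllib.request.urlopen(WIKI_BASE + page).read().decode("utf-8")

def relativize(xhtml):
    """Turn href="<WIKI_BASE><Page>" into href="#<Page>" for merged pages."""
    def repl(m):
        page = m.group(1)
        return 'href="#%s"' % page if page in TOC else m.group(0)
    return re.sub(r'href="%s([^"#]+)"' % re.escape(WIKI_BASE), repl, xhtml)

seen = {}
parts = []
for page in TOC:
    body = fetch(page)
    # MediaWiki marks headings with id="..." attributes; check for collisions.
    for anchor in re.findall(r'\bid="([^"]+)"', body):
        if anchor in seen and seen[anchor] != page:
            print("duplicate anchor %r in %s (also in %s)" % (anchor, page, seen[anchor]))
        seen.setdefault(anchor, page)
    # Give each merged page its own target anchor so relativize() can point at it.
    parts.append('<a id="%s"></a>\n%s' % (page, body))

with open("merged.xhtml", "w", encoding="utf-8") as out:
    out.write(relativize("\n".join(parts)))

In practice you would also want to strip the wiki skin from each page and keep only the article body, and you would still be left with the manual link checking I described; this only illustrates the shape of the problem.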
> 4) Create PDF
>    If we have ODF we can use the OOo export feature,
>    if we have XML-FO, we can use e.g. Apache FO for
>    PDF creation

See <http://www.mediawiki.org/wiki/Extension:Pdf_Export>.

> Issues:
>
> - Mediawiki does not support image export (or does it?)
> - The process should be controlled by a script that
>   accesses the wiki, fetches the pages, converts them
>   and creates PDF (did anyone say "Perl"? ;-)

I don't know, but I suspect not. There is a rather elegant fork of Mediawiki called Getwiki, <http://getwiki.net/-GetWiki:Overview>, that in its major implementation, Wikinfo, features automagic on-demand import of pages from Wikipedia. <http://getwiki.net/-Wikinfo>. I looked at a bunch of random pages there, and several said that articles from Wikipedia would be imported without images.

This seems to be the definitive Mediawiki page on exporting page content as XML: <http://meta.wikimedia.org/wiki/Help:Export>. Looking at their schema, I didn't see anything that suggested to me that you could get anything but text. But notice that they do mention some tools for parsing the output (a rough sketch of scripting against that export feature follows below).

Of course, you can also save a Mediawiki page on the web from a browser to local disk and get the whole shebang in a directory, with the page source in XHTML and the images as separate files. But taking a quick look at the source for the Wikipedia home page after downloading it, it looks like all of the links, including the <img> links, are absolute URLs pointing back to the Wikipedia web site.

But see <http://www.mediawiki.org/wiki/Extension:XML_Class>:

- Serves XML/XSL data sources with proper MIME content type (provided that Extension:Article Class Extended <http://www.mediawiki.org/wiki/Extension:Article_Class_Extended> is conjugated with this extension)
- Enables XInclude directives to be used with Mediawiki articles as data sources. The extension resolves MW references to local URL ones for the client-side browser to easily locate them.

"*MWDumper* is a quick little tool for extracting sets of pages from a MediaWiki dump file. It can read MediaWiki XML export dumps (version 0.3, minus uploads), perform optional filtering, and output back to XML or to SQL statements to add things directly to a database in 1.4 or 1.5 schema. It is still very much under construction." <http://www.mediawiki.org/wiki/MWDumper>

> Anything else?

Don't know if they require separate extensions, but:

* Footnotes/endnotes.
* Intra-document bookmark or cross-reference style links to particular text portions, rather than just to a page, in a pin-point way that doesn't depend on the headings and subheadings linked from the page table of contents. E.g., there may be a need to link to a particular screen grab rather than to its closest subheading, or to say "loop through instructions 6-10 *here*." XHTML <a href> and <a name> type linking that will float with its accompanying text.

Also:

* Concordance/subject-matter indexing (I doubt that can be done in Mediawiki, although Daisy generates concordance files).
* Clearly identified and documented workflows for the new process, plus associated project-tracking tools adapted to those workflows.
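On the export mechanics, here is what pulling a page through Special:Export looks like from a script. This is an untested sketch; the host and page title are just examples, and, as noted above, only wikitext comes through the export: the image files a page uses are not part of the dump.

#!/usr/bin/env python3
# Hypothetical sketch: fetch a page via MediaWiki's Special:Export and pull
# the raw wikitext out of the export XML. Only text is exported; images
# referenced by the page are not included.
import urllib.request
import xml.etree.ElementTree as ET

HOST = "http://en.wikibooks.org"          # example host
PAGES = ["Using_OpenOffice.org"]          # example page titles

def export_wikitext(page):
    url = "%s/wiki/Special:Export/%s" % (HOST, page)
    root = ET.fromstring(urllib.request.urlopen(url).read())
    # The export XML is namespaced, so match on each tag's local name.
    return "\n".join(el.text or "" for el in root.iter()
                     if el.tag.rsplit("}", 1)[-1] == "text")

for p in PAGES:
    print(p, "->", len(export_wikitext(p)), "characters of wikitext")

From there you still need the wikitext-to-XHTML or wikitext-to-ODF conversion step, which is where the XSLT effort mentioned in item 3 above would come in.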
Some other links you might take a look at.

On grouping and ordering pages:

<http://www.mediawiki.org/wiki/Extension:DynamicPageList>
<http://www.mediawiki.org/wiki/Extension:DynamicPageList2_0.9>
<http://www.mediawiki.org/wiki/Extension:Export_by_category>

On maintaining a single editing point for recurring content and templates, there seems to be no Mediawiki *system* for managing such content, only a hodgepodge of extensions that attack the problem piecemeal. You might consider using a couple of the extensions listed below to create a special namespace in Mediawiki that all transcluded content must originate from, and block users' ability to locate it anywhere else. I know that sounds drastic, but none of the following linked Mediawiki extensions even discusses what happens when someone deletes content, or the markup setting off content, that has been transcluded to another page, or what happens if the surviving transclusion markup gets buried so deeply in the revision history that it disappears. I'd expect that if there are problems with the most recent version, Mediawiki wouldn't even branch to an earlier version. You'd think there would at least be discussion of an error message being thrown by one of those extensions if a transclusion was unsuccessful, but I didn't see any sign of it.

Short story: I haven't come up with anything off the top of my head that would allow painless, systematic, and secure management of transcluded content using Mediawiki. Right problem; wrong solution.

<http://meta.wikimedia.org/wiki/Help:Template> (page transclusion; "a wiki subroutine <http://en.wikipedia.org/wiki/en:subroutine> facility and is comparable to a #include statement or macro <http://en.wiktionary.org/wiki/en:macro#Noun> that is expanded at page view time. Substitution <http://meta.wikimedia.org/wiki/Help:Substitution> allows templates to be used as a macro facility.")
<http://www.mediawiki.org/wiki/Extension:Labeled_Section_Transclusion>
<http://www.mediawiki.org/wiki/Labeled_Section_Transclusion>
<http://www.mediawiki.org/wiki/Extension:PageSecurity#Optional_transclusions>
<http://www.mediawiki.org/wiki/Extension:PageSecurity/PageSecurity.php> (note the "protect transclusion" routine)
<http://www.mediawiki.org/wiki/Security_issues_with_authorization_extensions> (see the test inclusion/transclusion section)
<http://en.wikisource.org/wiki/Help:Side_by_side_image_view_for_proofreading#Transclusion>
<http://comments.gmane.org/gmane.org.wikimedia.mediawiki/16997> (interwiki transclusion)
<http://www.mediawiki.org/wiki/Manual:%24wgEnableScaryTranscluding> (interwiki transclusion of templates)
<http://www.mediawiki.org/wiki/Extension:IncludeArticle> (first xxx characters)
<http://www.mediawiki.org/wiki/Extension:ConditionalTemplate> (conditional transclusion)
<http://www.mediawiki.org/wiki/Help:Templates#Control_template_inclusion> (conditional inclusion of templates)
<http://www.mediawiki.org/wiki/Manual:%24wgNonincludableNamespaces> (exclude pages in a namespace from inclusions)
<http://www.mediawiki.org/wiki/Extension:SecureHTML> (more control over inclusions and secure portions of a page)

Best of luck,

Marbux
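P.S. If you do end up relying on transclusion, one crude safeguard would be a script that inventories the {{...}} transclusion markup in your exported wikitext and flags references to pages that aren't part of the export set, since, as noted above, none of the extensions I looked at seems to raise an error when a transclusion source disappears. A rough, untested sketch, with placeholder file names (it is simplified: real template references live in the Template: namespace):

#!/usr/bin/env python3
# Hypothetical sketch: inventory {{...}} transclusion references in a set of
# exported wikitext files and flag references whose source page is not part
# of the set. Simplified: it ignores the Template: namespace convention.
import re

# Assume one exported wikitext file per page, keyed by page title (examples).
PAGE_FILES = {
    "Getting_Started": "Getting_Started.wiki",
    "Writer_Guide": "Writer_Guide.wiki",
}

TRANSCLUSION = re.compile(r"\{\{([^|{}]+)")   # name part of {{Name|...}}

available = set(PAGE_FILES)
for page, path in PAGE_FILES.items():
    with open(path, encoding="utf-8") as f:
        text = f.read()
    for ref in TRANSCLUSION.findall(text):
        ref = ref.strip()
        if ref.startswith("#"):               # skip parser functions like {{#if:...}}
            continue
        if ref not in available:
            print("%s transcludes %r, which is not in the export set" % (page, ref))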