Excellent discussion in the last 2 emails!  Beautiful!

We may have to experiment some to come up with the best overall solution. Also we will want to consider such aspects as which projects are under the most active development and have the greatest potential for long-term growth. The most active communities will often (but not always) resolve points of interest more quickly.

Alan

Jim Harris wrote:
Wow! And, for what very little it's worth, I agree with the goals and most of the methods of meeting them. The only reason I didn't support this suggestion more strongly earlier is that I seem to recall that similar suggestions had been investigated by the "responsible parties" and rejected.

To convert stuff from the wikis to "more manageable(?)" HTML or even PDF, for the needed functions, "to somehow concatenate separate wiki pages into a single wiki page, fixing absolute URL artifacts", and "change absolute URLs to relative URIs, then doing lots of manual link checking to make sure all your links have been properly preserved/restored", here are some possible helps:

1. There are some nifty free/inexpensive software tools that may be of use here. They are cleverly hidden as "off-line browsers", "site managers", "mirror utils", "web copiers" and the like, but generally the way they make a local copy of a web site (perhaps beginning at a specific URL) is that they put all the images in one folder, change _all_ the "href"s, execute Javascript, find Server-Side Includes and replicate their functions, etc., in order to make the result completely independent of the original web server. Many are configurable as to how many levels of linking will be followed, whether the copy is limited to the original site/server, and many other useful features. They rename all the images etc. (bad) but keep the names unique in the local version (good). WebCopier is one of my favorites. The old freeware versions are still discoverable, and that may be all that is needed for this function. The current (trialware) versions ($30 - $50) are available at http://www.maximumsoft.com/downloads/index.html and there are at least ten others that may be viable choices. I don't suppose any of them does a good job of handling specific references to a place in a page in either the same or a different document.

2. To concatenate web pages (HTML files) A and B (in that order), I usually find this works fine: open A.html and B.html in a text editor (or maybe even OOo Writer). In B, copy everything _between_ <body> and </body>. In A, paste just above </body>. Save A. I think it should be easy to write a script (or macro) to do this, if doing it manually is not a good alternative. But this probably only demonstrates my ignorance of the real situation; perhaps someone reading this would be good enough to confirm it. :-)

3. For producing PDFs from any original that can be printed, I still like printing to FILE with a good PostScript printer driver (telling it what page layout to use), then converting to PDF via Ghostscript (the GPL versions are always at http://pages.cs.wisc.edu/~ghost/). In my experience the result is always perfect, across hundreds of jobs over several years with multiple printer drivers, including images, generated footers, page numbers, etc., exactly as it would look if printed on paper. I still like the HP DesignJet 755CM driver for Windows (any version) (free, of course) the best.

IHTH
Jim
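
For the script Jim mentions in point 2, a rough, untested sketch might look like the following. It assumes plain <body>...</body> markup, and the filenames A.html and B.html are just placeholders:

  # Sketch: append the <body> contents of B.html to A.html, as in point 2 above.
  # Assumes literal <body> ... </body> tags; filenames are placeholders.
  import re

  with open("B.html", encoding="utf-8") as f:
      b_html = f.read()
  with open("A.html", encoding="utf-8") as f:
      a_html = f.read()

  # Grab everything between <body> and </body> in B.
  match = re.search(r"<body[^>]*>(.*)</body>", b_html, re.DOTALL | re.IGNORECASE)
  if match is None:
      raise SystemExit("No <body>...</body> found in B.html")
  body_b = match.group(1)

  # Paste it just above </body> in A and save.
  idx = a_html.lower().rfind("</body>")
  if idx == -1:
      raise SystemExit("No </body> found in A.html")
  with open("A.html", "w", encoding="utf-8") as f:
      f.write(a_html[:idx] + body_b + "\n" + a_html[idx:])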

[EMAIL PROTECTED] 10:57 2007-07-31 >>>

On 7/31/07, Frank Peters <[EMAIL PROTECTED]> wrote:
If we want to move to the wiki as the main collaborative
platform we need mediawiki->ODF and mediawiki->PDF conversion
processes.


I can't say the required extension to Mediawiki doesn't exist, but I went
through all of the Mediawiki extensions about six months back, and while
there is at least one for writing to PDF on a per-page basis, you probably
would not be happy with its output. But you might look the extensions over
to see if what you need exists, starting here:
<http://www.mediawiki.org/wiki/MediaWiki_extensions>.

The problem is that wikis are really designed to produce variable-length,
cross-linked web pages, not to produce a dead-tree, book-like document with
niceties like pages of equal length and page numbering. I suspect the *best*
you'll be able to do is to somehow concatenate separate wiki pages into a
single wiki page, fixing absolute URL artifacts, introducing a conversion to
ODT at an appropriate point in the process, changing absolute URLs to
relative URIs, and then doing lots of manual link checking to make sure all
your links have been properly preserved/restored. Once those kinds of
problems are overcome, the guide can be formatted and exported to PDF from
OOo.
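
For instance, a rough sketch of that kind of absolute-to-relative URL rewriting could look something like this; the base URL is just a placeholder, it rewrites only href attributes that point back to the wiki into in-document anchors, and it assumes matching anchors exist in the concatenated page (you would still want to check links by hand afterwards):

  # Sketch: rewrite absolute wiki hrefs into in-document anchors in a
  # concatenated local copy. BASE is a placeholder; links still need a
  # manual check afterwards.
  import re

  BASE = "http://wiki.services.openoffice.org/wiki/"   # placeholder base URL

  def make_relative(html):
      pattern = re.compile(r'(href)="' + re.escape(BASE) + r'([^"#]*)(?:#([^"]*))?"')
      def repl(match):
          attr, page, frag = match.group(1), match.group(2), match.group(3)
          anchor = frag if frag else page        # prefer the original fragment
          return '%s="#%s"' % (attr, anchor)
      return pattern.sub(repl, html)

  with open("guide.html", encoding="utf-8") as f:
      html = f.read()
  with open("guide-relative.html", "w", encoding="utf-8") as f:
      f.write(make_relative(html))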

The problem really is that the decision was made to use Mediawiki before a
thorough needs assessment was conducted. You truly are trying to drive a
square peg into a round hole here. There are apps specifically designed to
do what you want to do. I've mentioned it before, but I would seriously
check out Daisy, <http://cocoondev.org/daisy/features.html>, which gets you
all of the output formats supported by Apache Cocoon, which Daisy is built
atop. <http://cocoon.apache.org/2.1/features.html>. (The reference to OOo's
format is actually ODF; I checked, although it is not enabled by default.)

   - XML <http://www.w3.org/XML/>
   - HTML <http://www.w3.org/MarkUp/>
   - XHTML <http://www.w3.org/XHTML/>
   - PDF <http://www.adobe.com/products/acrobat/adobepdf.html>
   - OpenOffice.org/StarOffice <http://www.openoffice.org/>
   - MS Excel
   - RTF <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnrtfspec/html/rtfspec.asp>
   - Postscript
   - Charts (see external project Fins <http://www.cocoondev.org/projects/fins.html>)
   - Flash <http://www.macromedia.com/>
   - Plain text
   - Scalable Vector Graphics (SVG) <http://www.w3.org/TR/SVG/>
   - MIDI
   - ZIP archives

Probably some others. The Cocoon developers aren't noted for keeping their
web site up to date. :-)
Daisy also supports a wide range of multimedia formats. It is also designed
from the ground up to avoid the kind of broken-link problems you get when
you try to concatenate pages from most wikis, is designed as a collaborative
platform for creating documentation for a bunch of related products, and
includes single-editing-point inclusion of other pages and transclusion of
other document parts. Daisy also has strong features for managing
collections of documents, including version control. And it's a
create-in-one-format, write-to-many-formats solution, so you don't have to
deal with the incompatibilities of the output formats.
I am really concerned that the longer you try to make Mediawiki do something
it wasn't designed for, the more difficult it will be to disengage and
switch to software that is actually designed to meet your goals.
See also SiSU, <http://www.jus.uio.no/sisu/>, which writes to all of the
formats you want, including ODT, PDF, HTML, XML, etc. See, e.g., these
examples of different outputs of Larry Lessig's Free Culture book:
<http://www.jus.uio.no/sisu/sisu_examples/1.html#25>. If you open the "html,
scroll, document in one"
<http://www.jus.uio.no/sisu/free_culture.lawrence_lessig/doc.html> link,
you'll see that the user can select what format s/he wishes to read/save the
book in. Caveat: I have not checked to see whether or how SiSU handles
graphic images.

See also e.g., xParrot, <http://xparrot.thebird.nl>, another FOSS solution
designed for such purposes that is heavily used in Europe.

"The goal of the xParrot project is to provide a replacement for Word when
writing documents with large groups of people. Other software tools, like
Wiki's, are used for this purpose, but we found they have certain short
comings. In particular with regard to editing (problematic markup for
non-technical people) and navigation (the structure of a Wiki document is
hard to maintain)."

<http://xparrot.thebird.nl/parrotstable/xparrot/portal/doc.xprt/htmlview_light.html>.

I haven't checked out whether Mediawiki is in fact one of the multitude of
wiki products whose URIs get broken when pages are converted and
concatenated, but that's only one of the structural problems the xParrot
developers are referring to. I strongly suspect that the closest you will
come is the hacked build of Mediawiki used for the <http://en.wikibooks.org/>
website. I therefore suggest that you experiment with their book on
OpenOffice.org to see what happens to the links when you concatenate pages
<http://en.wikibooks.org/wiki/Using_OpenOffice.org> (entire guide per app on
one page). At minimum, I suspect you're in for some script-writing to
convert absolute URLs to relative URLs. And I suspect you're going to have a
management problem in terms of keeping link anchor names and target names
unique in all pages that will need to be concatenated.
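
To illustrate that anchor-name problem, a quick check for colliding anchors across a set of saved pages, before concatenating them, could be as simple as the sketch below (filenames are whatever you have saved locally; the regex is deliberately naive):

  # Sketch: scan saved wiki pages for <a name="..."> and id="..." anchors and
  # report names used by more than one page, since those would collide once
  # the pages are concatenated. A naive regex scan, not a real HTML parser.
  import re
  import sys
  from collections import defaultdict

  anchor_re = re.compile(r'(?:<a\s+[^>]*name|id)="([^"]+)"', re.IGNORECASE)

  def report_duplicates(filenames):
      seen = defaultdict(set)            # anchor name -> files that use it
      for name in filenames:
          with open(name, encoding="utf-8") as f:
              for anchor in anchor_re.findall(f.read()):
                  seen[anchor].add(name)
      for anchor, files in sorted(seen.items()):
          if len(files) > 1:
              print("duplicate anchor %r in: %s" % (anchor, ", ".join(sorted(files))))

  if __name__ == "__main__":
      report_duplicates(sys.argv[1:])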

Ideally, these would hook into some sort of TOC template where
you can specify the order of wiki pages for the final book. I
will definitely be looking into finding a solution for this to
be able to deliver information "offline" (outside of the wiki).


Again, I'm not saying it can't be done. But you're working with a
screwdriver where a hex-head wrench is the right tool.

Here is what I think we need:
1) Specify a TOC to define the order of wiki pages in the
   final output
   -> this should be fairly easily doable by using the
   TOC templates we have already started to implement on the wiki


2) Export the corresponding pages from the wiki in the
   correct sequence
   -> we can use the built-in export feature of mediawiki


I suspect it will be easier to concatenate pages manually into one page on
the wiki, fix the link and structure artifacts there, then export. That's
what I wind up having to do using Tikiwiki, which actually has a
"structures" feature for ordering pages for, e.g., a slide-show tour of a set
of pages. See <http://doc.tikiwiki.org/tiki-index.php?page=Structure&bl>.
Drupal has something similar, but I don't recall seeing a Mediawiki
extension for that purpose. I suspect if it existed it would be in use on
the WikiBooks website.
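
If the TOC-template idea from step 1 does pan out, turning such a TOC page's wikitext into an ordered page list is the easy part. A rough sketch, assuming the wikitext has been saved to a local file (the file name is a placeholder) and that the page order is simply the order of the [[...]] links:

  # Sketch: pull the ordered list of [[Page|label]] links out of a TOC page's
  # wikitext, so the pages can be fetched and concatenated in that order.
  # "toc.wiki" is a placeholder for locally saved wikitext.
  import re

  link_re = re.compile(r"\[\[([^]|#]+)(?:[|#][^]]*)?\]\]")

  def toc_pages(wikitext):
      pages = []
      for title in link_re.findall(wikitext):
          title = title.strip()
          if title and title not in pages:   # keep first occurrence, preserve order
              pages.append(title)
      return pages

  with open("toc.wiki", encoding="utf-8") as f:
      for page in toc_pages(f.read()):
          print(page)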

3) Convert the output into XML
   -> that's either ODF-XML or XML-FO for creating PDFs.
   Technically, this should be easy using XSLT. However,
   some effort is required to write up the conversion
   style sheets and test them


Viewing page source on Wikipedia, it appears that pages are rendered in
XHTML 1.0 Transitional. You might see what happens if you copy and paste
from page source into an OOo XHTML document, then use OOo to generate the
PDF after repairing the URL artifacts, adding page numbering, section page
headers, etc.
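
Along those lines, here is a small sketch of grabbing one page's rendered XHTML for local cleanup and import into OOo. It assumes the wiki supports Mediawiki's usual action=render parameter (which returns the parsed page body without the site skin); the wiki script URL and page title below are placeholders:

  # Sketch: fetch the rendered XHTML body of one wiki page via action=render
  # and wrap it in a minimal XHTML shell for import into OOo.
  # WIKI and PAGE are placeholders.
  import urllib.request
  import urllib.parse

  WIKI = "http://wiki.services.openoffice.org/w/index.php"   # placeholder
  PAGE = "Documentation/OOo_User_Guide"                       # hypothetical title

  url = WIKI + "?" + urllib.parse.urlencode({"title": PAGE, "action": "render"})
  body = urllib.request.urlopen(url).read().decode("utf-8")

  with open("page.html", "w", encoding="utf-8") as f:
      f.write("<html><head><title>%s</title></head><body>\n%s\n</body></html>"
              % (PAGE, body))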

4) Create PDF
   If we have ODF we can use the OOo export feature,
   if we have XML-FO, we can use e.g. Apache FOP for
   PDF creation


See <http://www.mediawiki.org/wiki/Extension:Pdf_Export>.


Issues:
- Mediawiki does not support image export (or does it?)
- The process should be controlled by a script that
  accesses the wiki, fetches the pages, converts them
  and creates PDF (did anyone say "Perl"? ;-)


I don't know but I suspect not. There is a rather elegant fork of Mediawiki
called Getwiki, <http://getwiki.net/-GetWiki:Overview>, that in its major
implementation, Wikinfo, features automagic on-demand import of pages from
Wikipedia. <http://getwiki.net/-Wikinfo>. I looked at a bunch of random
pages there and several said that articles from Wikipedia would be imported
without images.

This seems to be the definitive Mediawiki page on exporting page content as
XML: <http://meta.wikimedia.org/wiki/Help:Export>. Looking at their schema,
I didn't see anything that suggested to me that you could get anything but
text. But notice that they do mention some tools for parsing the output. Of
course, you can also save a Mediawiki page on the web from a browser to
local disk and get the whole shebang in a directory, with the XHTML source
and images as separate files. But taking a quick look at the source for the
Wikipedia home page after downloading it, it looks like all of the links,
including the <img> links, are absolute URLs pointing back to the Wikipedia
web site.
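
For the XML route, a minimal sketch of pulling the Help:Export-style dump for an ordered list of pages via Special:Export follows; the wiki base URL and page titles are placeholders, and note that the dump contains wikitext, not rendered XHTML or images:

  # Sketch: fetch the Special:Export XML dump (wikitext plus page metadata --
  # no rendered HTML, no images) for each page in an ordered list.
  # WIKI and PAGES are placeholders.
  import urllib.request
  import urllib.parse

  WIKI = "http://en.wikibooks.org/wiki"                              # placeholder
  PAGES = ["Using_OpenOffice.org", "Using_OpenOffice.org/Writer"]    # hypothetical

  for title in PAGES:
      url = "%s/Special:Export/%s" % (WIKI, urllib.parse.quote(title))
      xml = urllib.request.urlopen(url).read()
      out = title.replace("/", "_") + ".xml"
      with open(out, "wb") as f:
          f.write(xml)
      print("saved", out)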

But see <http://www.mediawiki.org/wiki/Extension:XML_Class>:

   - Serves XML/XSL data sources with proper MIME content type (provided
   that Extension:Article Class Extended
   <http://www.mediawiki.org/wiki/Extension:Article_Class_Extended> is
   used in conjunction with this extension)
   - Enables XInclude directives to be used with Mediawiki articles as
   data sources. The extension resolves MW references to local URL ones
   for the client-side browser to easily locate them.

"*MWDumper* is a quick little tool for extracting sets of pages from a
MediaWiki dump file.It can read MediaWiki XML export dumps (version 0.3,
minus uploads), perform optional filtering, and output back to XML or to SQL
statements to add things directly to a database in 1.4 or 1.5 schema. It is
still very much under construction." <http://www.mediawiki.org/wiki/MWDumper
Anything else?


Don't know if they require separate extensions, but:

* footnotes/endnotes
* Intra-document bookmark or cross-reference style links to particular text
portions rather than just to a page, in a pin-point way that doesn't depend
on headings and subheadings linked from the page's table of contents. E.g.,
there may be a need to link to a particular screen grab rather than to its
closest subheading, or to say "loop through instructions 6-10 *here*". XHTML
<a href> and <a name> type linking that will float with its accompanying
text.

Also:

* Concordance/subject matter indexing (I doubt that can be done in Mediawiki,
although Daisy generates concordance files).

* Clearly identified and documented workflows for the new process, plus
associated project tracking tools adapted to those workflows.

Some other links you might take a look at.

On grouping and ordering pages:

<http://www.mediawiki.org/wiki/Extension:DynamicPageList>
<http://www.mediawiki.org/wiki/Extension:DynamicPageList2_0.9>
<http://www.mediawiki.org/wiki/Extension:Export_by_category>

On maintaining a single editing point for recurring content and templates,
there seems to be no Mediawiki *system* for managing such content, only a
hodgepodge of extensions that attack the problem piecemeal. You might
consider using a couple of the extensions listed below to create a special
namespace in Mediawiki that all transcluded content must originate from, and
block users' ability to place such content anywhere else.

I know that sounds drastic, but none of the following linked Mediawiki
extensions even discusses what happens when someone deletes content, or the
markup setting off content, that has been transcluded to another page, or
what happens if the surviving transclusion markup gets buried so deeply in
the revision history that it disappears. I'd expect that if there are
problems with the most recent version, Mediawiki wouldn't even branch to an
earlier version. You'd think there might be discussion of an error message
at least being thrown by one of those extensions if a transclusion was
unsuccessful, but I didn't see any sign of it. Short story: I haven't come
up with anything off the top of my head that would allow painless,
systematic, and secure management of transcluded content using Mediawiki.
Right problem; wrong solution.

<http://meta.wikimedia.org/wiki/Help:Template> (page transclusion; "a wiki
subroutine <http://en.wikipedia.org/wiki/en:subroutine> facility and is
comparable to a #include statement or macro
<http://en.wiktionary.org/wiki/en:macro#Noun> that is expanded at page view
time. Substitution <http://meta.wikimedia.org/wiki/Help:Substitution> allows
templates to be used as a macro facility.")
<http://www.mediawiki.org/wiki/Extension:Labeled_Section_Transclusion>
<http://www.mediawiki.org/wiki/Labeled_Section_Transclusion>
<http://www.mediawiki.org/wiki/Extension:PageSecurity#Optional_transclusions>
<http://www.mediawiki.org/wiki/Extension:PageSecurity/PageSecurity.php>
(note the "protect transclusion" routine)
<http://www.mediawiki.org/wiki/Security_issues_with_authorization_extensions>
(See test inclusion/transclusion section).
<http://en.wikisource.org/wiki/Help:Side_by_side_image_view_for_proofreading#Transclusion>
<http://comments.gmane.org/gmane.org.wikimedia.mediawiki/16997> (interwiki
transclusion)
<http://www.mediawiki.org/wiki/Manual:%24wgEnableScaryTranscluding>
(interwiki transclusion of templates).
<http://www.mediawiki.org/wiki/Extension:IncludeArticle> (first xxx
characters)
<http://www.mediawiki.org/wiki/Extension:ConditionalTemplate> (conditional
transclusion)
<http://www.mediawiki.org/wiki/Help:Templates#Control_template_inclusion>
(conditional inclusion of template)
<http://www.mediawiki.org/wiki/Manual:%24wgNonincludableNamespaces> (exclude
pages in namespace from inclusions)
<http://www.mediawiki.org/wiki/Extension:SecureHTML> (more control over
inclusions and secure portions of page).

Best of luck,

Marbux


