I just took a snapshot of the wiki and ran a conversion (as outlined in section 2.4 of the RFP); the result was 439 wiki pages. There are approximately 31019 lines of text contained in those pages (cat * | wc -l).

Of those, there are user pages, junk pages, and probably some spam pages.

I might look through some of them to get an idea of what a reasonable workload for a single "job" would be. My initial thought is that a 10-page chunk would be good. If anyone is interested in taking a look at the wikitext itself:

Here is the pre-conversion text, after stripping out the phpwiki dump headers:
http://cactuswax.net/~eliott/projects/wiki_migration/dump_test/pre_convert

Here is the result of the conversion script:
http://cactuswax.net/~eliott/projects/wiki_migration/dump_test/post_convert
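For carving the pages into 10-page jobs, something along these lines could work. This is just a sketch, assuming one file per wiki page in a pre_convert/ directory (the 25 stand-in pages here are only for demonstration):

```shell
# Hypothetical sketch: split the page list into 10-page "job" chunks.
mkdir -p pre_convert
for i in $(seq 1 25); do : > "pre_convert/page_$i"; done   # 25 stand-in pages
ls pre_convert | sort > all_pages.txt
split -l 10 -d all_pages.txt job_   # writes job_00, job_01, job_02 (10/10/5 pages)
ls job_* | wc -l
```

Each job_NN file then lists the pages for one volunteer to convert by hand.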

Unfortunately, the autoconversion is simple (it doesn't cover many transformations) and at times causes more trouble than it is worth.
But it does handle some things.
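I haven't documented the script's exact rule set, but to give a feel for the kind of transformation involved, here is a hypothetical single rule (assuming PhpWiki-style "!" headings being rewritten to MediaWiki "==" headings; the real script's rules may differ):

```shell
# Hypothetical example of one conversion rule, NOT the actual script:
# rewrite PhpWiki-style headings (!!!, !!, !) as MediaWiki headings.
printf '%s\n' '!!!Top heading' 'plain text' > sample.txt
sed -e 's/^!!!\(.*\)$/==\1==/' \
    -e 's/^!!\(.*\)$/===\1===/' \
    -e 's/^!\(.*\)$/====\1====/' sample.txt
```

The rules are ordered longest-prefix first, so a "!!!" line is rewritten before the shorter patterns can touch it.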

_______________________________________________
arch mailing list
[email protected]
http://www.archlinux.org/mailman/listinfo/arch