ArielGlenn has submitted this change and it was merged.

Change subject: More documentation of bz2 multistream and index files
......................................................................


More documentation of bz2 multistream and index files

Change-Id: I8878cde3327d9832840ff2e4296dd74dc0085f88
---
M toys/bz2multistream/INSTALL.txt
M toys/bz2multistream/README.txt
2 files changed, 8 insertions(+), 2 deletions(-)

Approvals:
  ArielGlenn: Verified; Looks good to me, approved


diff --git a/toys/bz2multistream/INSTALL.txt b/toys/bz2multistream/INSTALL.txt
index c13fca5..733c686 100644
--- a/toys/bz2multistream/INSTALL.txt
+++ b/toys/bz2multistream/INSTALL.txt
@@ -1,7 +1,12 @@
 Here's the preparation you will need to do in order to use the reader:
 * Create a multistream bz2 file from the XML file of article texts you wish to use.
+  This will contain concatenated bz2 streams, each stream containing 100 pages,
+  and each stream capable of being treated as a separate bz2 file as far as
+  bzip2 libraries or the language bindings of your choice go.
 * Create an index file for the multistream bz2 file.
+  This will contain lines with the following:
+  file-offset:page-id:page-title lines
 * Create a sorted version of the index file (beware, on some platforms including linux you must specify the C locale for sort not to give an ordering useless for the next step)
 * Create a ToC file for the index file.
diff --git a/toys/bz2multistream/README.txt b/toys/bz2multistream/README.txt
index 64fe3de..a3c1bd4 100644
--- a/toys/bz2multistream/README.txt
+++ b/toys/bz2multistream/README.txt
@@ -15,8 +15,9 @@
 or contributors who may want to work with specific multiple article
 texts at once in an automated fashion.
 
-See INSTALL.txt for how to generated the needed files and how
-to run the article text retrieval script.
+See INSTALL.txt for how to generated the needed files, the contents
+of the bz2 content and index files, and how to run the article text
+retrieval script.
 Platforms:

-- 
To view, visit https://gerrit.wikimedia.org/r/64921

Gerrit-MessageType: merged
Gerrit-Change-Id: I8878cde3327d9832840ff2e4296dd74dc0085f88
Gerrit-PatchSet: 1
Gerrit-Project: operations/dumps
Gerrit-Branch: ariel
Gerrit-Owner: ArielGlenn <ar...@wikimedia.org>
Gerrit-Reviewer: ArielGlenn <ar...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
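To make the file layout described in the INSTALL.txt change concrete, here is a minimal Python sketch of how a reader might use the two files. It assumes each index line has exactly the `file-offset:page-id:page-title` shape given in the diff, and that each offset points at the first byte of one bz2 stream in the multistream file. The names `IndexEntry`, `parse_index_line` and `read_stream_at` are hypothetical and are not taken from the reader script in toys/bz2multistream.

```python
import bz2
import io
from typing import BinaryIO, NamedTuple


class IndexEntry(NamedTuple):
    offset: int   # byte offset of the bz2 stream that holds this page
    page_id: int
    title: str


def parse_index_line(line: str) -> IndexEntry:
    """Parse one 'file-offset:page-id:page-title' index line (assumed format).

    maxsplit=2 keeps any colons inside the page title intact.
    """
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return IndexEntry(int(offset), int(page_id), title)


def read_stream_at(f: BinaryIO, offset: int, chunk_size: int = 65536) -> bytes:
    """Decompress exactly one bz2 stream starting at byte `offset`."""
    f.seek(offset)
    decomp = bz2.BZ2Decompressor()  # decodes a single stream, then sets .eof
    out = []
    while not decomp.eof:
        chunk = f.read(chunk_size)
        if not chunk:
            raise ValueError("truncated bz2 stream")
        out.append(decomp.decompress(chunk))
    # Any bytes read past the end of this stream end up in
    # decomp.unused_data; they belong to the next stream and are ignored.
    return b"".join(out)


if __name__ == "__main__":
    # A tiny two-stream file built in memory stands in for a real dump.
    streams = [bz2.compress(b"<page>A</page>"), bz2.compress(b"<page>B</page>")]
    data = io.BytesIO(b"".join(streams))
    entry = parse_index_line(f"{len(streams[0])}:2:Talk:B\n")
    print(entry.title, read_stream_at(data, entry.offset))
    # prints: Talk:B b'<page>B</page>'
```

Because each stream is a complete bz2 file on its own, seeking to an offset and running a fresh single-stream decompressor is enough; the rest of the multistream file never has to be read.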