ArielGlenn has submitted this change and it was merged.

Change subject: More documentation of bz2 multistream and index files
......................................................................


More documentation of bz2 multistream and index files

Change-Id: I8878cde3327d9832840ff2e4296dd74dc0085f88
---
M toys/bz2multistream/INSTALL.txt
M toys/bz2multistream/README.txt
2 files changed, 8 insertions(+), 2 deletions(-)

Approvals:
  ArielGlenn: Verified; Looks good to me, approved



diff --git a/toys/bz2multistream/INSTALL.txt b/toys/bz2multistream/INSTALL.txt
index c13fca5..733c686 100644
--- a/toys/bz2multistream/INSTALL.txt
+++ b/toys/bz2multistream/INSTALL.txt
@@ -1,7 +1,12 @@
 Here's the preparation you will need to do in order to use the reader:
 
 * Create a multistream bz2 file from the XML file of article texts you wish to use.
+  This will contain concatenated bz2 streams, each stream containing 100 pages,
+  and each stream capable of being treated as a separate bz2 file as far as
+  bzip2 libraries or the language bindings of your choice go.
 * Create an index file for the multistream bz2 file.
+  This will contain lines with the following:
+  file-offset:page-id:page-title lines
 * Create a sorted version of the index file (beware, on some platforms including linux
   you must specify the C locale for sort not to give an ordering useless for the next step)
 * Create a ToC file for the index file.
diff --git a/toys/bz2multistream/README.txt b/toys/bz2multistream/README.txt
index 64fe3de..a3c1bd4 100644
--- a/toys/bz2multistream/README.txt
+++ b/toys/bz2multistream/README.txt
@@ -15,8 +15,9 @@
 or contributors who may want to work with specific multiple article texts
 at once in an automated fashion.
 
-See INSTALL.txt for how to generated the needed files and how
-to run the article text retrieval script.
+See INSTALL.txt for how to generated the needed files, the contents
+of the bz2 content and index files, and how to run the article text
+retrieval script.
 
 Platforms:
 

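The INSTALL.txt additions above describe two things: a file of concatenated bz2 streams, each usable as a separate bz2 file, and an index of file-offset:page-id:page-title lines. A minimal Python sketch of how a reader might use the two together follows. It is not the retrieval script from this change. The function names are hypothetical, and it assumes that file-offset is the byte offset at which a stream begins and that both numeric fields are decimal integers.

```python
import bz2


def parse_index_line(line):
    """Split one 'file-offset:page-id:page-title' line into its three fields."""
    # Page titles can contain colons themselves, so split at most twice.
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title


def read_stream(path, offset):
    """Decompress the single bz2 stream that starts at byte `offset` of `path`.

    Assumption: `offset` points at the first byte of a stream.
    """
    decompressor = bz2.BZ2Decompressor()
    chunks = []
    with open(path, "rb") as f:
        f.seek(offset)
        # A BZ2Decompressor handles exactly one stream; `eof` becomes True
        # once that stream's end marker has been consumed.
        while not decompressor.eof:
            block = f.read(64 * 1024)
            if not block:
                break  # file ended before the stream did (truncated input)
            chunks.append(decompressor.decompress(block))
    return b"".join(chunks)
```

Given an index line, parse_index_line yields the offset to pass to read_stream. The result is the decompressed text of that one stream, which would then need to be searched for the wanted page id or title.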
-- 
To view, visit https://gerrit.wikimedia.org/r/64921
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8878cde3327d9832840ff2e4296dd74dc0085f88
Gerrit-PatchSet: 1
Gerrit-Project: operations/dumps
Gerrit-Branch: ariel
Gerrit-Owner: ArielGlenn <ar...@wikimedia.org>
Gerrit-Reviewer: ArielGlenn <ar...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot
