Clarification: the last message was from Rotem, a fellow WM-IL member who is helping me embed the Hebrew Wikipedia in the One Computer Per Child project.
He is reporting issues with Kiwix and the ZIM file I created last week.

Regarding size: size is important because we intend to add images (the 300MB ZIM file is the complete Hebrew Wikipedia text, but no pictures). We are hoping to have at least 5GB reserved for us on the One Computer Per Child machines we are to install on, but we may be forced to make do with 3GB. So every MB saved from the index is another MB available for images.

Asaf Bartov
Wikimedia Israel

On Mon, Jul 6, 2009 at 3:58 PM, Rotem Simha <[email protected]> wrote:

> * there are some errors in links of files and special pages
>   examples:
>   קובץ:Nuvola_apps_important.svg <http://commons.wikimedia.org/wiki/File:Nuvola_apps_important.svg>
>   link to ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים ללא תמונות/קטגוריות/ספורטאים איטלקים (wikipedia:wikipedia projects\articles without images\categories\Sports people from Italy)
>   מיוחד:אקראי (Special:Random)
>   15 במאי (May 15)
>   מיוחד:שינויים אחרונים (Special:RecentChanges)
>   10_באוגוסט
>
> * size is important because we intend to add images
>
> 2009/7/6 <[email protected]>
>
>> Send dev-l mailing list submissions to
>> [email protected]
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://intern.openzim.org/mailman/listinfo/dev-l
>> or, via email, send a message with subject or body 'help' to
>> [email protected]
>>
>> You can reach the person managing the list at
>> [email protected]
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of dev-l digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Kiwix index size (Asaf Bartov)
>>    2. Re: Kiwix index size (Manuel Schneider)
>>    3. Re: Kiwix index size (Emmanuel Engelhart)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Sun, 5 Jul 2009 19:18:57 +0300
>> From: Asaf Bartov <[email protected]>
>> Subject: [openZIM dev-l] Kiwix index size
>> To: [email protected]
>> Message-ID: <[email protected]>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi, everyone.
>>
>> When running Kiwix's indexer on the ZIM file I had created from the Hebrew
>> Wikipedia last week, the Kiwix data directory ran up to a total of 31 items,
>> totalling 2.3 GB. The ZIM file itself is ~300MB. Does this proportion make
>> sense?
>>
>> Detailed ls output attached.
>>
>> Thanks in advance,
>>
>> Asaf Bartov
>> --
>> Asaf Bartov <[email protected]>
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL: <http://intern.openzim.org/pipermail/dev-l/attachments/20090705/2afee878/attachment.html>
>> -------------- next part --------------
>> ro...@desktop:~/.www.kiwix.org/kiwix$ ls -l -h -a -R
>> .:
>> total 16K
>> drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 .
>> drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
>> drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 7680jxd5.default
>> -rw-r--r-- 1 rotem rotem 94 2009-07-01 16:10 profiles.ini
>>
>> ./7680jxd5.default:
>> total 1.7M
>> drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 .
>> drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
>> drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 31c26198d06ad265677b450796cc09aa.index
>> -rw------- 1 rotem rotem 162 2009-07-05 18:19 compatibility.ini
>> -rw-r--r-- 1 rotem rotem 135K 2009-07-05 18:19 compreg.dat
>> drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 extensions
>> -rw-r--r-- 1 rotem rotem 169 2009-07-01 16:10 localstore.rdf
>> -rw-r--r-- 1 rotem rotem 304 2009-07-05 18:39 mimeTypes.rdf
>> -rw-r--r-- 1 rotem rotem 0 2009-07-05 18:40 .parentlock
>> -rw-r--r-- 1 rotem rotem 2.0K 2009-07-01 16:10 permissions.sqlite
>> -rw-r--r-- 1 rotem rotem 128K 2009-07-05 18:54 places.sqlite
>> -rw------- 1 rotem rotem 951 2009-07-05 19:00 prefs.js
>> -rw-r--r-- 1 rotem rotem 1.1M 2009-07-05 18:20 XPC.mfasl
>> -rw-r--r-- 1 rotem rotem 98K 2009-07-05 18:19 xpti.dat
>> -rw-r--r-- 1 rotem rotem 98K 2009-07-05 18:20 XUL.mfasl
>>
>> ./7680jxd5.default/31c26198d06ad265677b450796cc09aa.index:
>> total 2.4G
>> drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 .
>> drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
>> -rw-r--r-- 1 rotem rotem 0 2009-07-02 01:46 flintlock
>> -rw-r--r-- 1 rotem rotem 12 2009-07-02 01:46 iamflint
>> -rw-r--r-- 1 rotem rotem 22K 2009-07-02 05:13 position.baseA
>> -rw-r--r-- 1 rotem rotem 21K 2009-07-02 05:10 position.baseB
>> -rw-r--r-- 1 rotem rotem 1.4G 2009-07-02 05:13 position.DB
>> -rw-r--r-- 1 rotem rotem 12K 2009-07-02 05:13 postlist.baseA
>> -rw-r--r-- 1 rotem rotem 12K 2009-07-02 05:10 postlist.baseB
>> -rw-r--r-- 1 rotem rotem 754M 2009-07-02 05:13 postlist.DB
>> -rw-r--r-- 1 rotem rotem 70 2009-07-02 05:13 record.baseA
>> -rw-r--r-- 1 rotem rotem 70 2009-07-02 05:10 record.baseB
>> -rw-r--r-- 1 rotem rotem 3.3M 2009-07-02 05:13 record.DB
>> -rw-r--r-- 1 rotem rotem 4.4K 2009-07-02 05:13 termlist.baseA
>> -rw-r--r-- 1 rotem rotem 4.3K 2009-07-02 05:10 termlist.baseB
>> -rw-r--r-- 1 rotem rotem 278M 2009-07-02 05:13 termlist.DB
>> -rw-r--r-- 1 rotem rotem 232 2009-07-02 05:13 value.baseA
>> -rw-r--r-- 1 rotem rotem 230 2009-07-02 05:10 value.baseB
>> -rw-r--r-- 1 rotem rotem 14M 2009-07-02 05:13 value.DB
>>
>> ./7680jxd5.default/extensions:
>> total 8.0K
>> drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 .
>> drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
>> ro...@desktop:~/.www.kiwix.org/kiwix$
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Sun, 5 Jul 2009 20:57:39 +0200
>> From: Manuel Schneider <[email protected]>
>> Subject: Re: [openZIM dev-l] Kiwix index size
>> To: [email protected], [email protected]
>> Message-ID: <[email protected]>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi Asaf,
>>
>> On Sunday, 5 July 2009, Asaf Bartov wrote:
>> > When running Kiwix's indexer on the ZIM file I had created from the Hebrew
>> > Wikipedia last week, the Kiwix data directory ran up to a total of 31
>> > items, totalling 2.3 GB. The ZIM file itself is ~300MB. Does this
>> > proportion make sense?
>>
>> I am not sure about the other files which were created; you only need the ZIM
>> file with the index itself.
>>
>> For 900'000 articles the ZIM file containing the articles was 1.4 GB, the
>> Index ZIM was 1.0 GB.
>>
>> So I think 300 MB looks fine.
>>
>> Greets,
>>
>>
>> Manuel
>> --
>> Regards
>> Manuel Schneider
>>
>> Wikimedia CH - Verein zur Förderung Freien Wissens
>> Wikimedia CH - Association for the advancement of free knowledge
>> www.wikimedia.ch
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Sun, 05 Jul 2009 21:05:33 +0200
>> From: Emmanuel Engelhart <[email protected]>
>> Subject: Re: [openZIM dev-l] Kiwix index size
>> To: [email protected], [email protected]
>> Message-ID: <[email protected]>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi Asaf
>> Asaf Bartov wrote:
>> > When running Kiwix's indexer on the ZIM file I had created from the Hebrew
>> > Wikipedia last week, the Kiwix data directory ran up to a total of 31 items,
>> > totalling 2.3 GB. The ZIM file itself is ~300MB. Does this proportion make
>> > sense?
>>
>> This is possible. Kiwix uses the Xapian search engine, which generates
>> pretty big index files.
>>
>> I have two questions:
>> * Are the search results OK?
>> * Do you have a problem with the size of the index? Do you have a size
>> limit?
>>
>> There are many open search/index software packages. I chose to use Xapian
>> for many reasons, but it is possible, under certain conditions, to add
>> support for another search engine to Kiwix. It should also be possible to
>> make a modified version of the indexer that uses less disk space (but
>> indexes fewer words).
>>
>> OpenZIM itself provides a search solution; Tommi can explain more about it
>> to you. Maybe it would be interesting for you to test it and give us
>> feedback!
>>
>> Regards
>> Emmanuel
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.9 (GNU/Linux)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>
>> iEYEARECAAYFAkpQ+XcACgkQn3IpJRpNWtPm8wCfcmzwRfg6/9ttuknkURF7ct5I
>> JLAAoLbVJWqXUKIeh8Mpua3GD+bjI5ZD
>> =RH/U
>> -----END PGP SIGNATURE-----
>>
>>
>> ------------------------------
>>
>> _______________________________________________
>> dev-l mailing list
>> [email protected]
>> https://intern.openzim.org/mailman/listinfo/dev-l
>>
>>
>> End of dev-l Digest, Vol 5, Issue 2
>> ***********************************
>>
>
>
> --
> Rotem Simha
>
> _______________________________________________
> dev-l mailing list
> [email protected]
> https://intern.openzim.org/mailman/listinfo/dev-l
>
>

--
Asaf Bartov <[email protected]>
