I have replied to Rotem about the links. I have also opened a bug on the Kiwix side: https://sourceforge.net/tracker/?func=detail&aid=2817440&group_id=175508&atid=873515
As for the search engine index size, we need to find a solution that produces a smaller index. The openZIM search solution should be a good starting point. I will have a look this week.

Emmanuel

On Mon 06/07/09 15:03, "Asaf Bartov" [email protected] wrote:
> Clarification:
>
> This last message was by Rotem, a fellow WM-IL member helping me with
> the embedding of the Hebrew Wikipedia in the One Computer Per Child
> project.
>
> He is reporting issues with Kiwix and the ZIM file I created last
> week.
>
> Regarding size: Size is important, because we intend to add images
> (the 300MB ZIM file is the complete Hebrew Wikipedia text, but no
> pictures). We are hoping to have at least 5GB reserved for us in
> those One Computer Per Child machines we are to install on, but we may
> be forced to make do with 3GB. So every MB saved from the index is
> another MB available for images...
>
> Asaf Bartov
> Wikimedia Israel
>
> On Mon, Jul 6, 2009 at 3:58 PM, Rotem Simha wrote:
> * there are some errors in links of files and special pages
> examples
> קובץ:Nuvola_apps_important.svg [1] link to
> ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים ללא תמונות/קטגוריות/ספורטאים איטלקים
> (Wikipedia:Wikipedia projects/Articles without images/Categories/Sports people from Italy)
> מיוחד:אקראי (Special:Random)
> 15 במאי (May 15)
> מיוחד:שינויים אחרונים (Special:RecentChanges)
>
> 10_באוגוסט (August 10)
>
> * size is important because we intend to add images
>
> 2009/7/6
> Send dev-l mailing list submissions to
> [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://intern.openzim.org/mailman/listinfo/dev-l [2]
> or, via email, send a message with subject or body help to
> [email protected]
>
> You can reach the person managing the list at
> [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of dev-l digest..."
>
> Today's Topics:
>
> 1. Kiwix index size (Asaf Bartov)
> 2. Re: Kiwix index size (Manuel Schneider)
> 3. Re: Kiwix index size (Emmanuel Engelhart)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 5 Jul 2009 19:18:57 +0300
> From: Asaf Bartov
> Subject: [openZIM dev-l] Kiwix index size
> To: [email protected]
> Message-ID:
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi, everyone.
>
> When running Kiwix's indexer on the ZIM file I had created from the Hebrew
> Wikipedia last week, the Kiwix data directory ran up to a total of 31 items,
> totalling 2.3 GB. The ZIM file itself is ~300MB. Does this proportion make
> sense?
>
> Detailed ls output attached.
>
> Thanks in advance,
>
> Asaf Bartov
> --
> Asaf Bartov
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: [3]
> -------------- next part --------------
> ro...@desktop:~/.www.kiwix.org/kiwix$ [4] ls -l -h -a -R
> .:
> total 16K
> drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 .
> drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
> drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 7680jxd5.default
> -rw-r--r-- 1 rotem rotem 94 2009-07-01 16:10 profiles.ini
>
> ./7680jxd5.default:
> total 1.7M
> drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 .
> drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
> drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 31c26198d06ad265677b450796cc09aa.index
> -rw------- 1 rotem rotem 162 2009-07-05 18:19 compatibility.ini
> -rw-r--r-- 1 rotem rotem 135K 2009-07-05 18:19 compreg.dat
> drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 extensions
> -rw-r--r-- 1 rotem rotem 169 2009-07-01 16:10 localstore.rdf
> -rw-r--r-- 1 rotem rotem 304 2009-07-05 18:39 mimeTypes.rdf
> -rw-r--r-- 1 rotem rotem 0 2009-07-05 18:40 .parentlock
> -rw-r--r-- 1 rotem rotem 2.0K 2009-07-01 16:10 permissions.sqlite
> -rw-r--r-- 1 rotem rotem 128K 2009-07-05 18:54 places.sqlite
> -rw------- 1 rotem rotem 951 2009-07-05 19:00 prefs.js
> -rw-r--r-- 1 rotem rotem 1.1M 2009-07-05 18:20 XPC.mfasl
> -rw-r--r-- 1 rotem rotem 98K 2009-07-05 18:19 xpti.dat
> -rw-r--r-- 1 rotem rotem 98K 2009-07-05 18:20 XUL.mfasl
>
> ./7680jxd5.default/31c26198d06ad265677b450796cc09aa.index:
> total 2.4G
> drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 .
> drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
> -rw-r--r-- 1 rotem rotem 0 2009-07-02 01:46 flintlock
> -rw-r--r-- 1 rotem rotem 12 2009-07-02 01:46 iamflint
> -rw-r--r-- 1 rotem rotem 22K 2009-07-02 05:13 position.baseA
> -rw-r--r-- 1 rotem rotem 21K 2009-07-02 05:10 position.baseB
> -rw-r--r-- 1 rotem rotem 1.4G 2009-07-02 05:13 position.DB
> -rw-r--r-- 1 rotem rotem 12K 2009-07-02 05:13 postlist.baseA
> -rw-r--r-- 1 rotem rotem 12K 2009-07-02 05:10 postlist.baseB
> -rw-r--r-- 1 rotem rotem 754M 2009-07-02 05:13 postlist.DB
> -rw-r--r-- 1 rotem rotem 70 2009-07-02 05:13 record.baseA
> -rw-r--r-- 1 rotem rotem 70 2009-07-02 05:10 record.baseB
> -rw-r--r-- 1 rotem rotem 3.3M 2009-07-02 05:13 record.DB
> -rw-r--r-- 1 rotem rotem 4.4K 2009-07-02 05:13 termlist.baseA
> -rw-r--r-- 1 rotem rotem 4.3K 2009-07-02 05:10 termlist.baseB
> -rw-r--r-- 1 rotem rotem 278M 2009-07-02 05:13 termlist.DB
> -rw-r--r-- 1 rotem rotem 232 2009-07-02 05:13 value.baseA
> -rw-r--r-- 1 rotem rotem 230 2009-07-02 05:10 value.baseB
> -rw-r--r-- 1 rotem rotem 14M 2009-07-02 05:13 value.DB
>
> ./7680jxd5.default/extensions:
> total 8.0K
> drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 .
> drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
> ro...@desktop:~/.www.kiwix.org/kiwix$ [5]
>
> ------------------------------
>
> Message: 2
> Date: Sun, 5 Jul 2009 20:57:39 +0200
> From: Manuel Schneider
> Subject: Re: [openZIM dev-l] Kiwix index size
> To: [email protected], [email protected]
> Message-ID:
> Content-Type: text/plain; charset="utf-8"
>
> Hi Asaf,
>
> On Sunday, 5 July 2009, Asaf Bartov wrote:
> > When running Kiwix's indexer on the ZIM file I had created from the Hebrew
> > Wikipedia last week, the Kiwix data directory ran up to a total of 31
> > items, totalling 2.3 GB. The ZIM file itself is ~300MB. Does this
> > proportion make sense?
>
> I am not sure about the other files that were created; you only need the ZIM
> file with the index itself.
>
> For 900000 articles the ZIM file containing the articles was 1.4 GB, the
> Index ZIM was 1.0 GB.
>
> So I think 300 MB looks fine.
>
> Greets,
>
> Manuel
> --
> Regards
> Manuel Schneider
>
> Wikimedia CH - Verein zur Förderung Freien Wissens
> Wikimedia CH - Association for the advancement of free knowledge
> www.wikimedia.ch [6]
>
> ------------------------------
>
> Message: 3
> Date: Sun, 05 Jul 2009 21:05:33 +0200
> From: Emmanuel Engelhart
> Subject: Re: [openZIM dev-l] Kiwix index size
> To: [email protected], [email protected]
> Message-ID:
> Content-Type: text/plain; charset=ISO-8859-1
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Asaf
>
> Asaf Bartov wrote:
> > When running Kiwix's indexer on the ZIM file I had created from the Hebrew
> > Wikipedia last week, the Kiwix data directory ran up to a total of 31 items,
> > totalling 2.3 GB. The ZIM file itself is ~300MB. Does this proportion make
> > sense?
>
> This is possible. Kiwix uses the Xapian search engine, which generates
> pretty big index files.
>
> I have two questions:
> * Are the search results OK?
> * Do you have a problem with the size of the index? Do you have a size
> limit?
>
> There are many open search/index software packages. I chose Xapian for
> many reasons, but under certain conditions it would be possible to add
> support for another search engine to Kiwix. It should also be
> possible to make a modified version of the indexer that uses less disk
> space (but indexes fewer words).
>
> openZIM itself provides a search solution; Tommi can tell you more
> about it. It might be interesting for you to test it and give us
> feedback!
>
> Regards
> Emmanuel
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org [7]
>
> iEYEARECAAYFAkpQ+XcACgkQn3IpJRpNWtPm8wCfcmzwRfg6/9ttuknkURF7ct5I
> JLAAoLbVJWqXUKIeh8Mpua3GD+bjI5ZD
> =RH/U
> -----END PGP SIGNATURE-----
>
> ------------------------------
>
> _______________________________________________
> dev-l mailing list
> [email protected]
> https://intern.openzim.org/mailman/listinfo/dev-l [8]
>
> End of dev-l Digest, Vol 5, Issue 2
> ***********************************
>
> --
> Rotem Simha
>
> _______________________________________________
> dev-l mailing list
> [email protected]
> https://intern.openzim.org/mailman/listinfo/dev-l [9]
>
> --
> Asaf Bartov
>
> Links:
> ------
> [1] http://commons.wikimedia.org/wiki/File:Nuvola_apps_important.svg
> [2] https://intern.openzim.org/mailman/listinfo/dev-l
> [3] http://intern.openzim.org/pipermail/dev-l/attachments/20090705/2afee878/attachment.html
> [4] http://www.kiwix.org/kiwix$
> [5] http://www.kiwix.org/kiwix$
> [6] http://www.wikimedia.ch
> [7] http://enigmail.mozdev.org
> [8] https://intern.openzim.org/mailman/listinfo/dev-l
> [9] https://intern.openzim.org/mailman/listinfo/dev-l

_______________________________________________
dev-l mailing list
[email protected]
https://intern.openzim.org/mailman/listinfo/dev-l
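
For reference, the size figures quoted in this thread can be tallied with a short script. This is only a rough sketch: the table sizes are copied from the quoted ls output, the function and constant names are made up for illustration, and it assumes that ls -h units are binary (1G = 1024M) and that the 3GB and 5GB budgets mentioned by Asaf are counted the same way.

```python
# Sizes of the Xapian (flint) index tables, copied from the quoted `ls -l -h` output.
# Values are in MiB; this assumes `ls -h` reports binary units (1G = 1024M).
TABLE_SIZES_MIB = {
    "position.DB": 1.4 * 1024,
    "postlist.DB": 754.0,
    "termlist.DB": 278.0,
    "value.DB": 14.0,
    "record.DB": 3.3,
}

# Text-only Hebrew Wikipedia ZIM file, as stated in the thread (~300MB).
ZIM_SIZE_MIB = 300.0


def index_total_mib(tables):
    """Total size of all index tables, in MiB."""
    return sum(tables.values())


def table_shares(tables):
    """Fraction of the total index size taken by each table."""
    total = index_total_mib(tables)
    return {name: size / total for name, size in tables.items()}


def space_left_for_images(budget_gib, zim_mib, index_mib):
    """MiB remaining for images once the ZIM file and the index are stored."""
    return budget_gib * 1024 - zim_mib - index_mib


if __name__ == "__main__":
    total = index_total_mib(TABLE_SIZES_MIB)
    print(f"index total: {total:.0f} MiB")
    shares = table_shares(TABLE_SIZES_MIB)
    for name in sorted(shares, key=shares.get, reverse=True):
        print(f"{name:12s} {shares[name]:6.1%}")
    for budget in (3, 5):  # the two budgets mentioned by Asaf
        left = space_left_for_images(budget, ZIM_SIZE_MIB, total)
        print(f"{budget} GB budget: {left:.0f} MiB left for images")
```

Under these assumptions the positional table alone is more than half of the index, and a 3GB budget would leave only a few hundred MB for images with the current index, which is why a smaller index matters here.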
