About the dead links, a few things:
* Are you sure the problem is not at the source (the HTML files)?
* The zimwriter does not check whether the links in the HTML pages are valid.
* It seems that libzim returns bad content if the requested content does not
exist (Tommi, can you confirm?). IMO it should return nothing or an error code.
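
Since the zimwriter does not validate links, the source HTML could be checked before building the ZIM file. Here is a minimal sketch in Python, assuming the pages sit in one local directory tree and link to each other by relative or root-relative paths. The names LinkCollector and find_dead_links are made up for this example and are not part of zimwriter or libzim.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collect the href/src targets found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.targets.append(value)


def find_dead_links(root):
    """Return (page, target) pairs whose local target file is missing."""
    dead = []
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            if not filename.endswith((".html", ".htm")):
                continue
            page = os.path.join(dirpath, filename)
            collector = LinkCollector()
            with open(page, encoding="utf-8", errors="replace") as handle:
                collector.feed(handle.read())
            for target in collector.targets:
                path, _fragment = urldefrag(target)
                # Skip same-page anchors and anything with a URL scheme
                # or a network location (treated as external here).
                if not path or urlparse(path).scheme or path.startswith("//"):
                    continue
                path = unquote(path.split("?", 1)[0])
                if path.startswith("/"):
                    # Root-relative link: resolve against the dump root.
                    local = os.path.join(root, path.lstrip("/"))
                else:
                    local = os.path.join(dirpath, path)
                if not os.path.exists(os.path.normpath(local)):
                    dead.append((os.path.relpath(page, root), target))
    return dead
```

Calling find_dead_links("/path/to/html/dump") lists each page together with the link that points nowhere. One caveat: a relative link such as Special:Random, written without a leading "./", is parsed as having a URL scheme and is therefore skipped by this sketch.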

Emmanuel

 On Mon 06/07/09 14:58, "Rotem Simha" [email protected] wrote:
> * There are some errors in the links of files and special pages.
> Examples:
> קובץ:Nuvola_apps_important.svg [1] links to
> ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים
> ללא תמונות/קטגוריות/ספורטאים איטלקים
> (Wikipedia:Wikipedia projects/Articles without images/Categories/Sports
> people from Italy)
> מיוחד:אקראי (Special:Random) > 15 במאי (May 15)
> מיוחד:שינויים אחרונים (Special:RecentChanges) >
> 10_באוגוסט (August 10)
> 
> * size is important because we intend to add images
> 
> 2009/7/6 
> Send dev-l mailing list submissions to
>        [email protected]
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>        https://intern.openzim.org/mailman/listinfo/dev-l [2]
> or, via email, send a message with subject or body help to
>        [email protected]
> 
> You can reach the person managing the list at
>        [email protected]
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of dev-l digest..."
> 
> Today's Topics:
> 
>   1. Kiwix index size (Asaf Bartov)
>   2. Re: Kiwix index size (Manuel Schneider)
>   3. Re: Kiwix index size (Emmanuel Engelhart)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Sun, 5 Jul 2009 19:18:57 +0300
> From: Asaf Bartov 
> Subject: [openZIM dev-l] Kiwix index size
> To: [email protected]
> Message-ID:
>      
>  
> Content-Type: text/plain; charset="iso-8859-1"
> 
> Hi, everyone.
> 
> When running Kiwix's indexer on the ZIM file I had created from the Hebrew
> Wikipedia last week, the Kiwix data directory ran up to a total of 31 items,
> totalling 2.3 GB.  The ZIM file itself is ~300MB.  Does this proportion make
> sense?
> 
> Detailed ls output attached.
> 
> Thanks in advance,
> 
>   Asaf Bartov
> --
> Asaf Bartov 
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
>  tachment.html[3]>
> -------------- next part --------------
> ro...@desktop:~/.www.kiwix.org/kiwix$ [4] ls -l -h -a -R
> .:
> total 16K
> drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 .
> drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
> drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 7680jxd5.default
> -rw-r--r-- 1 rotem rotem   94 2009-07-01 16:10 profiles.ini
> 
> ./7680jxd5.default:
> total 1.7M
> drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 .
> drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
> drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 31c26198d06ad265677b450796cc09aa.index
> -rw------- 1 rotem rotem  162 2009-07-05 18:19 compatibility.ini
> -rw-r--r-- 1 rotem rotem 135K 2009-07-05 18:19 compreg.dat
> drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 extensions
> -rw-r--r-- 1 rotem rotem  169 2009-07-01 16:10 localstore.rdf
> -rw-r--r-- 1 rotem rotem  304 2009-07-05 18:39 mimeTypes.rdf
> -rw-r--r-- 1 rotem rotem    0 2009-07-05 18:40 .parentlock
> -rw-r--r-- 1 rotem rotem 2.0K 2009-07-01 16:10 permissions.sqlite
> -rw-r--r-- 1 rotem rotem 128K 2009-07-05 18:54 places.sqlite
> -rw------- 1 rotem rotem  951 2009-07-05 19:00 prefs.js
> -rw-r--r-- 1 rotem rotem 1.1M 2009-07-05 18:20 XPC.mfasl
> -rw-r--r-- 1 rotem rotem  98K 2009-07-05 18:19 xpti.dat
> -rw-r--r-- 1 rotem rotem  98K 2009-07-05 18:20 XUL.mfasl
> 
> ./7680jxd5.default/31c26198d06ad265677b450796cc09aa.index:
> total 2.4G
> drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 .
> drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
> -rw-r--r-- 1 rotem rotem    0 2009-07-02 01:46 flintlock
> -rw-r--r-- 1 rotem rotem   12 2009-07-02 01:46 iamflint
> -rw-r--r-- 1 rotem rotem  22K 2009-07-02 05:13 position.baseA
> -rw-r--r-- 1 rotem rotem  21K 2009-07-02 05:10 position.baseB
> -rw-r--r-- 1 rotem rotem 1.4G 2009-07-02 05:13 position.DB
> -rw-r--r-- 1 rotem rotem  12K 2009-07-02 05:13 postlist.baseA
> -rw-r--r-- 1 rotem rotem  12K 2009-07-02 05:10 postlist.baseB
> -rw-r--r-- 1 rotem rotem 754M 2009-07-02 05:13 postlist.DB
> -rw-r--r-- 1 rotem rotem   70 2009-07-02 05:13 record.baseA
> -rw-r--r-- 1 rotem rotem   70 2009-07-02 05:10 record.baseB
> -rw-r--r-- 1 rotem rotem 3.3M 2009-07-02 05:13 record.DB
> -rw-r--r-- 1 rotem rotem 4.4K 2009-07-02 05:13 termlist.baseA
> -rw-r--r-- 1 rotem rotem 4.3K 2009-07-02 05:10 termlist.baseB
> -rw-r--r-- 1 rotem rotem 278M 2009-07-02 05:13 termlist.DB
> -rw-r--r-- 1 rotem rotem  232 2009-07-02 05:13 value.baseA
> -rw-r--r-- 1 rotem rotem  230 2009-07-02 05:10 value.baseB
> -rw-r--r-- 1 rotem rotem  14M 2009-07-02 05:13 value.DB
> 
> ./7680jxd5.default/extensions:
> total 8.0K
> drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 .
> drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
> ro...@desktop:~/.www.kiwix.org/kiwix$ [5]
> 
> ------------------------------
> 
> Message: 2
> Date: Sun, 5 Jul 2009 20:57:39 +0200
> From: Manuel Schneider 
> Subject: Re: [openZIM dev-l] Kiwix index size
> To: [email protected], [email protected]
> Message-ID: 
> Content-Type: text/plain;  charset="utf-8"
> 
> Hi Asaf,
> 
> On Sunday, 5 July 2009, Asaf Bartov wrote:
> > When running Kiwix's indexer on the ZIM file I had created from the Hebrew
> > Wikipedia last week, the Kiwix data directory ran up to a total of 31
> > items, totalling 2.3 GB.  The ZIM file itself is ~300MB.  Does this
> > proportion make sense?
> 
> I am not sure about the other files which were created, you only need the ZIM
> file with the index itself.
> 
> For 900000 articles the ZIM file containing the articles was 1.4 GB, the
> Index ZIM was 1.0 GB.
> 
> So I think 300 MB looks fine.
> 
> Greets,
> 
> Manuel
> --
> Regards
> Manuel Schneider
> 
> Wikimedia CH - Verein zur Förderung Freien Wissens
> Wikimedia CH - Association for the advancement of free knowledge
> www.wikimedia.ch [6]
> 
> ------------------------------
> 
> Message: 3
> Date: Sun, 05 Jul 2009 21:05:33 +0200
> From: Emmanuel Engelhart 
> Subject: Re: [openZIM dev-l] Kiwix index size
> To: [email protected], [email protected]
> Message-ID: 
> Content-Type: text/plain; charset=ISO-8859-1
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hi Asaf
> Asaf Bartov wrote:
> > When running Kiwix's indexer on the ZIM file I had created from the Hebrew
> > Wikipedia last week, the Kiwix data directory ran up to a total of 31 items,
> > totalling 2.3 GB.  The ZIM file itself is ~300MB.  Does this proportion make
> > sense?
> 
> This is possible. Kiwix uses the Xapian search engine, which generates
> pretty big index files.
> 
> I have two questions:
> * Are the search results OK?
> * Do you have a problem with the size of the index? Do you have a size
> limit?
> 
> There are many open search/index programs. I chose Xapian for many
> reasons, but it is possible, under certain conditions, to add support for
> another search engine to Kiwix. It should also be possible to make a
> modified version of the indexer that uses less disk space (but with fewer
> words indexed).
> 
> openZIM itself provides a search solution; Tommi can tell you more about
> it. Maybe it would be interesting for you to test it and give us
> feedback!
> 
> Regards
> Emmanuel
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org [7]
> 
> iEYEARECAAYFAkpQ+XcACgkQn3IpJRpNWtPm8wCfcmzwRfg6/9ttuknkURF7ct5I
> JLAAoLbVJWqXUKIeh8Mpua3GD+bjI5ZD
> =RH/U
> -----END PGP SIGNATURE-----
> 
> ------------------------------
> 
> 
> End of dev-l Digest, Vol 5, Issue 2
> ***********************************
> 
> -- 
> Rotem Simha
> 
> 
> 
> Links:
> ------
> [1] http://commons.wikimedia.org/wiki/File:Nuvola_apps_important.svg
> [2] https://intern.openzim.org/mailman/listinfo/dev-l
> [3] http://intern.openzim.org/pipermail/dev-l/attachments/20090705/2afee878/attachment.html
> [4] http://www.kiwix.org/kiwix$
> [5] http://www.kiwix.org/kiwix$
> [6] http://www.wikimedia.ch
> [7] http://enigmail.mozdev.org
> [8] https://intern.openzim.org/mailman/listinfo/dev-l
> 
> 
