Hi,

Currently the Wikipedia dumps are stored in a single zim file. The size already exceeds 2GB for the English Wikipedia, and 4GB for some versions with images included. Many devices don't support files of that size; typically their file size limit is 2GB or 4GB. The 2GB limit is due to the use of signed 32-bit types in file access and is unfortunately not that uncommon. For example, Symbian2 (and earlier versions), the iostream in Windows (see also http://bugs.openzim.org//show_bug.cgi?id=19), old Linux versions, and possibly Android [1] don't support files larger than 2GB.
Other OSes (including Symbian3 and Maemo) do support larger files, but in many cases there is still a 4GB limit due to the FAT32 file system, which is the standard file system for SD cards and also for the internal memory of most mobile phones. Some of them, like Maemo and possibly Android, support other file systems which don't have this limit, but this requires reformatting the memory card and makes the card unreadable for many other devices. Therefore another solution would make sense for these cases as well.
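
To put numbers on the two limits, here is a small sketch in Python. The names are my own, the FAT32 figure assumes the usual maximum file size of 2^32 - 1 bytes, and the 5 GiB example size is made up rather than the size of a real dump.

```python
# Largest offset a signed 32-bit type can hold: 2^31 - 1 bytes (the "2GB" limit).
SIGNED_32BIT_LIMIT = 2**31 - 1
# Assumed largest file FAT32 can store: 2^32 - 1 bytes (the "4GB" limit).
FAT32_LIMIT = 2**32 - 1


def parts_needed(total_size: int, limit: int) -> int:
    """Return how many parts are needed so that no part exceeds `limit` bytes."""
    if total_size < 0 or limit <= 0:
        raise ValueError("total_size must be >= 0 and limit must be > 0")
    # Ceiling division without floating point; an empty file still needs one part.
    return max(1, -(-total_size // limit))


if __name__ == "__main__":
    example_size = 5 * 1024**3  # hypothetical 5 GiB dump
    print(parts_needed(example_size, SIGNED_32BIT_LIMIT))
    print(parts_needed(example_size, FAT32_LIMIT))
```

Note that a file of exactly 2^31 bytes is already one byte too large for the signed 32-bit case, so a part size slightly below the limit is the safer choice.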

The question is how to support devices which have the 2GB or 4GB limit. The following options come to my mind:

1. Split files on file system level with a special naming convention (e.g. *.0.zim, *.1.zim, etc.). The zim format is unchanged; the zim library has to be extended to support this.
   Advantage: Relatively simple change to zimlib (only replace the iostream implementation)
              End user can split files relatively easily
   Disadvantage: Not a really clean solution.
2. Split files into valid zim files with separate headers. Store the names of the related files in all zim files (e.g. in metadata (relation?)).
   Advantage: Clean solution
              Allows other features as well, e.g. separating images and text into separate files.
   Disadvantage: Larger change to the zim file format.
                 Possibly larger change to zimlib (or to the application, if handled in metadata)
3. Split into valid zim files with separate headers. No changes to the zim file format or zimlib; the application using zimlib has to load all related files (and find out which ones are related) appropriately.
   Advantage: No change to zimlib
   Disadvantage: Application has to handle this
                 Difficult for the end user to split a file
                 No convention for detecting related files (in the worst case the user has to open all of them separately)
                 => Problematic if split files are to be provided.
4. Other ideas?
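
To illustrate option 1, here is a rough sketch of a reader that presents several part files as one contiguous byte range. It is written in Python only to keep it short; it is not zimlib code, and the names find_parts, SplitFileReader and read_at as well as the prefix.N.zim naming are assumptions made for this example.

```python
import os
from bisect import bisect_right


def find_parts(prefix):
    """Collect prefix.0.zim, prefix.1.zim, ... until the next number is missing."""
    paths = []
    while os.path.exists(f"{prefix}.{len(paths)}.zim"):
        paths.append(f"{prefix}.{len(paths)}.zim")
    return paths


class SplitFileReader:
    """Read several part files as if they were one contiguous file."""

    def __init__(self, paths):
        self.paths = list(paths)
        if not self.paths:
            raise ValueError("at least one part is required")
        # starts[i] is the logical offset at which part i begins.
        self.starts = []
        total = 0
        for path in self.paths:
            self.starts.append(total)
            total += os.path.getsize(path)
        self.size = total

    def read_at(self, offset, length):
        """Return up to `length` bytes starting at logical `offset`."""
        if offset < 0 or length < 0:
            raise ValueError("offset and length must be non-negative")
        result = bytearray()
        end = min(offset + length, self.size)
        while offset < end:
            # Last part whose start is <= offset, i.e. the part holding this byte.
            index = bisect_right(self.starts, offset) - 1
            with open(self.paths[index], "rb") as part:
                part.seek(offset - self.starts[index])
                chunk = part.read(end - offset)  # stops at the end of this part
            if not chunk:
                break  # part shrank after the sizes were recorded
            result += chunk
            offset += len(chunk)
        return bytes(result)
```

Only the logical offset has to be wider than 32 bits; each seek inside a part stays below the part size, which is the reason this option needs nothing more than a replaced stream layer. A real implementation would keep the part files open instead of reopening them on every read.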

For all options it is possible either to provide the split files directly (so in future mediawiki would directly write out 2GB zim files) or to let the end user do the splitting.
I'd definitely prefer that split files are provided.
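
For the case where the end user does the splitting, a plain byte-level split might look like the sketch below. It only fits option 1, since the parts are not valid zim files on their own. The function name split_file, its parameters and the prefix.N.zim naming are assumptions made for this example.

```python
def split_file(source, prefix, part_size, block_size=1024 * 1024):
    """Write `source` as prefix.0.zim, prefix.1.zim, ... and return the part paths."""
    if part_size <= 0 or block_size <= 0:
        raise ValueError("part_size and block_size must be positive")
    paths = []
    with open(source, "rb") as src:
        while True:
            # Read the first block before creating a part, so no empty part is written.
            block = src.read(min(block_size, part_size))
            if not block:
                break
            path = f"{prefix}.{len(paths)}.zim"
            paths.append(path)
            written = 0
            with open(path, "wb") as part:
                while block:
                    part.write(block)
                    written += len(block)
                    if written >= part_size:
                        break
                    # Never read more than what still fits into this part.
                    block = src.read(min(block_size, part_size - written))
    return paths
```

Copying in blocks keeps memory use small even for multi-gigabyte dumps. On a platform with the 2GB limit the split itself would have to be done elsewhere, because the source file cannot be opened there.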

What is your opinion on this?
For WikiOnBoard I'd have to solve this soon (in particular if the next German wiki zim is larger than 2GB ;) and therefore I'd need to implement something on my own if there is no agreed solution. However, I'd strongly prefer an agreement on a common solution.

Best regards,
Christian
[1] http://osdir.com/ml/android-porting/2010-03/msg00107.html
