Hi Marc,

On Tue 11/08/09 16:26, "Marc Bantle" [email protected] wrote:
> I observed that Kiwix is producing an "ad-hoc" type
> index. This may be useful for desktops as they have
> the power to generate an index file on the fly. On
> small footprint devices this will not reasonably be
> possible, due to lacking memory and cpu resources.

Yes.

> Even on a dual core desktop with 3.5 GB of memory
> Kiwix failed to produce "ad-hoc" index of the
> openzim-edition of the German Wikipedia running
> out of memory after many hours.

Yes, this is a Kiwix-specific issue with really big (text-heavy) ZIM files.

> Question 1:
> From the change log I see that kiwix is using
> a prominent search engine (Xapian) instead of the
> mechanism ZimReader/Writer are using. Is there an
> easy way to reuse an index produced by Kiwix on
> a different machine?

Yes, although this is not trivial. The Xapian database is in your
~/.www.kiwix.org directory (the [md5sum].index directory). You can copy it
to any other profile, account, or computer, and Kiwix will then be able to
search through the ZIM file without running the indexing process. To reduce
the size of the directory, you can also use "xapian-compact". We know there
is a list of improvements to make to the usability of the current index
management.

> Question 2:
> Are there plans to enable Kiwix to read reusable
> indexes of the format released for ZimReader/
> Writer?

I have nothing against making Kiwix compatible with different search engine
backends, but this is not a priority for me yet. I think I will do it in the
medium-to-distant future, as soon as I have time for it or if a user really
needs it for some reason.

> Question 3:
> Are there plans to enable Kiwix to produce such a
> reusable index.

I am not sure I understand the question. Are you talking about the ZIM
indexes? In that case, see my comment on Question 2.

> Question 4:
> Wouldn't it be desirable to deliver reusable indexes
> together with zim-article-databases for all those
> people with less capable devices (mids, netbooks,
> phones) on the Kiwix site?

I do not believe that having only one type of search engine is good at all:
usages are varied, and that is why we have different search engines. I think
the ZIM format should not force the user to make a choice.
I also think we have to be able to distribute content without shipping the
data twice (with indexes). And finally, I do not think that having compatible
indexes should be a priority, because 99% of users don't care about that
(they simply use only one client).

> Question 5:
> The zim databases supplied on the Kiwix site [1]
> seem to use the articles title field as article id field,
> which - I'm sure - solves some problems for Kiwix,
> but results in a list of article ids as result of a search
> on zimreader instead of a list of article titles. Since
> both Kiwix and ZimReader are part of the openzim
> standardization effort, this confuses me a bit. Which
> format is supposed to be the standard?

Tommi already answered that. IMO it is the job of the indexer to find the
title: if there is an HTML page with a title, it has to use it. More
generally, IMO forcing ZIM creators to use url=title is a bad idea; the
format should never force anyone to adopt a particular way of representing
or storing information. Everything that is possible with "normal" HTTP/HTML
should also be possible with content in a ZIM file.

Emmanuel

_______________________________________________
dev-l mailing list
[email protected]
https://intern.openzim.org/mailman/listinfo/dev-l
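
The index reuse described in the answer to Question 1 (copying the
[md5sum].index directory from one profile to another) could be sketched
roughly as follows. This is only an illustration under assumptions: the
function name copy_kiwix_index and both path arguments are hypothetical, the
only layout taken from the message is that the index lives in a directory
whose name ends in ".index" inside ~/.www.kiwix.org, and it assumes
Python 3.8 or later.

```python
import shutil
from pathlib import Path


def copy_kiwix_index(index_dir: Path, target_profile: Path) -> Path:
    """Copy one '<md5sum>.index' directory into another profile directory.

    Hypothetical helper: 'index_dir' is the existing Xapian index directory,
    'target_profile' is the ~/.www.kiwix.org directory of the other
    profile/account/computer (for example on a mounted drive).
    """
    if not index_dir.is_dir():
        raise FileNotFoundError(f"no such index directory: {index_dir}")
    if not index_dir.name.endswith(".index"):
        raise ValueError(f"expected a '*.index' directory, got: {index_dir.name}")

    # Create the target profile directory if it does not exist yet.
    target_profile.mkdir(parents=True, exist_ok=True)

    # Keep the directory name unchanged, since it is derived from the md5sum.
    destination = target_profile / index_dir.name
    shutil.copytree(index_dir, destination, dirs_exist_ok=True)
    return destination


# Example call (paths are placeholders, not real locations):
# copy_kiwix_index(
#     Path.home() / ".www.kiwix.org" / "<md5sum>.index",
#     Path("/mnt/other-machine/home/user/.www.kiwix.org"),
# )
```

Shrinking the copied directory with "xapian-compact" would be a separate
step and is not shown here.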
