Hi Andrew

On 09.03.2015 17:41, Andrew Bogott wrote:
> Erik M just pointed out that there is a similar effort towards this goal
> happening in production -- so maybe you can catch up with those folks
> and see what you can contribute, or if you can get what you need from
> their cluster?  I'm cc'ing Gabriel in hopes that he can collaborate with
> you or refer you to the folks who are doing the actual work.

https://phabricator.wikimedia.org/T17017
https://phabricator.wikimedia.org/T91853

Gabriel proposes to pack the raw Parsoid output into a ZIP file. This is HTML+RDF only and is not directly usable by end-users. Its purpose is to provide high-quality input for researchers and software developers interested in the article texts.

ZIM files with Kiwix are a solution directly usable by end-users. The HTML is heavily rewritten, unpacking the content is not necessary, the file includes optimized pictures and a full-text search index, you have the software to read it, etc.

Mwoffliner is based on Parsoid output, so Gabriel's proposal can be seen as an intermediate step towards creating ZIM files. I consider the Parsoid output tarball a "middle product" of ZIM file creation, and both should be generated in one run.

That said, even though creating Parsoid dumps takes less effort than creating ZIM files, both are currently stuck due to a lack of hardware resources.

Regards
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication

_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l