-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Platonides wrote:
> Farkas, Illes wrote:
>> Dear All,
>>
>> Is the dump file containing the page abstracts for Yahoo produced by
>> humans or machines?
>>
>> Thanks
>
> It's produced by a machine, extracting the beginning of all articles
> (which are human-created).

It's a machine attempting to pull the first two sentences of the article
as plaintext, sometimes more successfully than others. :)

I'm not sure these files are actually still being used, though. You can
find the code in:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/ActiveAbstract/

But I think the newer code here to pull the first sentence is more
reliable (requires current MediaWiki with new parser):
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/OpenSearchXml/

- -- brion
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkklvocACgkQwRnhpk1wk458QgCfQythKEvXp9ssRsILQOejNQ09
bWoAn31APe3W773YkBTy2UuKOE2drQJ9
=MGM8
-----END PGP SIGNATURE-----
_______________________________________________
MediaWiki-l mailing list
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
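
For anyone wondering what "pull the first two sentences as plaintext" might look like in practice, here is a rough sketch in Python. It is not the ActiveAbstract or OpenSearchXml code (both of those are PHP extensions). The function name first_sentences, the count parameter, and the sentence-boundary regex are all made up for illustration, and the sketch assumes the wiki markup has already been stripped to plain text.

```python
import re

# Hypothetical boundary rule (an assumption, not the extensions' logic):
# a sentence ends at '.', '!' or '?' when followed by whitespace and
# then an uppercase letter or a digit.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def first_sentences(plaintext, count=2):
    """Return roughly the first `count` sentences of plain text."""
    # Collapse newlines and runs of spaces so the split sees single gaps.
    text = " ".join(plaintext.split())
    parts = _SENTENCE_END.split(text)
    return " ".join(parts[:count])


if __name__ == "__main__":
    sample = "Foo is a bar. It was made in 1999. It is big."
    print(first_sentences(sample))
```

A naive rule like this also shows why the result is only sometimes successful: an abbreviation such as "Dr." followed by a capitalized name is counted as a sentence end, so the extract comes out shorter than intended.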
