Re: [Wikitech-l] Crawling deWP

2009-01-28 Rolf Lampa
Marco Schuster wrote: Rolf Lampa wrote: Don't the XML dumps contain the flag for flagged revs? The XML dumps aren't for me: way too much overhead (especially since they are old, and I want to use single files; these are easier to process than one huge XML file). And they don't
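Marco's objection is that one huge XML file is harder to work with than single files. A dump does not have to be loaded whole, though. As a minimal sketch using Python's standard xml.etree.ElementTree.iterparse, pages can be streamed out one at a time and then written to individual files. The element names page, title and text follow the MediaWiki export format; the namespace is simply stripped rather than checked, and the tiny SAMPLE document is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple

# Made-up two-page document in the shape of a MediaWiki export (illustration only).
SAMPLE = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">'
    b"<page><title>Berlin</title><revision><id>1</id><text>Hauptstadt</text></revision></page>"
    b"<page><title>Bonn</title><revision><id>2</id><text>Bundesstadt</text></revision></page>"
    b"</mediawiki>"
)


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so the code works across export schema versions."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Stream (title, text of the last <revision> seen) for each <page>."""
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            # Discard the finished page's children so memory stays roughly flat.
            elem.clear()


if __name__ == "__main__":
    for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE)):
        print(page_title, len(wikitext))
```

For a real dump, passing a file opened in binary mode instead of the BytesIO object should behave the same; each yielded pair can then be written to its own file.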

Re: [Wikitech-l] Crawling deWP

2009-01-28 Platonides
Daniel Kinzler wrote: Rolf Lampa wrote: I'd love, however, to see the flagged rev status as an attribute in one of the tags, for example <revision flagged_rev=true>. Regards, Naw, it's more complex than that. You can have any number of different flags. It would probably have to be
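Daniel's point is that a single boolean attribute cannot capture several independent flags. Purely as an illustration, assume a hypothetical <flags> element with one <flag name="..." level="..."/> child per review dimension; nothing in this thread defines such markup, and the flag names below are placeholders. A consumer could then read the flags into a name-to-level map and test it against whatever minimum it needs:

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical markup: one <flag> per review dimension, each carrying a level.
SAMPLE = """<revision>
  <id>42</id>
  <flags>
    <flag name="accuracy" level="1"/>
    <flag name="depth" level="0"/>
  </flags>
</revision>"""


def parse_flags(revision_xml: str) -> Dict[str, int]:
    """Map flag name -> level for one <revision> element (hypothetical schema)."""
    root = ET.fromstring(revision_xml)
    return {
        flag.get("name"): int(flag.get("level", "0"))
        for flag in root.iterfind("flags/flag")
    }


def meets(flags: Dict[str, int], required: Dict[str, int]) -> bool:
    """True if every required flag reaches its minimum level; absent flags count as 0."""
    return all(flags.get(name, 0) >= level for name, level in required.items())
```

A dump filter would then keep a revision only if meets(parse_flags(xml), {"accuracy": 1}) holds, or whatever threshold the dump is meant to represent.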

Re: [Wikitech-l] Crawling deWP

2009-01-28 Thomas Dalton
2009/1/28 Platonides <platoni...@gmail.com>: Daniel Kinzler wrote: Rolf Lampa wrote: I'd love, however, to see the flagged rev status as an attribute in one of the tags, for example <revision flagged_rev=true>. Regards, Naw, it's more complex than that. You can have any number of different

[Wikitech-l] Crawling deWP

2009-01-27 Marco Schuster
Hi all, I want to crawl around 800,000 flagged revisions from the German Wikipedia, in order to make a dump containing only flagged revisions. For this, I obviously need to spider Wikipedia. What are the limits (rate!) here, what user agent (UA) should I use and
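On the client side, Marco's questions about rate and user agent come down to two mechanics: pausing between requests and sending an identifying User-Agent header. Below is a minimal standard-library sketch. The one-second interval, the bot name and the contact address are placeholders, not limits taken from this thread or from any Wikimedia policy, and fetching each revision's wikitext through index.php with action=raw and oldid is an assumption about how the crawl would be done.

```python
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the logic can be tested without real sleeping
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def build_raw_request(title: str, oldid: int, user_agent: str,
                      base: str = "https://de.wikipedia.org/w/index.php") -> urllib.request.Request:
    """Build (but do not send) a request for one revision's raw wikitext."""
    query = urllib.parse.urlencode({"title": title, "oldid": oldid, "action": "raw"})
    return urllib.request.Request(f"{base}?{query}", headers={"User-Agent": user_agent})
```

In a crawl loop one would call throttle.wait() before each urllib.request.urlopen(req). Keeping the throttle separate from the request builder makes it easy to swap the fixed interval for whatever rate the operators actually ask for.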

Re: [Wikitech-l] Crawling deWP

2009-01-27 Rolf Lampa
Marco Schuster wrote: I want to crawl around 800,000 flagged revisions from the German Wikipedia, in order to make a dump containing only flagged revisions. [...] flaggedpages where fp_reviewed=1;. Is it correct that this one gives me a list of all articles with flagged revs, Don't the XML
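To experiment with the query quoted above without access to the live database, one can load a toy flaggedpages table into SQLite. This is a sketch under stated assumptions: only fp_reviewed appears in the thread, while fp_page_id and fp_stable (taken here to hold the page's stable revision id) are assumed column names, and the rows are invented sample data. Whether fp_reviewed = 1 really covers every page with a flagged revision is exactly Marco's open question, so check the FlaggedRevs schema before relying on it.

```python
import sqlite3
from typing import List, Tuple


def demo_db() -> sqlite3.Connection:
    """Build a toy in-memory table; this schema is an assumption, not FlaggedRevs' DDL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flaggedpages ("
        " fp_page_id INTEGER PRIMARY KEY,"
        " fp_reviewed INTEGER NOT NULL,"
        " fp_stable INTEGER NOT NULL)"
    )
    # Invented rows: page 11 is not marked reviewed and should be filtered out.
    conn.executemany(
        "INSERT INTO flaggedpages VALUES (?, ?, ?)",
        [(10, 1, 1001), (11, 0, 1100), (12, 1, 1205)],
    )
    return conn


def reviewed_pages(conn: sqlite3.Connection) -> List[Tuple[int, int]]:
    """Return (page id, stable revision id) for rows with fp_reviewed = 1."""
    cur = conn.execute(
        "SELECT fp_page_id, fp_stable FROM flaggedpages "
        "WHERE fp_reviewed = 1 ORDER BY fp_page_id"
    )
    return cur.fetchall()
```

If the assumption about fp_stable holds, the returned revision ids are the oldid values a crawler would then fetch one by one.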

Re: [Wikitech-l] Crawling deWP

2009-01-27 Daniel Kinzler
Rolf Lampa wrote: Marco Schuster wrote: I want to crawl around 800,000 flagged revisions from the German Wikipedia, in order to make a dump containing only flagged revisions. [...] flaggedpages where fp_reviewed=1;. Is it correct that this one gives me a list of all articles with flagged revs,

Re: [Wikitech-l] Crawling deWP

2009-01-27 Platonides
Marco Schuster wrote: Hi all, I want to crawl around 800,000 flagged revisions from the German Wikipedia, in order to make a dump containing only flagged revisions. For this, I obviously need to spider Wikipedia. What are the limits (rate!) here, what user agent (UA) should I use and what caveats do I

Re: [Wikitech-l] Crawling deWP

2009-01-27 Marco Schuster
On Wed, Jan 28, 2009 at 12:49 AM, Rolf Lampa wrote: Marco Schuster wrote: I want to crawl around 800,000 flagged revisions from the German Wikipedia, in order to make a dump containing only flagged revisions. [...] flaggedpages where