Hi Hady and Wencan!

On 15 March 2014 17:35, Hady elsahar <[email protected]> wrote:
> On Sun, Mar 16, 2014 at 12:43 AM, wencan luo <[email protected]> wrote:
>> 5. How can I debug an extractor? Testing on the whole Wikipedia dump is
>> impossible when debugging. It is too slow.
>
> Of course you need debugging; running the whole extraction for a single
> dump can take multiple hours.
> Which IDE are you using? I'm using IntelliJ; you can simply set breakpoints
> and debug.

I've found that you can download all the pages in a particular
category using http://commons.wikimedia.org/wiki/Special:Export, and
then set:
 source = exported-file.xml

Then the extraction framework will work on just the pages in the
exported file (as long as you have your directories set up correctly,
as Hady explains above!). I'm currently playing around with an export
of around 1,600 'File' pages from the Commons, which takes about 12
seconds for the framework to run. I imagine that'll get slower once
the extraction framework actually extracts something useful from
those pages, but it could be a useful way of building a small dataset
to play around with.

cheers,
Gaurav

_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc