Hi Hady and Wencan!

On 15 March 2014 17:35, Hady elsahar <[email protected]> wrote:
> On Sun, Mar 16, 2014 at 12:43 AM, wencan luo <[email protected]> wrote:
>> 5. How can I debug an extractor? Testing on the whole Wikipedia dump is
>> impossible when debugging. It is too slow.
>
> Of course you need debugging; running the whole extraction for even one
> dump can take multiple hours. Which IDE are you using? I'm using IntelliJ,
> where you can simply set breakpoints and debug.
I've found that you can download all the pages in a particular category using http://commons.wikimedia.org/wiki/Special:Export, and then set:

    source = exported-file.xml

The extraction framework will then work on just the pages in the exported file (as long as you have your directories set up correctly, as Hady explains above!).

I'm currently playing around with an export of around 1,600 'File' pages from the Commons, which takes about 12 seconds for the framework to run. I imagine that'll get slower once the extraction framework actually extracts something useful from those pages, but it could be a handy way of building a small dataset to play around with.

cheers,
Gaurav

_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
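To make the workflow above concrete: export the pages you want via Special:Export, save the resulting XML into your extraction working directory, and point the framework's config at that file. A minimal properties sketch follows; only the `source` key comes from the email itself, while `base-dir` and the path shown are assumptions about a typical setup, not verified keys from the framework:

```properties
# Extraction config sketch -- 'source' is the setting mentioned above;
# 'base-dir' and the path are illustrative assumptions.
base-dir=/data/extraction
# Point at the Special:Export XML instead of the full Wikipedia dump:
source=exported-file.xml
```

With a small export like Gaurav's ~1,600-page file, a full run finishes in seconds rather than hours, which makes breakpoint debugging in an IDE practical.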
