Hello everyone,

I have been working on a prototype of the distributed extraction framework (GSoC project, proposal at [1]). You can find the GitHub repository at [2]. Among other things, it currently contains distributed implementations of ConfigLoader and ExtractionJob, and a new launcher script called DistExtraction. The repo contains a parent POM with two child modules: extraction-framework (from git master) and dist (the prototype code).
The prototype can currently perform distributed redirect extraction and composite extraction (the main extractor stuff) using Spark. I've tested it with RedirectExtractor on liwiki, using Spark 0.8.1. Please let me know what you think about it. :)

Also, I have updated the GSoC project section, the final section, and the timeframes table section of my proposal [1].

[1] https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2014/nileshc/5860723393560576
[2] https://github.com/nilesh-c/dbpedia-dist-extraction

Cheers,
Nilesh

--
A quest eternal, a life so small! So don't just play the guitar, build one.
You can also email me at [email protected] or visit my website <http://www.nileshc.com/>
