Hi Omid,

> (1) Why does this process involve a MySQL database?

The DBpedia scripts do not read the XML files directly. All data from the Wikipedia dumps has to be loaded into a MySQL database first. You can do this with the import.php script in /importwiki.
For the English Wikipedia you can start it with, e.g.:

    php import.php -c -d DOWNLOADPATH -ip 127.0.0.1 en DBHOST DBNAME DBUSER DBPASSWORD

-ip is your machine's IP (as far as I remember, this helps if mwdumper throws exceptions). You can see all of these parameters by calling php import.php in /importwiki.

Once this is done, you will have the Wikipedia database on your machine and can start your first extraction. (The import script downloads the dump, unzips it, and writes it to the database, so it will need some disk space ;)

> (2) As my first project I want to improve on the abstract extractor
> (dbpedia/extraction/extractors/ShortAbstractExtractor.php). I do not
> want to generate anything but "articles_abstract_en.nt", so I want to
> disable everything except this particular module. How do I do this? I
> don't want all other components to run and take time.

First, rename databaseconfig.php.dist to databaseconfig.php and put your database parameters in this file.

To start an extraction I used start.php; just comment out the extractors you don't need. Extracting all datasets should be done by extract.php. (I have copied the code for the short abstracts at the end of this mail.)

I hope this helps. It has been a while since I used the framework, so I may have forgotten something. Just let me know if it doesn't run.

Jörg

start.php:

    <?php
    // Load each class from the directory that matches its name.
    function __autoload($class_name) {
        if (preg_match('~^.*Extractor.*$~', $class_name))
            require_once('extractors/' . $class_name . '.php');
        else if (preg_match('~^.*Destination.*$~', $class_name))
            require_once('destinations/' . $class_name . '.php');
        else
            require_once $class_name . '.php';
    }

    // Extracts only the "Google" article; for all articles see the original start.php.
    $pageTitles = array("Google");

    // Create an extraction job
    $job = new ExtractionJob(new DatabaseWikipedia("en"), $pageTitles);

    // Create an ExtractionGroup for each extractor.
    // SimpleDumpDestination writes the output to the screen.
    $groupShortAbstracts = new ExtractionGroup(new SimpleDumpDestination());
    $groupShortAbstracts->addExtractor(new ShortAbstractExtractor());
    $job->addExtractionGroup($groupShortAbstracts);

    // Execute the extraction job
    $manager = new ExtractionManager();
    $manager->execute($job);