I also have the same questions. I'm working on the Bengali DBpedia... Could
somebody please answer?
On Tue, Nov 22, 2011 at 4:33 PM, Amit Kumar <[email protected]> wrote:
>
> Hey everyone,
> I’m trying to set up the DBpedia extraction framework because I’m interested in
> getting structured data from already-downloaded Wikipedia dumps. As far as I
> understand, I need to work in the ‘dump’ directory of the codebase. I have tried
> to reverse-engineer it (Scala is new to me), but I need some help.
>
>
>    1. First of all, is there more detailed documentation somewhere about
>    setting up and running the pipeline? The documentation available on
>    dbpedia.org seems insufficient.
>    2. I understand that I first need to create a config.properties file in
>    which I set the input/output locations, the list of extractors and the
>    languages. I tried working with the config.properties.default given in the
>    code. There seems to be a typo in the extractor list:
>    ‘org.dbpedia.extraction.mappings.InterLanguageLinksExtractorExtractor’ gives
>    a ‘class not found’ error, so I changed it to
>    ‘org.dbpedia.extraction.mappings.InterLanguageLinksExtractor’. Is that okay?
>    (I have pasted a sketch of my current config just below this list.)
>    3. I can’t find documentation on how to set up the input directory. Can
>    someone give me the details? From what I gather, the input directory should
>    contain a ‘commons’ directory plus one directory per language listed in
>    config.properties. Each of these directories must have a subdirectory whose
>    name is in YYYYMMDD format, and inside that you place the XML dump, e.g.
>    enwiki-20111111-pages-articles.xml (see the layout sketch after this list).
>    Am I right? Does the framework require a particular Wikipedia dump? And what
>    goes in the commons directory?
>    4. I ran the framework after downloading a sample dump,
>    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2,
>    into both the en and the commons directories, unzipping it and renaming it
>    as required. For now I’m working with the en language only. It works with
>    the default 19 extractors, but it starts failing if I include
>    *AbstractExtractor*. It seems the AbstractExtractor requires a running
>    MediaWiki instance to parse MediaWiki syntax; the file itself says
>    “*DBpedia-customized MediaWiki instance is required*.” Can someone shed some
>    more light on this? What customization is required, and where can I get such
>    an instance?
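>
>    For reference, the relevant part of my current config.properties looks
>    roughly like the sketch below. The key names are only my reading of
>    config.properties.default, so they may well be wrong; please correct me:
>
>      # where the downloaded Wikipedia dumps live (the input directory from 3.)
>      dumpDir=/data/wikipedia-dumps
>      # where the extracted triples should be written
>      outputDir=/data/dbpedia-output
>      # languages to extract (en only for now)
>      languages=en
>      # extractors to run, with the InterLanguageLinksExtractor name fixed
>      extractors=org.dbpedia.extraction.mappings.LabelExtractor,\
>                 org.dbpedia.extraction.mappings.InterLanguageLinksExtractor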
>
>
>
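> To make question 3 concrete, this is the input directory layout I have assumed
> so far (again only my guess from reading the code, reusing the date from the
> example file name above):
>
>      <input directory>/
>        commons/
>          20111111/
>            (not sure what file goes here; that is part of my question)
>        en/
>          20111111/
>            enwiki-20111111-pages-articles.xml
>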
> Sorry if the questions are too basic and already answered somewhere; I have
> tried looking but couldn’t find the answers myself.
> One more question: is there a reason for the delay between successive DBpedia
> releases? If the code is already there, why does it take six months between
> releases? Is there a manual editorial process involved, or is the time spent on
> development and changes to the framework code that are collated into each
> release?
>
>
> Thanks and regards,
>
> Amit
> Tech Lead
> Cloud and Platform Group
> Yahoo!
>
>
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion