I have the same questions; I'm working on the Bengali DBpedia. Could somebody
please answer?

On Tue, Nov 22, 2011 at 4:33 PM, Amit Kumar <[email protected]> wrote:

>
> Hey everyone,
> I’m trying to set up the DBpedia extraction framework because I’m interested
> in getting structured data from already-downloaded Wikipedia dumps. As I
> understand it, I need to work in the ‘dump’ directory of the codebase. I
> have tried to reverse-engineer it (Scala is new to me), but I need some
> help.
>
>
>    1. First of all, is there more detailed documentation somewhere about
>    setting up and running the pipeline? The documentation available on
>    dbpedia.org seems insufficient.
>    2. I understand that I first need to create a config.properties file in
>    which I set the input/output locations, the list of extractors, and the
>    languages. I tried working from the config.properties.default given in
>    the code. There seems to be a typo in the extractor list:
>    ‘org.dbpedia.extraction.mappings.InterLanguageLinksExtractorExtractor’
>    gives a ‘class not found’ error, so I changed it to
>    ‘org.dbpedia.extraction.mappings.InterLanguageLinksExtractor’ (see the
>    config sketch after this list). Is that OK?
>    3. I can’t find documentation on how to set up the input directory. Can
>    someone give me the details? From what I gather, the input directory
>    should contain a ‘commons’ directory plus a directory for each language
>    set in config.properties. Each of these directories must have a
>    subdirectory whose name is in YYYYMMDD format, and inside that you place
>    the XML files, such as enwiki-20111111-pages-articles.xml (see the layout
>    sketch after this list). Am I right? Does the framework require any
>    particular dump of Wikipedia? Also, what goes in the commons branch?
>    4. I ran the framework by copying a sample dump
>    (http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2)
>    into both the en and commons branches, unzipping it and renaming it as
>    required (see the layout sketch after this list). For now I’m working
>    with the en language only. It works with the default 19 extractors but
>    starts failing if I include *AbstractExtractor*. It seems the
>    AbstractExtractor requires a running MediaWiki instance to parse
>    MediaWiki syntax; from the file itself: “*DBpedia-customized MediaWiki
>    instance is required*.” Can someone shed more light on this? What
>    customization is required, and where can I get one?
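>
>    To make point 2 concrete, the extractor entry in my config now reads
>    roughly as follows (I’m only showing the class name I changed and
>    eliding the rest of the list; the exact key name and delimiter are
>    whatever config.properties.default already uses):
>
>        extractors=...,org.dbpedia.extraction.mappings.InterLanguageLinksExtractor,...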
>
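>    And to make points 3 and 4 concrete, the input layout I ended up with
>    looks like the sketch below (the base directory and the 20111111 date
>    are just my own choices, and the commons file name is my guess at the
>    required renaming; the same sample dump sits in both branches):
>
>        <inputDir>/
>            commons/
>                20111111/
>                    commonswiki-20111111-pages-articles.xml
>            en/
>                20111111/
>                    enwiki-20111111-pages-articles.xml
>
>    Is that the intended layout?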
>
>
> Sorry if these questions are too basic and are already answered somewhere; I
> have tried looking but couldn’t find anything myself.
> One more question: is there a reason for the delay between successive
> DBpedia releases? If the code is already there, why does it take six months
> between releases? Is there a manual editorial step involved, or is the time
> spent on development of and changes to the framework code that are collated
> into every release?
>
>
> Thanks and regards,
>
> Amit
> Tech Lead
> Cloud and Platform Group
> Yahoo!
>
>