Dimitris,
Cool!
Maybe we could test Sweble first as the new AbstractExtractor, since that
seems to be the weakest link? If it works there, it could then be
introduced gradually into the core to replace SimpleWikiParser.
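To make it concrete, here is roughly what parsing a page with Sweble's
engine looks like, going by the getting-started example in their
documentation. Class names and the English-Wikipedia config are modeled
on that example and may differ between versions, and the page title and
wikitext are just placeholders, so treat this as an untested sketch:

import org.sweble.wikitext.engine.PageId;
import org.sweble.wikitext.engine.PageTitle;
import org.sweble.wikitext.engine.WtEngineImpl;
import org.sweble.wikitext.engine.config.WikiConfig;
import org.sweble.wikitext.engine.nodes.EngProcessedPage;
import org.sweble.wikitext.engine.utils.DefaultConfigEnWp;

public class SwebleSketch {
    public static void main(String[] args) throws Exception {
        // Stock English-Wikipedia configuration shipped with Sweble.
        WikiConfig config = DefaultConfigEnWp.generate();
        WtEngineImpl engine = new WtEngineImpl(config);

        // Identify the page; -1 means no particular revision.
        // "Berlin" and the wikitext below are placeholders.
        PageTitle title = PageTitle.make(config, "Berlin");
        PageId pageId = new PageId(title, -1);
        String wikitext = "'''Berlin''' is the capital of [[Germany]].";

        // Parse and post-process the wikitext into a typed AST.
        EngProcessedPage page = engine.postprocess(pageId, wikitext, null);

        // An AstVisitor over page.getPage() is where an abstract
        // extractor would pull out the plain text.
        System.out.println(page.getPage());
    }
}

The appeal is that postprocess() returns a typed AST, so an extractor
could walk the nodes with a visitor instead of wrestling with regexes.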
Alessio, if you take the challenge, please keep us updated about your
progress on [email protected]
(btw, let's move this discussion there?)
Cheers,
Pablo
On Tue, Sep 27, 2011 at 1:19 PM, Dimitris Kontokostas <[email protected]> wrote:
> Some more info on the (current) abstract extraction process...
> You will have to install a locally modified MediaWiki and load the
> Wikipedia dumps (after cleaning them with the script).
> The detailed process is described here:
>
> http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/dbpedia/file/945c24bdc54c/abstractExtraction
>
> > I will also dare to give another idea. The guys behind Sweble
> > (http://sweble.org/) claim it is very thorough, and there seems to be
> > a lot of activity behind it.
>
> This could be a new approach for the framework: not only for abstracts,
> but as a replacement for SimpleWikiParser.
> I think the current parser is LL, and maybe we could switch to an LR
> parser to better handle recursive syntax, such as templates nested
> inside other templates.
> I haven't checked out Sweble yet, but we could look into it.
>
> Cheers,
> Dimitris
>