This question keeps coming up, so I added hints to the documentation, under
"4.3. Running Abstract Extraction":
http://wiki.dbpedia.org/Documentation#h25-8
Cheers,
Pablo
On Thu, Sep 13, 2012 at 7:13 AM, Dimitris Kontokostas <[email protected]> wrote:
> Hi Piotr,
>
> We will happily accept your patch :)
> You can take a look at [1] & [2] for more details on abstract extraction.
>
> Best,
> Dimitris
>
> [1]
> http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/d580c99b5bbc/core/src/main/scala/org/dbpedia/extraction/mappings/AbstractExtractor.scala#l66
> [2]
> http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/dbpedia/file/efc0afb0faa3/abstractExtraction/README.txt
>
>
> On Wed, Sep 12, 2012 at 10:37 PM, Piotr Jagielski
> <[email protected]> wrote:
>
>> Dimitris,
>>
>> I guess I'm confused about the project structure. I looked at
>> AbstractExtractor.scala. It clearly uses PageNode to figure out what the
>> abstract is and I figured out that PageNode is created by SimpleWikiParser.
>> I now see that there is some PHP code for a lot of stuff including abstract
>> extraction. I don't understand the relationship between the Scala extraction
>> framework and the PHP code, and I'm wondering if you mean the latter when you
>> refer to "modified mediawiki installation". When I used
>> AbstractExtractor.scala to generate the abstract for
>> http://pl.dbpedia.org/page/Agnieszka_Rylik I got a similar result because
>> a strangely formatted template was not parsed correctly.
>>
>> Anyway, I can now access the bug tracker so I will submit a patch there.
>>
>> Regards,
>> Piotr
>>
>>
>>
>> On 2012-09-11 08:39, Dimitris Kontokostas wrote:
>>
>> Hi Piotr,
>>
>> Any contribution is always welcome! However, the case you are referring
>> to seems strange.
>> Abstracts are not generated by the SimpleWikiParser, they are produced by
>> a local wikipedia clone using a modified mediawiki installation.
>>
>> Best,
>> Dimitris
>>
>> On Mon, Sep 10, 2012 at 7:30 PM, Piotr Jagielski
>> <[email protected]> wrote:
>>
>>> Any thoughts on this? I wrote some test cases and a fix that I can
>>> contribute in case you are interested.
>>>
>>> Piotr
>>>
>>> On 2012-09-06 01:13, Piotr Jagielski wrote:
>>> > Hello,
>>> >
>>> > There is an issue with SimpleWikiParser in the extraction framework
>>> > regarding template parsing. Strangely formatted templates like this one:
>>> > {{template | value |= }} are not parsed as template nodes but as text
>>> > nodes instead. Apart from preventing data extraction, it results in
>>> > incorrect abstracts on Polish DBpedia. For example, on
>>> > http://pl.dbpedia.org/page/Agnieszka_Rylik the abstract contains infobox
>>> > parameter values.
>>> >
>>> > BTW, I noticed a couple of problems when trying to report this issue.
>>> > 1) I couldn't submit a bug on SourceForge at
>>> > https://sourceforge.net/tracker/?group_id=190976&atid=935520. I got a
>>> > permission denied error. Is there any reason to restrict bug reporting
>>> > to project members only?
>>> > 2) I wanted to create a test case for it but I couldn't find any tests
>>> > for the parser part in the repository. Are there any?
>>> >
>>> > Regards,
>>> > Piotr
>>> >
>>> >
>>> > ------------------------------------------------------------------------------
>>> > Live Security Virtual Conference
>>> > Exclusive live event will cover all the ways today's security and
>>> > threat landscape has changed and how IT managers can respond. Discussions
>>> > will include endpoint security, mobile security and the latest in malware
>>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> > _______________________________________________
>>> > Dbpedia-discussion mailing list
>>> > [email protected]
>>> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>> >
>>
>>
>>
>> --
>> Kontokostas Dimitris
>>
>>
>>
>
>
> --
> Kontokostas Dimitris
>
>
--
---
Pablo N. Mendes
http://pablomendes.com
Events: http://wole2012.eurecom.fr