OK, I submitted a bug with a proposed fix and test cases at https://sourceforge.net/tracker/?func=detail&aid=3572779&group_id=190976&atid=935521.

Thanks for the link to the documentation. Now I know where the confusion came from. I should have mentioned that I tweaked the code locally a little bit in order to generate abstracts without a local MediaWiki instance :-) I used SimpleWikiParser to create a PageNode to pass to AbstractExtractor. The issue is in SimpleWikiParser.
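
For illustration only, here is a minimal Python sketch of a splitter that accepts the oddly formatted template quoted further down in this thread. It is not the framework's Scala code and not the patch attached to the tracker item. The function name parse_template is hypothetical, and storing the "|= " parameter under an empty key is an assumption made for the example. It ignores nested templates and links that contain pipes.

```python
def parse_template(source):
    """Split one non-nested {{...}} template into (title, params).

    Hypothetical helper for illustration; returns None when the input
    is not wrapped in double braces.
    """
    text = source.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        return None

    # Drop the surrounding braces and split on the parameter separator.
    parts = [part.strip() for part in text[2:-2].split("|")]
    title = parts[0]
    params = {}
    positional = 0

    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:
            # Named parameter. The key may be empty, as in "|= ";
            # keeping it (assumption) avoids rejecting the template.
            params[key.strip()] = value.strip()
        else:
            # No "=" present: number the parameter by position.
            positional += 1
            params[str(positional)] = part

    return title, params


result = parse_template("{{template | value |= }}")
```

With the input above, result holds the title "template" and two parameters: "value" under key "1" and an empty value under an empty key. The point of the sketch is only that an empty key need not make the whole construct fall back to plain text.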

Piotr

On 2012-09-13 11:51, Pablo N. Mendes wrote:

This question keeps coming up, so I added hints to the documentation.

4.3. Running Abstract Extraction
http://wiki.dbpedia.org/Documentation#h25-8

Cheers,
Pablo

On Thu, Sep 13, 2012 at 7:13 AM, Dimitris Kontokostas wrote:

    Hi Piotr,

    We will happily accept your patch :)
    You can take a look at [1] & [2] for more details on abstract
    extraction.

    Best,
    Dimitris

    [1] http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/d580c99b5bbc/core/src/main/scala/org/dbpedia/extraction/mappings/AbstractExtractor.scala#l66
    [2] http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/dbpedia/file/efc0afb0faa3/abstractExtraction/README.txt



    On Wed, Sep 12, 2012 at 10:37 PM, Piotr Jagielski wrote:

        Dimitris,
        I guess I'm confused about the project structure. I looked at
        AbstractExtractor.scala. It clearly uses PageNode to figure
        out what the abstract is and I figured out that PageNode is
        created by SimpleWikiParser. I now see that there is some PHP
        code for a lot of stuff including abstract extraction. I don't
        understand the relationship between the Scala extraction
        framework and the PHP code, and I'm wondering if you mean the
        latter when you refer to "modified mediawiki installation".
        When I used AbstractExtractor.scala to generate the abstract for
        http://pl.dbpedia.org/page/Agnieszka_Rylik I got a similar
        result because a strangely formatted template was not parsed
        correctly.

        Anyway, I can now access the bug tracker so I will submit a
        patch there.
        Regards,
        Piotr



        On 2012-09-11 08:39, Dimitris Kontokostas wrote:
        Hi Piotr,

        Any contribution is always welcome! However, the case you are
        referring to seems strange.
        Abstracts are not generated by the SimpleWikiParser; they are
        produced by a local wikipedia clone using a modified
        mediawiki installation.

        Best,
        Dimitris

        On Mon, Sep 10, 2012 at 7:30 PM, Piotr Jagielski wrote:

            Any thoughts on this? I wrote some test cases and a fix
            that I can contribute in case you are interested.

            Piotr

            On 2012-09-06 01:13, Piotr Jagielski wrote:
            > Hello,
            >
            > There is an issue with SimpleWikiParser in the extraction
            > framework regarding template parsing. Strangely formatted
            > templates like this one: {{template | value |= }} are not
            > parsed as template nodes but as text nodes instead. Apart
            > from preventing data extraction, this results in incorrect
            > abstracts on Polish DBpedia. For example, on
            > http://pl.dbpedia.org/page/Agnieszka_Rylik the abstract
            > contains infobox parameter values.
            >
            > BTW, I noticed a couple of issues when trying to report
            > this issue.
            > 1) I couldn't submit a bug on SourceForge at
            > https://sourceforge.net/tracker/?group_id=190976&atid=935520.
            > I got a permission denied error. Is there any reason to
            > restrict bug reporting to project members only?
            > 2) I wanted to create a test case for it but I couldn't
            > find any tests for the parser part in the repository. Are
            > there any?
            >
            > Regards,
            > Piotr
            >
            >
            > ------------------------------------------------------------------------------
            > Live Security Virtual Conference
            > Exclusive live event will cover all the ways today's security and
            > threat landscape has changed and how IT managers can respond. Discussions
            > will include endpoint security, mobile security and the latest in malware
            > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
            > _______________________________________________
            > Dbpedia-discussion mailing list
            > [email protected]
            > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
            >


            




-- Kontokostas Dimitris




-- Kontokostas Dimitris

    




--
---
Pablo N. Mendes
http://pablomendes.com
Events: http://wole2012.eurecom.fr


_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
