What do you mean by text quality? The text itself is as good as the first couple of sentences in the Wikipedia article you take it from, right?

Piotr

On 2012-10-02 22:49, Dimitris Kontokostas wrote:
Our main interest is the text quality; if we get this right, the shortening / tweaking should be the easy part :)

Could you please give us some feedback on the text quality? If it is good, maybe we can start testing it on other languages as well.

Best,
Dimitris

On Tue, Oct 2, 2012 at 11:11 PM, Piotr Jagielski <[email protected]> wrote:

    I haven't done extensive tests, but one thing to improve for sure
    is the abstract shortening algorithm. You currently use a simple
    regex to solve the complex problem of breaking natural language
    text into sentences. java.text.BreakIterator yields better results
    and is also locale-sensitive. You might also want to take a look
    at the more advanced boundary analysis library at
    http://userguide.icu-project.org/boundaryanalysis.
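A minimal sketch of the sentence-boundary approach suggested above, using the standard java.text.BreakIterator API. The class name AbstractShortener, the method shorten, and the maxChars limit are hypothetical names chosen for illustration; they are not part of the DBpedia extraction framework, and this is not how the framework currently shortens abstracts.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper, not part of the DBpedia extraction framework.
class AbstractShortener {

    // Returns the longest prefix of whole sentences whose trimmed length
    // fits in maxChars. If even the first sentence is too long, the first
    // sentence is returned unshortened (an assumption made for this sketch).
    static String shorten(String text, int maxChars, Locale locale) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(locale);
        sentences.setText(text);

        int end = 0;
        for (int b = sentences.next(); b != BreakIterator.DONE; b = sentences.next()) {
            // A boundary usually includes trailing whitespace, so measure the trimmed prefix.
            if (text.substring(0, b).trim().length() > maxChars) {
                break;
            }
            end = b;
        }

        if (end == 0) {
            // No sentence fits: fall back to the first sentence, or the whole text.
            sentences.first();
            int b = sentences.next();
            end = (b == BreakIterator.DONE) ? text.length() : b;
        }
        return text.substring(0, end).trim();
    }

    public static void main(String[] args) {
        String text = "Hello world. This is a test. Third sentence.";
        System.out.println(shorten(text, 30, Locale.ENGLISH));
    }
}
```

With the sample text in main and a limit of 30 characters, the call keeps the first two sentences and drops the third. How abbreviations such as "Dr." are handled depends on the JDK's built-in rules, which is one place where the ICU library linked above may do better.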

    Regards,
    Piotr


    On 2012-10-01 07:42, Dimitris Kontokostas wrote:
    Hi Piotr,

    Thank you for the patch. Although it catches an error case, it
    seems safe to include in the framework.
    As for the PageNode abstracts, can you give us some feedback on
    their quality? It is something we always wanted to test but
    couldn't find the time.

    Best,
    Dimitris

    On Fri, Sep 28, 2012 at 5:57 PM, Piotr Jagielski <[email protected]> wrote:

        OK, I submitted a bug with a proposed fix and test cases at
        https://sourceforge.net/tracker/?func=detail&aid=3572779&group_id=190976&atid=935521.

        Thanks for the link to documentation. Now I know where the
        confusion came from. I should have mentioned that I tweaked
        the code locally a little bit in order to generate abstracts
        without a local MediaWiki instance :-) I used
        SimpleWikiParser to create a PageNode to pass to
        AbstractExtractor. The issue is in SimpleWikiParser.

        Piotr


        On 2012-09-13 11:51, Pablo N. Mendes wrote:

        This question keeps coming up, so I added hints to the
        documentation.

        4.3. Running Abstract Extraction
        http://wiki.dbpedia.org/Documentation#h25-8

        Cheers,
        Pablo

        On Thu, Sep 13, 2012 at 7:13 AM, Dimitris Kontokostas <[email protected]> wrote:

            Hi Piotr,

            We will happily accept your patch :)
            You can take a look at [1] & [2] for more details on
            abstract extraction.

            Best,
            Dimitris

            [1] http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/d580c99b5bbc/core/src/main/scala/org/dbpedia/extraction/mappings/AbstractExtractor.scala#l66
            [2] http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/dbpedia/file/efc0afb0faa3/abstractExtraction/README.txt



            On Wed, Sep 12, 2012 at 10:37 PM, Piotr Jagielski <[email protected]> wrote:

                Dimitris,
                I guess I'm confused about the project structure. I
                looked at AbstractExtractor.scala. It clearly uses
                PageNode to figure out what the abstract is, and I
                figured out that PageNode is created by
                SimpleWikiParser. I now see that there is some PHP
                code for a lot of stuff, including abstract
                extraction. I don't understand the relationship
                between the Scala extraction framework and the PHP
                code, and I'm wondering if you mean the latter when
                you refer to "modified mediawiki installation". When
                I used AbstractExtractor.scala to generate the
                abstract for
                http://pl.dbpedia.org/page/Agnieszka_Rylik I got a
                similar result because a strangely formatted
                template was not parsed correctly.

                Anyway, I can now access the bug tracker so I will
                submit a patch there.
                Regards,
                Piotr



                On 2012-09-11 08:39, Dimitris Kontokostas wrote:
                Hi Piotr,

                Any contribution is always welcome! However, the
                case you are referring to seems strange.
                Abstracts are not generated by the
                SimpleWikiParser; they are produced by a local
                Wikipedia clone using a modified mediawiki
                installation.

                Best,
                Dimitris

                On Mon, Sep 10, 2012 at 7:30 PM, Piotr Jagielski <[email protected]> wrote:

                    Any thoughts on this? I wrote some test cases
                    and a fix that I can
                    contribute in case you are interested.

                    Piotr

                    On 2012-09-06 01:13, Piotr Jagielski wrote:
                    > Hello,
                    >
                    > There is an issue with SimpleWikiParser in the extraction framework
                    > regarding template parsing. Strangely formatted templates like this one:
                    > {{template | value |= }} are not parsed as template nodes but as text
                    > nodes instead. Apart from preventing data extraction, this results in
                    > incorrect abstracts on Polish DBpedia. For example, on
                    > http://pl.dbpedia.org/page/Agnieszka_Rylik the abstract contains infobox
                    > parameter values.
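To make the failing input concrete, here is a rough sketch of a tolerant argument splitter that still treats {{template | value |= }} as a template. It is only an illustration in Java: TemplateSketch, ParsedTemplate, and parse are hypothetical names, the logic is not taken from SimpleWikiParser or from the submitted patch, and nested templates or links containing "|" are deliberately not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the SimpleWikiParser implementation.
class TemplateSketch {

    // Simple holder for a template title and its arguments.
    static final class ParsedTemplate {
        final String title;
        final Map<String, String> args;

        ParsedTemplate(String title, Map<String, String> args) {
            this.title = title;
            this.args = args;
        }
    }

    // Parses a flat "{{title | a | key=value}}" string.
    // Returns null when the text is not wrapped in {{ }}.
    static ParsedTemplate parse(String text) {
        String t = text.trim();
        if (t.length() < 4 || !t.startsWith("{{") || !t.endsWith("}}")) {
            return null;
        }
        // Limit -1 keeps trailing empty parts instead of silently dropping them.
        String[] parts = t.substring(2, t.length() - 2).split("\\|", -1);
        String title = parts[0].trim();

        Map<String, String> args = new LinkedHashMap<>();
        int positional = 1;
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i];
            int eq = part.indexOf('=');
            if (eq < 0) {
                // Unnamed argument: key it by position ("1", "2", ...).
                args.put(String.valueOf(positional++), part.trim());
            } else {
                // Named argument; an empty key such as "|= " is accepted, not rejected.
                args.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            }
        }
        return new ParsedTemplate(title, args);
    }

    public static void main(String[] args) {
        ParsedTemplate p = parse("{{template | value |= }}");
        System.out.println(p.title + " " + p.args);
    }
}
```

The point of the sketch is the empty-key branch: the "|= " part is stored as an argument with an empty name, so the input stays a template rather than falling back to plain text.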
                    >
                    > BTW, I noticed a couple of issues when trying to report this issue.
                    > 1) I couldn't submit a bug on SourceForge at
                    > https://sourceforge.net/tracker/?group_id=190976&atid=935520. I got
                    > a permission denied error. Is there any reason to restrict bug reporting
                    > to project members only?
                    > 2) I wanted to create a test case for it, but I couldn't find any tests
                    > for the parser part in the repository. Are there any?
                    >
                    > Regards,
                    > Piotr
                    >
                    >
                    
                    > ------------------------------------------------------------------------------
                    > Live Security Virtual Conference
                    > Exclusive live event will cover all the ways today's security and
                    > threat landscape has changed and how IT managers can respond. Discussions
                    > will include endpoint security, mobile security and the latest in malware
                    > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
                    > _______________________________________________
                    > Dbpedia-discussion mailing list
                    > [email protected]
                    > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
                    >


                    




-- Kontokostas Dimitris





            




-- ---
        Pablo N. Mendes
        http://pablomendes.com
        Events: http://wole2012.eurecom.fr <http://wole2012.eurecom.fr/>










------------------------------------------------------------------------------
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
