OK, I submitted a bug report with a proposed fix and test cases at
https://sourceforge.net/tracker/?func=detail&aid=3572779&group_id=190976&atid=935521.
Thanks for the link to documentation. Now I know where the confusion
came from. I should have mentioned that I tweaked the code locally a
little so that I could generate abstracts without a local MediaWiki
instance :-) I used SimpleWikiParser to create the PageNode that is passed to
AbstractExtractor. The issue is in SimpleWikiParser.
Piotr
On 2012-09-13 11:51, Pablo N. Mendes wrote:
This question keeps coming up, so I added hints to the documentation.
4.3. Running Abstract Extraction
http://wiki.dbpedia.org/Documentation#h25-8
Cheers,
Pablo
On Thu, Sep 13, 2012 at 7:13 AM, Dimitris Kontokostas
<[email protected]> wrote:
Hi Piotr,
We will happily accept your patch :)
You can take a look at [1] & [2] for more details on abstract
extraction.
Best,
Dimitris
[1]
http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/d580c99b5bbc/core/src/main/scala/org/dbpedia/extraction/mappings/AbstractExtractor.scala#l66
[2]
http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/dbpedia/file/efc0afb0faa3/abstractExtraction/README.txt
On Wed, Sep 12, 2012 at 10:37 PM, Piotr Jagielski
<[email protected]> wrote:
Dimitris,
I guess I'm confused about the project structure. I looked at
AbstractExtractor.scala. It clearly uses PageNode to figure
out what the abstract is, and I worked out that PageNode is
created by SimpleWikiParser. I now see that there is also PHP
code for a lot of things, including abstract extraction. I don't
understand the relationship between the Scala extraction framework
and the PHP code, and I'm wondering whether you mean the latter when you
refer to a "modified mediawiki installation". When I used
AbstractExtractor.scala to generate the abstract for
http://pl.dbpedia.org/page/Agnieszka_Rylik I got a similar
result because a strangely formatted template was not parsed
correctly.
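To make the problematic input concrete, here is a minimal Python sketch of a
flat template splitter that accepts a parameter with an empty key, as in
{{template | value |= }}. It is only an illustration: it is not the
framework's Scala code and not the patch submitted to the tracker, the name
parse_template is hypothetical, and nested templates and links are
deliberately ignored.

```python
def parse_template(source):
    """Parse a flat ``{{name | arg | key=value}}`` string.

    Returns (name, params), or None when the text is not a template.
    Positional arguments get the keys "1", "2", ...; a parameter with an
    empty key, such as the ``|= `` in the example above, is stored under "".
    This is an illustrative assumption, not SimpleWikiParser's behaviour.
    """
    text = source.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        return None

    # Split the body on "|" and trim whitespace around every part.
    parts = [part.strip() for part in text[2:-2].split("|")]
    name, raw_params = parts[0], parts[1:]
    if not name:
        return None

    params = {}
    position = 0
    for raw in raw_params:
        key, sep, value = raw.partition("=")
        if sep:
            # Named parameter; the key is allowed to be empty ("|= ").
            params[key.strip()] = value.strip()
        else:
            # Positional parameter: number it in order of appearance.
            position += 1
            params[str(position)] = raw
    return name, params


if __name__ == "__main__":
    print(parse_template("{{template | value |= }}"))
```

The point of the sketch is that an empty key is treated as an ordinary named
parameter, so the whole span is still recognised as a template rather than
falling through as plain text.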
Anyway, I can now access the bug tracker so I will submit a
patch there.
Regards,
Piotr
On 2012-09-11 08:39, Dimitris Kontokostas wrote:
Hi Piotr,
Any contribution is always welcome! However, the case you are
referring to seems strange.
Abstracts are not generated by SimpleWikiParser; they are
produced by a local Wikipedia clone using a modified
MediaWiki installation.
Best,
Dimitris
On Mon, Sep 10, 2012 at 7:30 PM, Piotr Jagielski
<[email protected]> wrote:
Any thoughts on this? I wrote some test cases and a fix that I can
contribute in case you are interested.
Piotr
On 2012-09-06 01:13, Piotr Jagielski wrote:
> Hello,
>
> There is an issue with SimpleWikiParser in the extraction framework
> regarding template parsing. Strangely formatted templates like this one:
> {{template | value |= }} are parsed as text nodes instead of template
> nodes. Apart from preventing data extraction, this results in
> incorrect abstracts on the Polish DBpedia. For example, on
> http://pl.dbpedia.org/page/Agnieszka_Rylik the abstract contains infobox
> parameter values.
>
> BTW, I noticed a couple of issues when trying to report this issue.
> 1) I couldn't submit a bug on SourceForge at
> https://sourceforge.net/tracker/?group_id=190976&atid=935520. I got a
> permission denied error. Is there any reason to restrict bug reporting
> to project members only?
> 2) I wanted to create a test case for it but I couldn't find any tests
> for the parser part in the repository. Are there any?
>
> Regards,
> Piotr
>
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
--
Kontokostas Dimitris
--
---
Pablo N. Mendes
http://pablomendes.com
Events: http://wole2012.eurecom.fr