[
https://issues.apache.org/jira/browse/ANY23-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918011#comment-13918011
]
Lev Khomich edited comment on ANY23-137 at 3/3/14 12:29 PM:
------------------------------------------------------------
Thanks, Stephane!
I completely missed that RDFa was used as part of the extraction process in other
tests.
I've added related fixes.
Brief description:
*ServletTest*
The old RDFa implementation produces
{{<issue level="Warning" row="14" col="5">Error while processing node /HTML(1)/HEAD(1)/META(9) : 'Cannot map prefix 'fb''</issue>}}
even though {{<fb:app_id>}} is a perfectly valid predicate that shouldn't be resolved
against an fb: prefix.
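For reference, a minimal Java sketch of the rule that makes {{fb:app_id}} acceptable, assuming the RDFa 1.1 behaviour in which a @property value whose prefix is not declared is tried as an absolute IRI instead of raising an error. {{CurieResolver}} is a hypothetical helper, not part of Any23 or Semargl; it ignores terms, blank-node CURIEs and @vocab, and uses {{java.net.URI}} as a stand-in for a full IRI parser. The {{og}} mapping is only sample data.

{code:java}
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;

// Hypothetical helper for illustration; not an Any23 or Semargl API.
final class CurieResolver {
    private CurieResolver() {}

    /**
     * Resolves a @property value that is either a CURIE with a declared
     * prefix or an absolute IRI. Returns null if it is neither.
     */
    static String resolve(String value, Map<String, String> prefixes) {
        int colon = value.indexOf(':');
        if (colon > 0) {
            String mapped = prefixes.get(value.substring(0, colon));
            if (mapped != null) {
                // Declared prefix: expand the CURIE.
                return mapped + value.substring(colon + 1);
            }
        }
        try {
            // Unmapped prefix (e.g. "fb"): fall back to an absolute IRI.
            URI uri = new URI(value);
            if (uri.isAbsolute()) {
                return uri.toString();
            }
        } catch (URISyntaxException e) {
            // Not a valid IRI either; fall through.
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = Map.of("og", "http://ogp.me/ns#");
        System.out.println(resolve("og:title", prefixes));  // http://ogp.me/ns#title
        System.out.println(resolve("fb:app_id", prefixes)); // fb:app_id
    }
}
{code}

Under this rule an undeclared prefix produces a triple with the predicate {{<fb:app_id>}} rather than a "Cannot map prefix" warning.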
*Any23Test*
*RoverTest*
Replaced RDFXMLWriter with NTriplesWriter in some tests to improve precision (they
basically check the line count).
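Since N-Triples writes exactly one statement per line, a line count taken from NTriplesWriter output maps directly to a triple count, which does not hold for RDF/XML. A sketch of such a count in Java 11+ (for {{String.lines()}} and {{String.strip()}}), assuming blank lines and comment lines should be skipped; {{NTriplesLineCount}} is a hypothetical helper, not the code the tests actually use.

{code:java}
// Hypothetical test helper for illustration; not part of Any23.
final class NTriplesLineCount {
    private NTriplesLineCount() {}

    /** Counts statements in N-Triples text, skipping blank lines and comments. */
    static long countStatements(String ntriples) {
        return ntriples.lines()
                .map(String::strip)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .count();
    }

    public static void main(String[] args) {
        String doc = "# produced by a writer\n"
                + "<http://example.org/s> <http://example.org/p> \"a\" .\n"
                + "\n"
                + "<http://example.org/s> <http://example.org/p> \"b\" .\n";
        System.out.println(countStatements(doc)); // 2
    }
}
{code}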
Changed the expected triple counts. They were reduced in most cases, because the old RDFa
parser produced a lot of invalid triples like:
{quote}
<http://host.com/service> <http://host.com/serviceexternal> <http://host.com/service/ambiente/> .
<http://host.com/service> <http://host.com/serviceexternal> <http://host.com/service/salute/> .
<http://host.com/service> <http://host.com/serviceexternal> <http://host.com/service/legalita/> .
<http://host.com/service> <http://host.com/serviceexternal> <http://www.ansamed.info/> .
<http://host.com/service> <http://host.com/serviceexternal> <http://host.com/service/web/notizie/regioni/lazio/provinciadiroma/> .
{quote}
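The comment does not say how the old parser built these predicates. As an illustration only, appending a bare token to the base IRI yields exactly the {{<http://host.com/serviceexternal>}} shape, whereas relative-reference resolution with {{java.net.URI}} does not. The token {{external}} and the class below are assumptions, not taken from the old implementation.

{code:java}
import java.net.URI;

// Illustration only; not the old Any23 RDFa code.
final class BaseJoinDemo {
    private BaseJoinDemo() {}

    /** Naive joining: appends the token to the base string (buggy). */
    static String concatenate(String base, String token) {
        return base + token;
    }

    /** Relative-reference resolution against the base via java.net.URI. */
    static String resolve(String base, String token) {
        return URI.create(base).resolve(token).toString();
    }

    public static void main(String[] args) {
        String base = "http://host.com/service";
        System.out.println(concatenate(base, "external")); // http://host.com/serviceexternal
        System.out.println(resolve(base, "external"));     // http://host.com/external
    }
}
{code}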
Fixed the markup in
{{test-resources/src/test/resources/html/rdfa/ansa_2010-02-26_12645863.html}}
to conform to the declared XHTML 1.0 Strict doctype.
Fixed the RDFa markup in
{{test-resources/src/test/resources/html/encoding-test.html}}; otherwise it
wouldn't produce any triples.
Disabled the second part of {{Any23Test.testExtractionParameters}}. Should it do
anything after the RDFa parser replacement?
Also, an ExtractionException thrown from BaseRDFExtractor propagates up through the test
suite, which makes some tests in Any23Test fail. What is the correct behaviour
for the Any23 parser when it gets a SAXException?
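One possible answer, sketched under assumptions: report a {{SAXParseException}} as an issue carrying its row and column (much like the ServletTest warning above) and keep other failures fatal. This uses only the JDK's {{javax.xml.parsers}} SAX API; {{LenientSaxParse}}, {{Issue}} and {{parseCollectingIssues}} are hypothetical names and do not describe how BaseRDFExtractor or ExtractionException actually behave.

{code:java}
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; not an Any23 API.
final class LenientSaxParse {
    private LenientSaxParse() {}

    /** Minimal issue record, loosely modelled on the issue report shown above. */
    static final class Issue {
        final int row;
        final int col;
        final String message;

        Issue(int row, int col, String message) {
            this.row = row;
            this.col = col;
            this.message = message;
        }

        @Override
        public String toString() {
            return "Error at " + row + ":" + col + ": " + message;
        }
    }

    /** Parses the markup; a fatal parse error becomes an Issue instead of an exception. */
    static List<Issue> parseCollectingIssues(String markup) {
        List<Issue> issues = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler()); // DefaultHandler rethrows fatal errors
        } catch (SAXParseException e) {
            issues.add(new Issue(e.getLineNumber(), e.getColumnNumber(), e.getMessage()));
        } catch (Exception e) {
            // Parser configuration or I/O problems are not markup issues; keep them fatal.
            throw new IllegalStateException(e);
        }
        return issues;
    }

    public static void main(String[] args) {
        System.out.println(parseCollectingIssues("<html><head></html>").size()); // 1
        System.out.println(parseCollectingIssues("<html/>").size());             // 0
    }
}
{code}

The alternative, rethrowing as an extraction failure, is simpler but is exactly what currently makes the Any23Test cases fail.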
> RDFa parser implementation proposal
> -----------------------------------
>
> Key: ANY23-137
> URL: https://issues.apache.org/jira/browse/ANY23-137
> Project: Apache Any23
> Issue Type: Improvement
> Components: core
> Affects Versions: 0.8.0
> Reporter: Lev Khomich
> Assignee: Peter Ansell
> Priority: Minor
> Fix For: 1.0.0
>
> Attachments: oQYfomKX.part, rdfa-extractor-proposal.patch
>
>
> As a follow-up to discussion [1].
> I've implemented another RDFa extractor for Any23 (0.7.1).
> The proposed code depends on the semargl project [2]. It isn't published in Maven
> Central, so I didn't change any POMs.
> I'm still not quite sure about the class name (because related ones are already
> taken); feel free to rename it. See the attachments for a patch with the extractor and tests.
> [1] http://mail-archives.apache.org/mod_mbox/any23-dev/201212.mbox/browser
> [2] http://semarglproject.org
--
This message was sent by Atlassian JIRA
(v6.2#6252)