[ https://issues.apache.org/jira/browse/ANY23-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755838#comment-13755838 ]
Peter Ansell commented on ANY23-137:
------------------------------------
Lev released Semargl 0.6 over the weekend and I updated the RDFa parser
factories in Any23 to use it (via RDFFormat.RDFA).
Some unit tests are failing, so I haven't committed it to the
master branch yet. Some fail with well-formedness exceptions, which
may be because Semargl is stricter than our previous tag soup parser. One
that I am interested in seems to fail because of an error extracting
CURIEs and mapping them to Sesame:
RDFa11ExtractorTest>AbstractRDFaExtractorTestCase.testRDFa11CURIEs:77->AbstractExtractorTestCase.assertContains:244
Assertion failed! Extracted triples:
<http://dbpedia.org/resource/Albert_Einstein> <http://dbpedia.org/name> "Albert Einstein" ;
    <http://dbpedia.org/knows> <http://dbpedia.org/resource/Franklin_Roosevlet> .
<db:table/Departments> <db:description> "Tables listing departments" ;
    <http://xmlns.com/foaf/0.1/author> <db:people/Davide_Palmisano> ;
    <http://purl.org/dc/terms/name> "Departments" .
Cannot find triple (http://database.org/table/Departments http://database.org/description "Tables listing departments")
That error message seems to indicate that the internal Sesame repository did
not receive the namespace declaration to map "db:" to "http://database.org/".
That will need to be tested at the Semargl end of things. However, it may also
be an error on our end if we are using a custom RDFHandler that doesn't react
properly to RDFHandler.handleNamespace.
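To make the second hypothesis concrete, here is a minimal, self-contained sketch of how a wrapping handler that swallows namespace declarations could leave "db:" unresolved downstream. It does not use the Sesame API: the Handler interface, RecordingSink, ForwardingHandler and DroppingHandler are hypothetical stand-ins invented for illustration, and whether prefix resolution actually happens at this point in Any23 is an assumption, not something established above.

```java
import java.util.HashMap;
import java.util.Map;

public class NamespaceForwardingSketch {

    /** Hypothetical stand-in for the two handler callbacks that matter here. */
    interface Handler {
        void handleNamespace(String prefix, String uri);
        void handleStatement(String subject, String predicate, String object);
    }

    /** Sink that records prefix declarations, loosely playing the role of the repository. */
    static class RecordingSink implements Handler {
        final Map<String, String> namespaces = new HashMap<>();
        int statementCount = 0;

        @Override
        public void handleNamespace(String prefix, String uri) {
            namespaces.put(prefix, uri);
        }

        @Override
        public void handleStatement(String subject, String predicate, String object) {
            statementCount++;
        }

        /** Expands prefix:local to a full IRI if the prefix is known; otherwise returns the input unchanged. */
        String expand(String curie) {
            int colon = curie.indexOf(':');
            if (colon < 0) {
                return curie;
            }
            String base = namespaces.get(curie.substring(0, colon));
            return base == null ? curie : base + curie.substring(colon + 1);
        }
    }

    /** Wrapper that passes every callback through, including namespace declarations. */
    static class ForwardingHandler implements Handler {
        private final Handler delegate;

        ForwardingHandler(Handler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void handleNamespace(String prefix, String uri) {
            delegate.handleNamespace(prefix, uri);
        }

        @Override
        public void handleStatement(String subject, String predicate, String object) {
            delegate.handleStatement(subject, predicate, object);
        }
    }

    /** Wrapper with the suspected bug: namespace declarations are silently dropped. */
    static class DroppingHandler implements Handler {
        private final Handler delegate;

        DroppingHandler(Handler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void handleNamespace(String prefix, String uri) {
            // Intentionally empty: the delegate never learns about the prefix.
        }

        @Override
        public void handleStatement(String subject, String predicate, String object) {
            delegate.handleStatement(subject, predicate, object);
        }
    }

    public static void main(String[] args) {
        RecordingSink good = new RecordingSink();
        new ForwardingHandler(good).handleNamespace("db", "http://database.org/");
        System.out.println(good.expand("db:description")); // http://database.org/description

        RecordingSink bad = new RecordingSink();
        new DroppingHandler(bad).handleNamespace("db", "http://database.org/");
        System.out.println(bad.expand("db:description")); // db:description
    }
}
```

The second case leaves "db:description" unexpanded, which resembles the <db:description> predicate in the failing output. If the CURIEs are instead meant to be expanded inside the parser before any statement is emitted, the same symptom would point at Semargl rather than at our handler.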
The branch, named ANY23-137, with the parser factory conversion is available in
the Apache Git repository and in my GitHub repository if you prefer to fetch it
from there.
> RDFa parser implementation proposal
> -----------------------------------
>
> Key: ANY23-137
> URL: https://issues.apache.org/jira/browse/ANY23-137
> Project: Apache Any23
> Issue Type: Improvement
> Components: core
> Affects Versions: 0.8.0
> Reporter: Lev Khomich
> Assignee: Peter Ansell
> Priority: Minor
> Fix For: 0.9.0
>
> Attachments: oQYfomKX.part, rdfa-extractor-proposal.patch
>
>
> As a follow-up to discussion [1], I've implemented another RDFa extractor
> for Any23 (0.7.1).
> The proposed code depends on the Semargl project [2]. It isn't published in
> Maven Central, so I didn't change any POMs.
> I'm still not quite sure about the class name (the related ones are already
> taken), so feel free to rename it. See the attachments for a patch with the
> extractor and tests.
> [1] http://mail-archives.apache.org/mod_mbox/any23-dev/201212.mbox/browser
> [2] http://semarglproject.org