[
https://issues.apache.org/jira/browse/ANY23-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241345#comment-13241345
]
Ben Companjen commented on ANY23-65:
------------------------------------
Well, last night I found that no, it isn't working flawlessly. I was just
looking at the whole thing again to trace what is going wrong.
I was about to send my XSLT file to the Sindice-dev mailing list, to bridge the
period between now and the moment Sindice starts using Any23 0.7.0, when I
thought of an edge case to test against. My stylesheet doesn't handle it well,
and neither does my build from the SVN code.
The case: when a prefix attribute contains
"rnews:http://www.iptc.org/std/rNews/1.0: foaf:http://xmlns.com/foaf/0.1/
dbpedia:http://dbpedia.org/resource/" (no space between prefix name and prefix
URI, but at least one prefix URI ending with a colon), my stylesheet wrongly
assumes there are spaces between prefix name and prefix URI, because it tests
whether the attribute contains ": ". The RDFa11Parser outputs warnings that it
cannot map the foaf prefix when I test it on my 0.7.0 build.
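
To make the edge case concrete, here is a minimal Python sketch of a tokenizer
that accepts both the spaced and the non-spaced form. It is not the tokenize2
template nor the RDFa11Parser code; the function name parse_prefix_attribute
and the simplified prefix-name pattern are assumptions made for illustration.
The idea is to decide per whitespace-separated token, instead of testing the
whole attribute for ": ".

```python
import re

# Hypothetical helper, not part of Any23 or rdfa.xslt.
# Simplified pattern for a bare prefix name followed by a colon, e.g. "foaf:".
# (The real NCName grammar is broader; this is an assumption for the sketch.)
_BARE_PREFIX = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*:$")


def parse_prefix_attribute(value):
    """Return a dict of prefix name -> namespace URI.

    Accepts "ns: http://example.org/" (spaced) and
    "ns:http://example.org/" (non-spaced), also mixed in one attribute.
    """
    tokens = value.split()  # splits on any whitespace, including line breaks
    mappings = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if _BARE_PREFIX.match(token) and i + 1 < len(tokens):
            # Spaced form: the token is only "name:", the next token is the URI.
            mappings[token[:-1]] = tokens[i + 1]
            i += 2
        else:
            # Non-spaced form: split at the first colon only, so a URI that
            # itself ends with (or contains) a colon stays intact.
            name, sep, uri = token.partition(":")
            if sep and name and uri:
                mappings[name] = uri
            i += 1
    return mappings


edge_case = (
    "rnews:http://www.iptc.org/std/rNews/1.0: "
    "foaf:http://xmlns.com/foaf/0.1/ "
    "dbpedia:http://dbpedia.org/resource/"
)
result = parse_prefix_attribute(edge_case)
```

With this approach the trailing colon of the rnews URI is never mistaken for a
"name: " separator, because a token is only treated as a bare prefix name when
the colon is its last character and nothing but name characters precede it.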
It is an edge case because it is invalid content for a prefix attribute,
but since I saw that Any23 accepts and tests against no-space prefix
definitions, and I think namespace URIs ending in a colon are valid (e.g.
<http://dbpedia.org/resource/Category:>), I figured it made an interesting test
case. I sent the file anyway, with this test result too :)
Another issue I have is that the RDFa11Parser doesn't infer the right triples
from <link> elements with a @rel in the <head> section. I believe the @rel
values "icon", "stylesheet", "bookmark", etc. are to be treated specially.
Sindice (Any23 0.6.1) produces URIs for these like
<http://www.w3.org/1999/xhtml/vocab#icon> from my blog post. When I extract
from my blog post locally, I get properties like
<http://ben.companjen.name/2011/08/het-gezin-timmer-de-bruijn-in-amsterdam/icon>.
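
The difference between the two results can be sketched in Python as follows.
This is only an illustration of the two behaviours, not code from Any23: the
function names are hypothetical, the set of reserved values contains just the
three named above, and the base URL is an assumption derived from the property
URI I get locally.

```python
from urllib.parse import urljoin

# Namespace seen in the Sindice (Any23 0.6.1) output.
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"

# Illustrative subset only: the three @rel values mentioned in this comment.
RESERVED_REL_VALUES = {"icon", "stylesheet", "bookmark"}

# Assumed base URL of the blog post (inferred from the local output).
BASE = "http://ben.companjen.name/2011/08/het-gezin-timmer-de-bruijn-in-amsterdam/"


def resolve_reserved_rel(value):
    """Map a reserved @rel value into the XHTML vocabulary namespace.

    Returns None for values that are not in the (illustrative) reserved set.
    """
    term = value.strip().lower()
    if term in RESERVED_REL_VALUES:
        return XHTML_VOCAB + term
    return None


def resolve_as_relative(value, base):
    """Treat the @rel value as a relative URI reference against the page URL.

    This reproduces the shape of the properties I get locally.
    """
    return urljoin(base, value.strip())


expected = resolve_reserved_rel("icon")
observed = resolve_as_relative("icon", BASE)
```

Here `expected` has the form Sindice produces and `observed` has the form my
local extraction produces, which suggests the local build resolves the value
against the document URL instead of recognising it as a reserved value.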
The parser (also) complains my HTML doesn't declare
'xmlns="http://www.w3.org/1999/xhtml"'. I couldn't find the part of the HTML or
RDFa specifications that says it should do so - as far as I can tell it's not
necessary in HTML5.
Looking at the code, I see some references to RDFa 1.0 in the comments in the
processDocument method. This method seems to be the source for the complaint.
Maybe the problem and complaint (well, warning actually) are linked? And maybe
the incorrect handling of the @rel values is also linked to "// TODO: introduce
support for RDFa profiles. (http://www.w3.org/TR/rdfa-core/#s_profiles)"? BTW,
these profiles don't exist (anymore) in RDFa Core.
I hope someone can shed more light on this; I'm too confused from reading all
the RDFa-related docs and drafts right now ;)
> Update to RDFa extraction stylesheet
> ------------------------------------
>
> Key: ANY23-65
> URL: https://issues.apache.org/jira/browse/ANY23-65
> Project: Apache Any23
> Issue Type: Improvement
> Affects Versions: 0.7.0
> Reporter: Ben Companjen
> Labels: patch, xslt
> Attachments: rdfa-11-curies-a.html, rdfa.xslt, stylesheet.patch,
> stylesheet3.patch, test.patch
>
> Original Estimate: 3h
> Remaining Estimate: 3h
>
> The RDFa 1.1 Core specification requests namespace prefixes in HTML5 be put
> in a "prefix" attribute like this: "ns1: http://example.org/ ns2:
> http://example.com/"
> My sample HTML page has this, but Sindice, which uses Any23, didn't read my
> namespace correctly. I narrowed it down to, and changed accordingly, the XSLT
> template "tokenize2" in the rdfa.xslt stylesheet. The template expected
> "ns1:http://example.org/ ns2:http://example.com/" (no spaces between prefix
> and namespace URI) and did not normalize whitespace, such as line breaks
> (although I'm not sure that broke the functionality).
> I use Any23 0.6.1 locally, but
> http://svn.apache.org/viewvc/incubator/any23/trunk/core/src/main/resources/org/apache/any23/extractor/rdfa/rdfa.xslt?revision=1231556&view=markup
> shows that the template is the same in the trunk.
> A possible problem may be that the new template will not accept the
> non-spaced namespace definitions, like you can find in the RDFa produced by
> Best Buy. A further improvement to my template may be accepting both
> namespace definitions with spaces and the ones without.