[ 
https://issues.apache.org/jira/browse/ANY23-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241345#comment-13241345
 ] 

Ben Companjen commented on ANY23-65:
------------------------------------

Well, last night I found that no, it isn't working flawlessly. I was just 
looking at the whole thing again to trace what is going wrong.

I was about to send my XSLT file to the Sindice-dev mailing list, to bridge the 
period between now and the moment Sindice starts using Any23 0.7.0, when I 
thought of an edge case to test against. My stylesheet doesn't handle it well, 
and neither does my build from the SVN code.
The case: when a prefix attribute contains 
"rnews:http://www.iptc.org/std/rNews/1.0: foaf:http://xmlns.com/foaf/0.1/ 
dbpedia:http://dbpedia.org/resource/" (no space between prefix name and prefix 
URI, but at least one prefix URI ending with a colon), my stylesheet wrongly 
assumes there are spaces between prefix name and prefix URI, because it tests 
whether the attribute contains ": ". The RDFa11Parser outputs warnings that it 
cannot map the foaf prefix when I test it on my 0.7.0 build.
It is an edge case because this is invalid content for a prefix attribute. 
But since Any23 accepts (and tests against) no-space prefix definitions, and I 
think namespace URIs ending with a colon are valid (e.g. 
<http://dbpedia.org/resource/Category:>), I figured it made an interesting test 
case. I sent the file anyway, with this test result too :)
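
To make the edge case concrete, here is a minimal sketch in Python (not the 
XSLT from rdfa.xslt and not the RDFa11Parser code) of one way to accept both 
the spaced and the no-space form. The function name parse_prefix_attribute is 
hypothetical. The idea is to decide per whitespace-separated token instead of 
testing whether the whole attribute contains ": ".

```python
def parse_prefix_attribute(value):
    """Map prefix names to URIs for both "p: uri" and "p:uri" definitions.

    Hypothetical helper for illustration only; it does not validate
    prefix names or URIs.
    """
    mappings = {}
    # split() without arguments also collapses line breaks and tabs.
    tokens = value.split()
    i = 0
    while i < len(tokens):
        # Split at the FIRST colon only, so colons inside or at the end
        # of a URI stay part of the URI.
        name, sep, rest = tokens[i].partition(":")
        if not sep or not name:
            i += 1  # no colon or empty prefix name: skip this token
        elif rest:
            mappings[name] = rest  # no-space form: "p:uri"
            i += 1
        elif i + 1 < len(tokens):
            mappings[name] = tokens[i + 1]  # spaced form: "p: uri"
            i += 2
        else:
            i += 1  # dangling "p:" with no URI after it
    return mappings


edge_case = ("rnews:http://www.iptc.org/std/rNews/1.0: "
             "foaf:http://xmlns.com/foaf/0.1/ "
             "dbpedia:http://dbpedia.org/resource/")
print(parse_prefix_attribute(edge_case))
```

With this approach the trailing colon of the rNews URI is never mistaken for 
the separator of a spaced definition, because a token only counts as a bare 
prefix name when nothing follows its first colon.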

Another issue I have is that the RDFa11Parser doesn't infer the right triples 
from <link> elements with a @rel in the <head> section. I believe the @rel 
values "icon", "stylesheet", "bookmark" etc. are to be treated specially. 
Sindice (Any23 0.6.1) produces URIs for these like 
<http://www.w3.org/1999/xhtml/vocab#icon> from my blog post. When I extract 
from my blog post locally, I get properties like 
<http://ben.companjen.name/2011/08/het-gezin-timmer-de-bruijn-in-amsterdam/icon>.
The parser (also) complains my HTML doesn't declare 
'xmlns="http://www.w3.org/1999/xhtml"'. I couldn't find the part of the HTML or 
RDFa specifications that says it should do so - as far as I can tell it's not 
necessary in HTML5.
Looking at the code, I see some references to RDFa 1.0 in the comments in the 
processDocument method. This method seems to be the source for the complaint. 
Maybe the problem and complaint (well, warning actually) are linked? And maybe 
the incorrect handling of the @rel values is also linked to "// TODO: introduce 
support for RDFa profiles. (http://www.w3.org/TR/rdfa-core/#s_profiles)"? BTW, 
these profiles don't exist (anymore) in RDFa Core.
I hope someone can shed more light on this; I'm too confused from reading all 
the RDFa-related docs and drafts right now ;)
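
The difference between the two @rel outputs above can be sketched in Python as 
follows. This is an illustration under assumptions, not the RDFa11Parser 
logic: the function name resolve_rel is hypothetical, the set of reserved 
terms holds only the three values named in this message, and the fallback 
simply reproduces the relative-URI result seen in the local extraction.

```python
from urllib.parse import urljoin

XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"

# Assumption: only the three @rel values named in the message above.
RESERVED_REL_TERMS = {"icon", "stylesheet", "bookmark"}


def resolve_rel(term, base_uri):
    """Return the property URI for a @rel value (hypothetical helper)."""
    if term in RESERVED_REL_TERMS:
        # Behaviour seen from Sindice (Any23 0.6.1): XHTML vocabulary URI.
        return XHTML_VOCAB + term
    # Fallback that mimics the local result: resolve against the page URI.
    return urljoin(base_uri, term)


page = "http://ben.companjen.name/2011/08/het-gezin-timmer-de-bruijn-in-amsterdam/"
print(resolve_rel("icon", page))
print(urljoin(page, "icon"))  # what the local build produced instead
```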
                
> Update to RDFa extraction stylesheet
> ------------------------------------
>
>                 Key: ANY23-65
>                 URL: https://issues.apache.org/jira/browse/ANY23-65
>             Project: Apache Any23
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Ben Companjen
>              Labels: patch, xslt
>         Attachments: rdfa-11-curies-a.html, rdfa.xslt, stylesheet.patch, 
> stylesheet3.patch, test.patch
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> The RDFa 1.1 Core specification requests namespace prefixes in HTML5 be put 
> in a "prefix" attribute like this: "ns1: http://example.org/ ns2: 
> http://example.com/"
> My sample HTML page has this, but Sindice, which uses Any23, didn't read my 
> namespace correctly. I narrowed it down to, and changed accordingly, the XSLT 
> template "tokenize2" in the rdfa.xslt stylesheet. The template expected 
> "ns1:http://example.org/ ns2:http://example.com/" (no spaces between prefix 
> and namespace URI) and did not normalize whitespace, such as line breaks 
> (although I'm not sure that broke the functionality).
> I use Any23 0.6.1 locally, but 
> http://svn.apache.org/viewvc/incubator/any23/trunk/core/src/main/resources/org/apache/any23/extractor/rdfa/rdfa.xslt?revision=1231556&view=markup
>  shows that the template is the same in the trunk.
> A possible problem may be that the new template will not accept the 
> non-spaced namespace definitions, like you can find in the RDFa produced by 
> Best Buy. A further improvement to my template may be accepting both 
> namespace definitions with spaces and the ones without.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
