If memory serves, this is mostly historical. Once upon a time RDF/XML was
the only serialisation available, so everything had to be XML compliant.
Things have evolved over time, but the implementation is still
conservative in this regard.
Also, I think XML 1.1 post-dates RDF/XML and various other specifications,
all of which are defined in terms of XML 1.0. For maximum compatibility it
is better for us to be conservative, because most of the ecosystem has not
adopted XML 1.1 yet.
Rob
On 14/02/2018, 09:04, "Claude Warren" <cla...@xenei.com> wrote:
The issue is that predicate namespaces are parsed with XMLChar. So if I
have one that is correctly formed according to the XML 1.1 spec, but the
XMLChar code does not recognize the first character of the local name, it
will not split the URL correctly. All code that depends upon
Resource.getNamespace() and Resource.getLocalName() will be incorrect. It
seems to me this is a low-level problem.
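
To make the splitting problem concrete, here is a minimal sketch of how a
namespace/local-name split depends on the name-start check. It is not the
code in org.apache.jena.rdf.model.impl.Util and it does not call Xerces.
The class SplitSketch, its methods, and the ASCII-only predicates are
hypothetical stand-ins for illustration. The character ranges are the
NameStartChar ranges quoted later in this thread (minus ':'), plus the
extra NameChar ranges from the same W3C production.

```java
import java.util.function.IntPredicate;

class SplitSketch {
    // NameStartChar ranges as quoted in this thread, without ':' (local names are NCNames).
    static final int[][] START = {
        { 'A', 'Z' }, { '_', '_' }, { 'a', 'z' }, { 0xC0, 0xD6 }, { 0xD8, 0xF6 },
        { 0xF8, 0x2FF }, { 0x370, 0x37D }, { 0x37F, 0x1FFF }, { 0x200C, 0x200D },
        { 0x2070, 0x218F }, { 0x2C00, 0x2FEF }, { 0x3001, 0xD7FF },
        { 0xF900, 0xFDCF }, { 0xFDF0, 0xFFFD }, { 0x10000, 0xEFFFF } };

    // Characters allowed after the first one: '-', '.', digits, #xB7, combining marks, ties.
    static final int[][] EXTRA = {
        { '-', '-' }, { '.', '.' }, { '0', '9' }, { 0xB7, 0xB7 },
        { 0x300, 0x36F }, { 0x203F, 0x2040 } };

    static boolean inRanges(int[][] ranges, int c) {
        for (int[] r : ranges) {
            if (c >= r[0] && c <= r[1]) return true;
        }
        return false;
    }

    // Predicates following the full production.
    static final IntPredicate FULL_START = c -> inRanges(START, c);
    static final IntPredicate FULL_CHAR = c -> inRanges(START, c) || inRanges(EXTRA, c);

    // Hypothetical narrower table (ASCII only), standing in for an older character table.
    static final IntPredicate ASCII_START = c -> c < 0x80 && inRanges(START, c);
    static final IntPredicate ASCII_CHAR = c -> c < 0x80 && FULL_CHAR.test(c);

    // Walk back over name characters, then forward to the first name-start character.
    static int splitPoint(String uri, IntPredicate nameStart, IntPredicate nameChar) {
        int i = uri.length();
        while (i > 0) {
            int c = uri.codePointBefore(i);
            if (!nameChar.test(c)) break;
            i -= Character.charCount(c);
        }
        while (i < uri.length()) {
            int c = uri.codePointAt(i);
            if (nameStart.test(c)) break;
            i += Character.charCount(c);
        }
        return i;
    }

    static String localName(String uri, IntPredicate nameStart, IntPredicate nameChar) {
        return uri.substring(splitPoint(uri, nameStart, nameChar));
    }

    public static void main(String[] args) {
        // U+0132 is the first code point reported as miscategorized in this thread.
        String uri = "http://example.org/ns#\u0132sel";
        System.out.println(localName(uri, FULL_START, FULL_CHAR).length());
        System.out.println(localName(uri, ASCII_START, ASCII_CHAR).length());
    }
}
```

With the full table the local name keeps the leading U+0132. With the
narrower stand-in the split lands one character later, so the local name
becomes "sel" and the U+0132 ends up in the namespace part.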
While it should be easy to fix the parsing problem, I am not certain what
effect that will have on any other code that is dependent upon the Xerces
code (where XMLChar originates).
Claude
On Tue, Feb 13, 2018 at 6:50 PM, Andy Seaborne <a...@apache.org> wrote:
> Maybe SplitIRI will help?
>
> It does Turtle splitting as well as XML.
>
> Andy
>
>
> On 13/02/18 17:39, Claude Warren wrote:
>
>> It is used in the org.apache.jena.rdf.model.impl.Util namespace
>> splitting code.
>>
>> On Tue, Feb 13, 2018 at 4:44 PM, Andy Seaborne <a...@apache.org> wrote:
>>
>>> Where is XMLChar.isNameStart being used?
>>>
>>>
>>> On 13/02/18 13:10, Claude Warren wrote:
>>>
>>>> Is there a reason that Jena does not support the full range of XML
>>>> name start characters?
>>>>
>>>> see https://www.w3.org/TR/xml/#NT-NameStartChar
>>>>
>>>> I wrote a quick test and found that there were a number of
>>>> characters that Jena does not support.
>>>> Miscategorization appears to start at 0x132. There are 936990
>>>> miscategorized characters.
>>>>
>>>> The issue is actually in the Xerces util class XMLChar.
>>>>
>>>> Is this because of the version of Xerces we are stuck with? Is
>>>> there a way around this issue?
>>>>
>>>> Claude
>>>>
>>>> p.s. Since I can't attach a file, here is the test code I wrote.
>>>>
>>>> import static org.junit.Assert.assertTrue;
>>>>
>>>> import org.apache.xerces.util.XMLChar;
>>>> import org.junit.Test;
>>>>
>>>> public class NameTest {
>>>>     /*
>>>>      * NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
>>>>      *     [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
>>>>      *     [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
>>>>      *     [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
>>>>      *     [#x10000-#xEFFFF]
>>>>      */
>>>>
>>>>     // Inclusive {low, high} code point ranges from the production above.
>>>>     int[][] ranges = { { ':', ':' }, { 'A', 'Z' }, { '_', '_' }, { 'a', 'z' },
>>>>             { 0xC0, 0xD6 }, { 0xD8, 0xF6 }, { 0xF8, 0x2FF }, { 0x370, 0x37D },
>>>>             { 0x37F, 0x1FFF }, { 0x200C, 0x200D }, { 0x2070, 0x218F },
>>>>             { 0x2C00, 0x2FEF }, { 0x3001, 0xD7FF }, { 0xF900, 0xFDCF },
>>>>             { 0xFDF0, 0xFFFD }, { 0x10000, 0xEFFFF } };
>>>>
>>>>     // Fails at the first code point in the ranges that XMLChar rejects.
>>>>     @Test
>>>>     public void testNameStart() {
>>>>         for (int[] range : ranges) {
>>>>             for (int c = range[0]; c <= range[1]; c++) {
>>>>                 assertTrue( String.format( "character %s 0x%s", c, Integer.toHexString( c ) ),
>>>>                         XMLChar.isNameStart( c ) );
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>>     // Prints every rejected code point, 25 per line, then the total.
>>>>     @Test
>>>>     public void listNameStartErr() {
>>>>         int cnt = 0;
>>>>         for (int[] range : ranges) {
>>>>             for (int c = range[0]; c <= range[1]; c++) {
>>>>                 if (!XMLChar.isNameStart( c )) {
>>>>                     System.out.print( String.format( "0x%s ", Integer.toHexString( c ) ) );
>>>>                     cnt++;
>>>>                     if (cnt % 25 == 0) {
>>>>                         System.out.println();
>>>>                     }
>>>>                 }
>>>>             }
>>>>         }
>>>>         System.out.println();
>>>>         System.out.println( cnt + " characters miscategorized" );
>>>>     }
>>>> }