Re: XMLChar.isNameStart error?

Rob Vesse Wed, 14 Feb 2018 02:39:07 -0800

If memory serves, this is mostly historical: once upon a time RDF/XML was the 
only serialisation available, so everything had to be XML compliant. 
Obviously things have evolved over time, but the implementation is conservative 
in this regard.


Also, I think XML 1.1 post-dates RDF/XML and various other specifications, all 
of which are defined in terms of XML 1.0. For maximum compatibility it is 
better for us to be conservative, because most of the ecosystem has not adopted 
XML 1.1 yet.

Rob

On 14/02/2018, 09:04, "Claude Warren" <[email protected]> wrote:

    The issue is that predicate namespaces are parsed with XMLChar.  So if I
    have one that is correctly formed based on the XML 1.1 spec but the XMLChar
    code does not recognize the first character of the local name, it will not
    split the URL correctly.  All code that depends upon
    Resource.getNamespace() and Resource.getLocalName() will be incorrect.  It
    seems to me this is a low-level problem.
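
A minimal sketch of the splitting problem described above, using only the JDK. It is not Jena's or Xerces' actual code: the class NameSplitSketch, the helper splitPoint and the "stricter" predicate are hypothetical names made up for illustration. The ranges are the NameStartChar production quoted later in this thread, and the stricter predicate only imitates the reported behaviour that 0x132 is rejected as a name start character.

```java
import java.util.function.IntPredicate;

public class NameSplitSketch {

    // NameStartChar ranges (inclusive) from the production quoted in this thread.
    private static final int[][] NAME_START_RANGES = {
        { ':', ':' }, { 'A', 'Z' }, { '_', '_' }, { 'a', 'z' },
        { 0xC0, 0xD6 }, { 0xD8, 0xF6 }, { 0xF8, 0x2FF },
        { 0x370, 0x37D }, { 0x37F, 0x1FFF }, { 0x200C, 0x200D },
        { 0x2070, 0x218F }, { 0x2C00, 0x2FEF }, { 0x3001, 0xD7FF },
        { 0xF900, 0xFDCF }, { 0xFDF0, 0xFFFD }, { 0x10000, 0xEFFFF }
    };

    static boolean isNameStartChar(int cp) {
        for (int[] r : NAME_START_RANGES) {
            if (cp >= r[0] && cp <= r[1]) return true;
        }
        return false;
    }

    // NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7
    //            | [#x0300-#x036F] | [#x203F-#x2040]
    static boolean isNameChar(int cp) {
        return isNameStartChar(cp) || cp == '-' || cp == '.'
                || (cp >= '0' && cp <= '9') || cp == 0xB7
                || (cp >= 0x300 && cp <= 0x36F)
                || (cp >= 0x203F && cp <= 0x2040);
    }

    // Hypothetical helper: index at which the local name begins, or
    // uri.length() if there is none. Walks backwards over trailing name
    // characters (stopping at ':' because a local name has no colon) and
    // remembers the earliest one accepted by the supplied predicate.
    static int splitPoint(String uri, IntPredicate isStart) {
        int split = uri.length();
        int i = uri.length();
        while (i > 0) {
            int cp = uri.codePointBefore(i);
            if (cp == ':' || !isNameChar(cp)) break;
            i -= Character.charCount(cp);
            if (isStart.test(cp)) split = i;
        }
        return split;
    }

    public static void main(String[] args) {
        String uri = "http://example.org/ns#\u0132x";
        IntPredicate fullRange = NameSplitSketch::isNameStartChar;
        // Assumption: imitates a checker that does not accept 0x132.
        IntPredicate stricter = cp -> cp != 0x132 && isNameStartChar(cp);
        System.out.println(uri.substring(splitPoint(uri, fullRange)));
        System.out.println(uri.substring(splitPoint(uri, stricter)));
    }
}
```

With the full ranges the local name starts at the character 0x132; with the stricter predicate the split moves one character to the right, so both the namespace and the local name come out differently for the same URI.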
    
    While it should be easy to fix the parsing problem, I am not certain what
    effect that will have on any other code that is dependent upon the Xerces
    code (where XMLChar originates).
    
    Claude
    
    On Tue, Feb 13, 2018 at 6:50 PM, Andy Seaborne <[email protected]> wrote:
    
    > Maybe SplitIRI will help?
    >
    > It does Turtle splitting as well as XML.
    >
    >     Andy
    >
    >
    > On 13/02/18 17:39, Claude Warren wrote:
    >
    >> It is used in org.apache.jena.rdf.model.impl.Util namespace splitting
    >> code.
    >>
    >> On Tue, Feb 13, 2018 at 4:44 PM, Andy Seaborne <[email protected]> wrote:
    >>
    >>> Where is XMLChar.isNameStart being used?
    >>>
    >>>
    >>> On 13/02/18 13:10, Claude Warren wrote:
    >>>
    >>>> Is there a reason that Jena does not support the full range of XML name
    >>>> start characters?
    >>>>
    >>>> see https://www.w3.org/TR/xml/#NT-NameStartChar
    >>>>
    >>>> I wrote a quick test and found that there were a number of characters
    >>>> that Jena does not support.
    >>>> Miscategorization appears to start at 0x132.  There are 936990
    >>>> miscategorized characters.
    >>>>
    >>>> The issue is actually in the Xerces util class XMLChar
    >>>>
    >>>> Is this because of the version of Xerces we are stuck with?  Is there a
    >>>> way around this issue?
    >>>>
    >>>> Claude
    >>>>
    >>>> p.s. Since I can't attach a file, here is the test code I wrote.
    >>>>
    >>>> import static org.junit.Assert.assertTrue;
    >>>>
    >>>> import org.apache.xerces.util.XMLChar;
    >>>> import org.junit.Test;
    >>>>
    >>>> public class NameTest {
    >>>>       /*
    >>>>        * NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
    >>>>        * [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
    >>>>        * [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
    >>>>        * [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
    >>>>        * [#x10000-#xEFFFF]
    >>>>        */
    >>>>
    >>>>       int[][] ranges = { { ':', ':' }, { 'A', 'Z' }, { '_', '_' },
    >>>>               { 'a', 'z' }, { 0xC0, 0xD6 }, { 0xD8, 0xF6 }, { 0xF8, 0x2FF },
    >>>>               { 0x370, 0x37D }, { 0x37F, 0x1FFF }, { 0x200C, 0x200D },
    >>>>               { 0x2070, 0x218F }, { 0x2C00, 0x2FEF }, { 0x3001, 0xD7FF },
    >>>>               { 0xF900, 0xFDCF }, { 0xFDF0, 0xFFFD }, { 0x10000, 0xEFFFF } };
    >>>>
    >>>>       @Test
    >>>>       public void testNameStart() {
    >>>>
    >>>>           for (int[] range : ranges) {
    >>>>               for (int c = range[0]; c <= range[1]; c++) {
    >>>>                   assertTrue( String.format( "character %s 0x%s", c,
    >>>>                           Integer.toHexString( c ) ), XMLChar.isNameStart( c ) );
    >>>>               }
    >>>>           }
    >>>>
    >>>>       }
    >>>>
    >>>>       @Test
    >>>>       public void listNameStartErr() {
    >>>>           int cnt = 0;
    >>>>           for (int[] range : ranges) {
    >>>>               for (int c = range[0]; c <= range[1]; c++) {
    >>>>                   if (!XMLChar.isNameStart( c ))
    >>>>                   {
    >>>>                       System.out.print( String.format( "0x%s ",
    >>>>                               Integer.toHexString( c ) ) );
    >>>>                       cnt++;
    >>>>                       if (cnt % 25 == 0)
    >>>>                       {
    >>>>                           System.out.println();
    >>>>                       }
    >>>>
    >>>>                   }
    >>>>
    >>>>               }
    >>>>           }
    >>>>           System.out.println();
    >>>>           System.out.println( cnt+" characters miscategorized"  );
    >>>>       }
    >>>>
    >>>> }
    >>>>
    >>>>
    >>>>
    >>>>
    >>
    >>
    
    
    -- 
    I like: Like Like - The likeliest place on the web
    <http://like-like.xenei.com>
    LinkedIn: http://www.linkedin.com/in/claudewarren
