Re: XMLChar.isNameStart error?

Claude Warren Wed, 14 Feb 2018 02:53:48 -0800

My error.  I should have specified XML 1.0, as that is the spec that I drew
the test code from:  https://www.w3.org/TR/xml/#NT-NameStartChar


So this is a failure in Xerces to meet the XML 1.0 naming spec.  I have
opened a defect with Xerces (
https://issues.apache.org/jira/browse/XERCESJ-1690)  but I don't expect
much movement there.
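
For reference, the production can be checked without Xerces.  What follows is
a minimal sketch, assuming the ranges listed at the link above.  The class and
method names (NameStartCheck, isNameStartChar) are made up for illustration
and are not part of Jena or Xerces:

```java
// Hypothetical stand-alone check; not Jena or Xerces code.
class NameStartCheck {

    // Inclusive code point ranges copied from the NameStartChar production.
    private static final int[][] RANGES = {
        { ':', ':' }, { 'A', 'Z' }, { '_', '_' }, { 'a', 'z' },
        { 0xC0, 0xD6 }, { 0xD8, 0xF6 }, { 0xF8, 0x2FF },
        { 0x370, 0x37D }, { 0x37F, 0x1FFF }, { 0x200C, 0x200D },
        { 0x2070, 0x218F }, { 0x2C00, 0x2FEF }, { 0x3001, 0xD7FF },
        { 0xF900, 0xFDCF }, { 0xFDF0, 0xFFFD }, { 0x10000, 0xEFFFF }
    };

    /** Returns true if the code point falls inside any of the ranges. */
    static boolean isNameStartChar(int codePoint) {
        for (int[] range : RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 0x132 is where the miscategorization appeared to start.
        System.out.println(isNameStartChar(0x132)); // inside [#xF8-#x2FF]
        System.out.println(isNameStartChar('1'));   // digits cannot start a name
    }
}
```

Comparing this against XMLChar.isNameStart( c ) over the same ranges is
essentially what the test code quoted below does.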

Claude


On Wed, Feb 14, 2018 at 10:38 AM, Rob Vesse <[email protected]> wrote:

> If memory serves, this is mostly historical. Once upon a time RDF/XML was
> the only serialisation available, so everything had to be XML compliant.
> Obviously things have evolved over time, but the implementation is
> conservative in this regard.
>
> Also, I think XML 1.1 post-dates RDF/XML and various other specifications,
> all of which are defined in terms of XML 1.0. For maximum compatibility it
> is better for us to be conservative, because most of the ecosystem has not
> adopted XML 1.1 yet.
>
> Rob
>
> On 14/02/2018, 09:04, "Claude Warren" <[email protected]> wrote:
>
>     The issue is that predicate namespaces are parsed with XMLChar.  So if I
>     have one that is correctly formed according to the XML 1.1 spec but the
>     XMLChar code does not recognize the first character of the local name, it
>     will not split the URL correctly.  All code that depends upon
>     Resource.getNameSpace() and Resource.getLocalName() will be incorrect.  It
>     seems to me this is a low-level problem.
>
>     While it should be easy to fix the parsing problem, I am not certain what
>     effect that will have on any other code that is dependent upon the Xerces
>     code (where XMLChar originates).
>
>     Claude
>
>     On Tue, Feb 13, 2018 at 6:50 PM, Andy Seaborne <[email protected]> wrote:
>
>     > Maybe SplitIRI will help?
>     >
>     > It does Turtle splitting as well as XML.
>     >
>     >     Andy
>     >
>     >
>     > On 13/02/18 17:39, Claude Warren wrote:
>     >
>     >> It is used in the org.apache.jena.rdf.model.impl.Util namespace
>     >> splitting code.
>     >>
>     >> On Tue, Feb 13, 2018 at 4:44 PM, Andy Seaborne <[email protected]> wrote:
>     >>
>     >> Where is XMLChar.isNameStart being used?
>     >>>
>     >>>
>     >>> On 13/02/18 13:10, Claude Warren wrote:
>     >>>
>     >>>> Is there a reason that Jena does not support the full range of XML
>     >>>> name start characters?
>     >>>>
>     >>>> see https://www.w3.org/TR/xml/#NT-NameStartChar
>     >>>>
>     >>>> I wrote a quick test and found that there were a number of
>     >>>> characters that Jena does not support.
>     >>>> Miscategorization appears to start at 0x132.  There are 936990
>     >>>> miscategorized characters.
>     >>>>
>     >>>> The issue is actually in the Xerces util class XMLChar
>     >>>>
>     >>>> Is this because of the version of Xerces we are stuck with?  Is
>     >>>> there a way around this issue?
>     >>>>
>     >>>> Claude
>     >>>>
>     >>>> p.s. Since I can't attach a file, here is the test code I wrote.
>     >>>>
>     >>>> import static org.junit.Assert.assertTrue;
>     >>>>
>     >>>> import org.apache.xerces.util.XMLChar;
>     >>>> import org.junit.Test;
>     >>>>
>     >>>> public class NameTest {
>     >>>>       /*
>     >>>>        * NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
>     >>>>        * [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
>     >>>>        * [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
>     >>>>        * [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
>     >>>>        * [#x10000-#xEFFFF]
>     >>>>        */
>     >>>>
>     >>>>       // Inclusive code point ranges, one pair per term of NameStartChar.
>     >>>>       int[][] ranges = { { ':', ':' }, { 'A', 'Z' }, { '_', '_' },
>     >>>>               { 'a', 'z' }, { 0xC0, 0xD6 }, { 0xD8, 0xF6 }, { 0xF8, 0x2FF },
>     >>>>               { 0x370, 0x37D }, { 0x37F, 0x1FFF }, { 0x200C, 0x200D },
>     >>>>               { 0x2070, 0x218F }, { 0x2C00, 0x2FEF }, { 0x3001, 0xD7FF },
>     >>>>               { 0xF900, 0xFDCF }, { 0xFDF0, 0xFFFD }, { 0x10000, 0xEFFFF } };
>     >>>>
>     >>>>       @Test
>     >>>>       public void testNameStart() {
>     >>>>
>     >>>>           for (int[] range : ranges) {
>     >>>>               for (int c = range[0]; c <= range[1]; c++) {
>     >>>>                   assertTrue( String.format( "character %s 0x%s", c,
>     >>>>                           Integer.toHexString( c ) ), XMLChar.isNameStart( c ) );
>     >>>>               }
>     >>>>           }
>     >>>>
>     >>>>       }
>     >>>>
>     >>>>       @Test
>     >>>>       public void listNameStartErr() {
>     >>>>           int cnt = 0;
>     >>>>           for (int[] range : ranges) {
>     >>>>               for (int c = range[0]; c <= range[1]; c++) {
>     >>>>                   if (!XMLChar.isNameStart( c ))
>     >>>>                   {
>     >>>>                       System.out.print( String.format( "0x%s ",
>     >>>>                               Integer.toHexString( c ) ) );
>     >>>>                       cnt++;
>     >>>>                       if (cnt % 25 == 0)
>     >>>>                       {
>     >>>>                           System.out.println();
>     >>>>                       }
>     >>>>
>     >>>>                   }
>     >>>>
>     >>>>               }
>     >>>>           }
>     >>>>           System.out.println();
>     >>>>           System.out.println( cnt+" characters miscategorized"  );
>     >>>>       }
>     >>>>
>     >>>> }
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>
>     >>
>
>
>     --
>     I like: Like Like - The likeliest place on the web
>     <http://like-like.xenei.com>
>     LinkedIn: http://www.linkedin.com/in/claudewarren
>
>
>
>
>
>


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
