Re: XMLChar.isNameStart error?

Andy Seaborne Wed, 14 Feb 2018 03:41:21 -0800

This is about "editions" of XML 1.0.

On 14/02/18 10:52, Claude Warren wrote:

My error.  I should have specified XML 1.0 as that is the spec that I drew
the test code from:  https://www.w3.org/TR/xml/#NT-NameStartChar

Is the XMLChar in the JDK correct? I don't know what edition the built-in Java XML parser supports.


So this is a failure of Xerces to meet the XML 1.0 naming spec.  I have
opened a defect with Xerces
(https://issues.apache.org/jira/browse/XERCESJ-1690) but I don't expect
much movement there.


Apache Xerces claims support for "XML 1.0 (4th Edition)", not edition 5.

I looked at XML 1.0 edition 4 and it looks different
"| [#x0100-#x0131] | [#x0134-#x013E] |"
no x132.
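
To make the edition difference concrete, here is a rough sketch of a
standalone check against the NameStartChar ranges quoted in Claude's test
code further down the thread. It is not Xerces or Jena code; the class name
NameStartChar5e and the method isNameStart are made up for illustration.

```java
// Hypothetical helper, not part of Xerces or Jena.
final class NameStartChar5e {

    // Inclusive code point ranges of the NameStartChar production as
    // quoted in the test code in this thread.
    private static final int[][] RANGES = {
        { ':', ':' }, { 'A', 'Z' }, { '_', '_' }, { 'a', 'z' },
        { 0xC0, 0xD6 }, { 0xD8, 0xF6 }, { 0xF8, 0x2FF },
        { 0x370, 0x37D }, { 0x37F, 0x1FFF }, { 0x200C, 0x200D },
        { 0x2070, 0x218F }, { 0x2C00, 0x2FEF }, { 0x3001, 0xD7FF },
        { 0xF900, 0xFDCF }, { 0xFDF0, 0xFFFD }, { 0x10000, 0xEFFFF }
    };

    // True if the code point falls inside any of the ranges above.
    static boolean isNameStart(int codePoint) {
        for (int[] range : RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 0x132 lies inside [#xF8-#x2FF], so this table accepts it,
        // whereas the 4th edition ranges quoted above skip it.
        System.out.println(isNameStart(0x132));
        // A digit may not start a name under this table.
        System.out.println(isNameStart('1'));
    }
}
```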

Xerces was going to release 2.12 last year but I think that ran out of energy. Not sure what edition is targeted.


----

Jena is not so heavily tied to Xerces. There are only a couple of files that import org.apache.xerces datatype code.

We could extract the datatype source and adopt it, then use the Java built-in parser or any other, because we then don't depend on or ship Xerces.


Xerces gets to be the XML parser by ServiceLoading.
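
One way to see which parser was picked up is to ask JAXP for its factories
and print the implementation class. This is only a sketch using the
standard javax.xml.parsers API; the class name WhichParser and its method
names are invented for the example, and the output depends on what is on
the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic class, not part of Jena.
final class WhichParser {

    // Implementation class of the SAX factory that JAXP selected.
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Implementation class of the DOM factory that JAXP selected.
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // With a Xerces jar on the classpath these are expected to name
        // Xerces classes; without it, the JDK's built-in implementation.
        System.out.println("SAX: " + saxFactoryClass());
        System.out.println("DOM: " + domFactoryClass());
    }
}
```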

----

>> it will not split the URL correctly.

A "feature" of RDF/XML

Actually, there isn't a "correct split" though we all expect a split at "/" or "#".
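
For comparison, the split most people expect can be sketched in a few
lines. This is not what org.apache.jena.rdf.model.impl.Util or SplitIRI do;
it only cuts after the last "/" or "#" and never consults XML name rules.
The class SimpleSplit and its methods are hypothetical.

```java
// Hypothetical helper showing the "expected" split, not Jena's algorithm.
final class SimpleSplit {

    // Index of the first character of the local name: one past the last
    // '/' or '#', or 0 when the string contains neither.
    static int splitPoint(String iri) {
        int idx = Math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/'));
        return idx + 1;
    }

    static String namespace(String iri) {
        return iri.substring(0, splitPoint(iri));
    }

    static String localName(String iri) {
        return iri.substring(splitPoint(iri));
    }

    public static void main(String[] args) {
        String iri = "http://example.org/ns#name";
        System.out.println(namespace(iri));
        System.out.println(localName(iri));
    }
}
```

Because this never tests the first character of the local name, it is
unaffected by which edition's NameStartChar table is in use, but the local
name it returns is not guaranteed to be usable as an XML name.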


    Andy


Claude


On Wed, Feb 14, 2018 at 10:38 AM, Rob Vesse <[email protected]> wrote:

If memory serves this is mostly historical, once upon a time RDF/XML was
the only serialisation available and so everything had to be XML compliant.
Obviously things have evolved over time but the implementation is
conservative in this regard.

Also I think XML 1.1 post-dates RDF/XML and various other specifications
all of which are defined in terms of XML 1.0. For maximum compatibility it
is better for us to be conservative because most of the ecosystem has not
adopted XML 1.1 yet.

Rob

On 14/02/2018, 09:04, "Claude Warren" <[email protected]> wrote:

     The issue is that predicate namespaces are parsed with XMLChar.  So if
     I have one that is correctly formed based on the XML 1.1 spec but the
     XMLChar code does not recognize the first character of the local name,
     it will not split the URL correctly.  All code that depends upon
     Resource.getNamespace() and Resource.getLocalName() will be incorrect.
     It seems to me this is a low-level problem.

     While it should be easy to fix the parsing problem, I am not certain
     what effect that will have on any other code that is dependent upon the
     Xerces code (where XMLChar originates).

     Claude

     On Tue, Feb 13, 2018 at 6:50 PM, Andy Seaborne <[email protected]> wrote:

     > Maybe SplitIRI will help?
     >
     > It does Turtle splitting as well as XML.
     >
     >     Andy
     >
     >
     > On 13/02/18 17:39, Claude Warren wrote:
     >
     >> It is used in org.apache.jena.rdf.model.impl.Util namespace
     >> splitting code.
     >>
     >> On Tue, Feb 13, 2018 at 4:44 PM, Andy Seaborne <[email protected]> wrote:
     >>
     >> Where is XMLChar.isNameStart being used?
     >>>
     >>>
     >>> On 13/02/18 13:10, Claude Warren wrote:
     >>>
     >>> Is there a reason that Jena does not support the full range of XML
     >>>> name start characters?
     >>>>
     >>>> see https://www.w3.org/TR/xml/#NT-NameStartChar
     >>>>
     >>>> I wrote a quick test and found that there were a number of
     >>>> characters that Jena does not support.
     >>>> Miscategorization appears to start at 0x132.  There are 936990
     >>>> miscategorized characters.
     >>>>
     >>>> The issue is actually in the Xerces util class XMLChar
     >>>>
     >>>> Is this because of the version of Xerces we are stuck with?  Is
     >>>> there a way around this issue?
     >>>>
     >>>> Claude
     >>>>
     >>>> p.s. Since I can't attach a file, here is the test code I wrote.
     >>>>
     >>>> import static org.junit.Assert.assertTrue;
     >>>>
     >>>> import org.apache.xerces.util.XMLChar;
     >>>> import org.junit.Test;
     >>>>
     >>>> public class NameTest {
     >>>>       /*
     >>>>        * NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
     >>>>        * [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
     >>>>        * [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
     >>>>        * [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
     >>>>        * [#x10000-#xEFFFF]
     >>>>        */
     >>>>
     >>>>       int[][] ranges = { { ':', ':' }, { 'A', 'Z' }, { '_', '_' },
     >>>>               { 'a', 'z' }, { 0xC0, 0xD6 }, { 0xD8, 0xF6 }, { 0xF8, 0x2FF },
     >>>>               { 0x370, 0x37D }, { 0x37F, 0x1FFF }, { 0x200C, 0x200D },
     >>>>               { 0x2070, 0x218F }, { 0x2C00, 0x2FEF }, { 0x3001, 0xD7FF },
     >>>>               { 0xF900, 0xFDCF }, { 0xFDF0, 0xFFFD }, { 0x10000, 0xEFFFF } };
     >>>>
     >>>>       @Test
     >>>>       public void testNameStart() {
     >>>>
     >>>>           for (int[] range : ranges) {
     >>>>               for (int c = range[0]; c <= range[1]; c++) {
     >>>>                   assertTrue( String.format( "character %s 0x%s",
     >>>>                           c, Integer.toHexString( c ) ),
     >>>>                           XMLChar.isNameStart( c ) );
     >>>>               }
     >>>>           }
     >>>>
     >>>>       }
     >>>>
     >>>>       @Test
     >>>>       public void listNameStartErr() {
     >>>>           int cnt = 0;
     >>>>           for (int[] range : ranges) {
     >>>>               for (int c = range[0]; c <= range[1]; c++) {
     >>>>                   if (!XMLChar.isNameStart( c ))
     >>>>                   {
     >>>>                       System.out.print( String.format( "0x%s ",
     >>>>                               Integer.toHexString( c ) ) );
     >>>>                       cnt++;
     >>>>                       if (cnt % 25 == 0)
     >>>>                       {
     >>>>                           System.out.println();
     >>>>                       }
     >>>>
     >>>>                   }
     >>>>
     >>>>               }
     >>>>           }
     >>>>           System.out.println();
     >>>>           System.out.println( cnt+" characters miscategorized"  );
     >>>>       }
     >>>>
     >>>> }
     >>>>
     >>>>
     >>>>
     >>>>
     >>
     >>


     --
     I like: Like Like - The likeliest place on the web
     <http://like-like.xenei.com>
     LinkedIn: http://www.linkedin.com/in/claudewarren
