If memory serves, this is mostly historical. Once upon a time RDF/XML was the only serialisation available, so everything had to be XML compliant. Things have evolved since then, but the implementation remains conservative in this regard.

Also, I think XML 1.1 post-dates RDF/XML and various other specifications, all of which are defined in terms of XML 1.0. For maximum compatibility it is better for us to be conservative, because most of the ecosystem has not adopted XML 1.1 yet.

Rob

On 14/02/2018, 09:04, "Claude Warren" <[email protected]> wrote:

The issue is that predicate namespaces are parsed with XMLChar. So if I have one that is correctly formed based on the XML 1.1 spec, but the XMLChar code does not recognize the first character of the local name, it will not split the URL correctly. All code that depends upon Resource.getNamespace() and Resource.getLocalName() will be incorrect.

It seems to me this is a low-level problem. While it should be easy to fix the parsing problem, I am not certain what effect that will have on any other code that is dependent upon the Xerces code (where XMLChar originates).

Claude

On Tue, Feb 13, 2018 at 6:50 PM, Andy Seaborne <[email protected]> wrote:

> Maybe SplitIRI will help?
>
> It does Turtle splitting as well as XML.
>
>     Andy
>
>
> On 13/02/18 17:39, Claude Warren wrote:
>
>> It is used in org.apache.jena.rdf.model.impl.Util namespace splitting
>> code.
>>
>> On Tue, Feb 13, 2018 at 4:44 PM, Andy Seaborne <[email protected]> wrote:
>>
>>> Where is XMLChar.isNameStart being used?
>>>
>>>
>>> On 13/02/18 13:10, Claude Warren wrote:
>>>
>>>> Is there a reason that Jena does not support the full range of XML name
>>>> start characters?
>>>>
>>>> see https://www.w3.org/TR/xml/#NT-NameStartChar
>>>>
>>>> I wrote a quick test and found that there were a number of characters
>>>> that Jena does not support.
>>>> Miscategorization appears to start at 0x132. There are 936990
>>>> miscategorized characters.
>>>>
>>>> The issue is actually in the Xerces util class XMLChar.
>>>>
>>>> Is this because of the version of Xerces we are stuck with? Is there a
>>>> way around this issue?
>>>>
>>>> Claude
>>>>
>>>> p.s. Since I can't attach a file, here is the test code I wrote.
>>>>
>>>> import static org.junit.Assert.assertTrue;
>>>>
>>>> import org.apache.xerces.util.XMLChar;
>>>> import org.junit.Test;
>>>>
>>>> public class NameTest {
>>>>     /*
>>>>      * NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
>>>>      * [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
>>>>      * [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
>>>>      * [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
>>>>      * [#x10000-#xEFFFF]
>>>>      */
>>>>
>>>>     // Inclusive ranges, one per alternative in the production above.
>>>>     int[][] ranges = { { ':', ':' }, { 'A', 'Z' }, { '_', '_' },
>>>>             { 'a', 'z' }, { 0xC0, 0xD6 }, { 0xD8, 0xF6 }, { 0xF8, 0x2FF },
>>>>             { 0x370, 0x37D }, { 0x37F, 0x1FFF }, { 0x200C, 0x200D },
>>>>             { 0x2070, 0x218F }, { 0x2C00, 0x2FEF }, { 0x3001, 0xD7FF },
>>>>             { 0xF900, 0xFDCF }, { 0xFDF0, 0xFFFD }, { 0x10000, 0xEFFFF } };
>>>>
>>>>     // Fails on the first character that XMLChar rejects.
>>>>     @Test
>>>>     public void testNameStart() {
>>>>         for (int[] range : ranges) {
>>>>             for (int c = range[0]; c <= range[1]; c++) {
>>>>                 assertTrue( String.format( "character %s 0x%s", c,
>>>>                         Integer.toHexString( c ) ), XMLChar.isNameStart( c ) );
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>>     // Prints every rejected character, 25 per line, then the total.
>>>>     @Test
>>>>     public void listNameStartErr() {
>>>>         int cnt = 0;
>>>>         for (int[] range : ranges) {
>>>>             for (int c = range[0]; c <= range[1]; c++) {
>>>>                 if (!XMLChar.isNameStart( c )) {
>>>>                     System.out.print( String.format( "0x%s ",
>>>>                             Integer.toHexString( c ) ) );
>>>>                     cnt++;
>>>>                     if (cnt % 25 == 0) {
>>>>                         System.out.println();
>>>>                     }
>>>>                 }
>>>>             }
>>>>         }
>>>>         System.out.println();
>>>>         System.out.println( cnt + " characters miscategorized" );
>>>>     }
>>>> }
>>>>

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
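
For illustration only, here is how the dependency between namespace splitting and the NameStartChar test discussed in this thread might look in plain JDK code, without Xerces or Jena. This is a rough sketch under assumptions, not what Jena's Util or SplitIRI actually do: the class name NameStartCheck and the methods isNameStart, isNameChar and splitPoint are hypothetical, the ranges are the ones from the production quoted in the thread, and the split rule (walk back over name characters, then forward to the first name start character, treating ':' as a separator) is a simplification.

```java
final class NameStartCheck {
    // Inclusive code point ranges of the NameStartChar production
    // quoted in the thread (hypothetical helper, not Xerces' table).
    private static final int[][] NAME_START_RANGES = {
        { ':', ':' }, { 'A', 'Z' }, { '_', '_' }, { 'a', 'z' },
        { 0xC0, 0xD6 }, { 0xD8, 0xF6 }, { 0xF8, 0x2FF },
        { 0x370, 0x37D }, { 0x37F, 0x1FFF }, { 0x200C, 0x200D },
        { 0x2070, 0x218F }, { 0x2C00, 0x2FEF }, { 0x3001, 0xD7FF },
        { 0xF900, 0xFDCF }, { 0xFDF0, 0xFFFD }, { 0x10000, 0xEFFFF }
    };

    // True if the code point may start an XML name.
    static boolean isNameStart(int cp) {
        for (int[] r : NAME_START_RANGES) {
            if (cp >= r[0] && cp <= r[1]) {
                return true;
            }
        }
        return false;
    }

    // NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 |
    //              [#x0300-#x036F] | [#x203F-#x2040]
    static boolean isNameChar(int cp) {
        return isNameStart(cp) || cp == '-' || cp == '.'
            || (cp >= '0' && cp <= '9') || cp == 0xB7
            || (cp >= 0x300 && cp <= 0x36F)
            || (cp >= 0x203F && cp <= 0x2040);
    }

    // Assumed, simplified split rule: returns the index where the
    // local name starts; everything before it is the namespace.
    static int splitPoint(String iri) {
        int i = iri.length();
        // Walk back over the trailing run of name characters (no colon).
        while (i > 0) {
            int cp = iri.codePointBefore(i);
            if (cp == ':' || !isNameChar(cp)) {
                break;
            }
            i -= Character.charCount(cp);
        }
        // Skip forward until a character that may start a name.
        while (i < iri.length()) {
            int cp = iri.codePointAt(i);
            if (cp != ':' && isNameStart(cp)) {
                break;
            }
            i += Character.charCount(cp);
        }
        return i;
    }

    public static void main(String[] args) {
        // U+0132 is where the thread reports miscategorization starting.
        String iri = "http://example.org/ns#\u0132ssel";
        int i = splitPoint(iri);
        System.out.println("namespace:  " + iri.substring(0, i));
        System.out.println("local name: " + iri.substring(i));
    }
}
```

With these ranges, U+0132 counts as a name start character, so the example IRI splits right after the '#'. If isNameStart rejected U+0132, the second loop would step past it and the local name would lose its first character, which is the kind of incorrect split described above.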
