Re: [basex-talk] Inconsistency in base-uri()

2024-03-04 Thread Eliot Kimber
Oh right, I submitted that issue 9 years ago!

Any solution will be challenging due to the age of the current behavior and the 
twisty nature of the code.

My use of @xml:base may be somewhat singular as it’s specific to the way that 
DITA works and I don’t know that there is any other XML document type in common 
usage that has a similar hyperlink structure that depends on using xml:base.

Cheers,

E.

Eliot Kimber

From: Christian Grün 
Date: Monday, March 4, 2024 at 5:16 AM
To: Eliot Kimber 
Cc: basex-talk@mailman.uni-konstanz.de 
Subject: Re: [basex-talk] Inconsistency in base-uri()


…just a quick reply: That’s probably related to [1], an ancient issue, in which 
I tended to recommend the usage of db:path. I wish we’d finally find time and 
resources to tackle this.


[1] https://github.com/BaseXdb/basex/issues/1172

On Mon, Mar 4, 2024 at 11:49 AM Eliot Kimber <eliot.kim...@servicenow.com> wrote:
Using BaseX 11 (but I think the code is the same in BaseX 10).

I’m trying to understand how base-uri() behaves relative to how it should 
behave when the database path of a document is not a valid URI, i.e., it has a 
space in it.

First I have this test:

let $doc as document-node() := document { child }
return $doc/*/child ! base-uri(.)

Which produces:

file:///data/basex/data/.dba/temp/child-uri%20with%20space.xml

Which is the correct result: it’s the value of @xml:base and the escaped spaces 
make it a valid URI.

Replacing %20 with “ ” in the @xml:base value results in this error:

 Invalid URI: Illegal character in path at index 14: temp/child-uri with 
space.xml.

Also correct as the spaces have to be escaped.

This verifies that base-uri() applied to nodes with explicit @xml:base 
attributes works per the spec. But this test does not involve database paths.
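
A minimal sketch of the first test that escapes the value before it reaches @xml:base. The wrapper element name "root" is an assumption, since the element markup is not visible in the archived message; only the child element and its xml:base value come from the thread. iri-to-uri() is the standard function that percent-encodes characters such as spaces:

```xquery
(: Hypothetical reconstruction: the element name "root" is assumed. :)
let $base := iri-to-uri('temp/child-uri with space.xml')
let $doc as document-node() := document {
  <root><child xml:base="{$base}">child</child></root>
}
(: base-uri() resolves the escaped xml:base value against the static base URI :)
return $doc/*/child ! base-uri(.)
```

Because the attribute value is escaped up front, this variant should not hit the "Illegal character in path" error described above; the exact resolved URI depends on where the query runs.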

To try to test things with database paths I then created this pair of test 
scripts:

Script to put docs in a database:

let $db := 'temp'
let $filename as xs:string := 'with space.xml'
let $doc1 as document-node() := document {No 
xml:b

Re: [basex-talk] Inconsistency in base-uri()

2024-03-04 Thread Christian Grün
…just a quick reply: That’s probably related to [1], an ancient issue, in
which I tended to recommend the usage of db:path. I wish we’d finally find
time and resources to tackle this.


[1] https://github.com/BaseXdb/basex/issues/1172
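
A minimal sketch of the db:path approach, assuming the documents created by the test script quoted below already exist in the "temp" database. The '/' || database || '/' || path layout mirrors the base URIs reported in that output; combining db:name(), db:path() and iri-to-uri() this way is an illustration only, not a documented recipe:

```xquery
(: Sketch: build an escaped path from db:path() rather than relying on base-uri().
   Assumes the resource exists; db:name() fails on an empty sequence. :)
let $doc := db:get('temp', 'doc3_with space.xml')
let $path := '/' || db:name($doc) || '/' || db:path($doc)
return iri-to-uri($path)
```

The escaped string can then be used wherever a valid URI is required, for example as an @xml:base value.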

On Mon, Mar 4, 2024 at 11:49 AM Eliot Kimber wrote:

> Using BaseX 11 (but I think the code is the same in BaseX 10).
>
>
>
> I’m trying to understand how base-uri() behaves relative to how it should
> behave when the database path of a document is not a valid URI, i.e., it
> has a space in it.
>
>
>
> First I have this test:
>
>
>
> let $doc as document-node() := document {  xml:base="temp/child-uri%20with%20space.xml">child }
>
> return $doc/*/child ! base-uri(.)
>
>
>
> Which produces:
>
> file:///data/basex/data/.dba/temp/child-uri%20with%20space.xml
>
>
>
> Which is the correct result: it’s the value of @xml:base and the escaped
> spaces make it a valid URI.
>
>
>
> Replacing %20 with “ ” in the @xml:base value results in this error:
>
> * Invalid URI: Illegal character in path at index 14: temp/child-uri with
> space.xml.*
>
>
>
> Also correct as the spaces have to be escaped.
>
>
>
> This verifies that base-uri() applied to nodes with explicit @xml:base
> attributes works per the spec. But this test does not involve database paths.
>
>
>
> To try to test things with database paths I then created this pair of test
> scripts:
>
> Script to put docs in a database:
>
> let $db := 'temp'
> let $filename as xs:string := 'with space.xml'
> let $doc1 as document-node() := document {No xml:base}
> let $doc2 as document-node() := document {  xml:base="{'/temp/xmlbase/doc2_' || $filename}">With xml:base unescaped }
> let $doc3 as document-node() := document {  xml:base="{iri-to-uri( '/temp/xmlbase/doc3_' || $filename)}">With xml:base escaped }
> return (()
>   ,db:put($db, $doc1, 'doc1_' || $filename)
>   ,db:put($db, $doc2, 'doc2_' || $filename)
>   ,db:put($db, $doc3, 'doc3_' || $filename)
> )
>
>
>
> Script to report on them:
>
> let $db := 'temp'
> let $filenameBase as xs:string := 'with space.xml'
> return
> for $i in 1 to 3
>   let $filename := 'doc' || $i || '_' || $filenameBase
>   let $doc := db:get($db, $filename)
>   let $child as element() := $doc/*/child
>   let $dbPath := db:path($doc)
>   let $baseUriDoc := base-uri($doc)
>   let $baseUriChild :=
>     try {
>       base-uri($child)
>     } catch * {
>       $err:description
>     }
>   return (()
>     ,``[
> Doc "`{$dbPath}`":]``
>     ,$doc
>     ,``[xml:base att:  "`{$child/@xml:base}`"]``
>     ,``[base URI of doc:  "`{$baseUriDoc}`"]``
>     ,``[base URI of child: "`{$baseUriChild}`"]``
>   )
>
>
>
> Which returns this result:
>
> Doc "doc1_with space.xml":
>
> 
>
>   No xml:base
>
> 
>
> xml:base att:  ""
>
> base URI of doc:  "/temp/doc1_with space.xml"
>
> base URI of child: "/temp/doc1_with space.xml"
>
>
>
> Doc "doc2_with space.xml":
>
> 
>
>   With xml:base
> unescaped
>
> 
>
> xml:base att:  "/temp/xmlbase/doc2_with space.xml"
>
> base URI of doc:  "/temp/doc2_with space.xml"
>
> base URI of child: "Invalid URI: Illegal character in path at index 23:
> /temp/xmlbase/doc2_with space.xml."
>
>
>
> Doc "doc3_with space.xml":
>
> 
>
>   With xml:base
> escaped
>
> 
>
> xml:base att:  "/temp/xmlbase/doc3_with%20space.xml"
>
> base URI of doc:  "/temp/doc3_with space.xml"
>
> base URI of child: "Invalid URI: Illegal character in path at index 15:
> /temp/doc3_with space.xml."
>
>
>
> Note the result for doc3: It’s reporting the base URI of the document
> (/temp/doc3_with space.xml), not the base URI of the child
> (/temp/xmlbase/doc3_with%20space.xml). Why? I think the answer is that under
> the covers it’s doing resolve-uri(), which also checks the validity of both
> the base and relative parts.
>
>
>
> One observation is that base-uri() is treating the db-provided base URI
> differently from an xml:base-provided base URI, but only when there is no
> @xml:base attribute.
>
>
>
> In doc 1, the database path has a space but base-uri() does not fail when
> returning it even though it’s not a valid URI. Why not?
>
>
>
> In doc 2, the xml:base-supplied base URI is correctly reported as invalid,
> but the database-supplied base URI of the root is not reported as invalid.
>
>
>
> My expectation would be that the behavior is consistent: Either all URIs
> must be valid, including those coming from database paths or all are
> automatically escaped (as though iri-to-uri() had been applied).
>
>
>
> Finally, why do I get the result for doc 3, where it’s reporting the
> database path as the base URI of the child rather than the
> @xml:base-defined base URI (which is correctly escaped)?
>
>
>
> In my code, which depends on the use of @xml:base to do DITA link
> resolution for “resolved” DITA maps, I’ve adjusted my code to escape URIs
> in @xml:base values and as far as I can tell everything works as it should.
> But I’m still concerned about the