Oh right, I submitted that issue 9 years ago!

Any solution will be challenging due to the age of the current behavior and the 
twisty nature of the code.

My use of @xml:base may be somewhat singular as it’s specific to the way that 
DITA works and I don’t know that there is any other XML document type in common 
usage that has a similar hyperlink structure that depends on using xml:base.

Cheers,

E.

_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368

From: Christian Grün <christian.gr...@gmail.com>
Date: Monday, March 4, 2024 at 5:16 AM
To: Eliot Kimber <eliot.kim...@servicenow.com>
Cc: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Inconsistency in base-uri()
…just a quick reply: That’s probably related to [1], an ancient issue, in which 
I tended to recommend the usage of db:path. I wish we’d finally find time and 
resources to tackle this.


[1] https://github.com/BaseXdb/basex/issues/1172
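
For illustration, a minimal sketch of the db:path approach mentioned above, assuming the node comes from a database. db:name and db:path are BaseX database functions; local:db-uri is a hypothetical helper name, and the leading-slash layout is an assumption that simply mirrors the base URIs reported later in this thread:

```xquery
(: Hypothetical helper: derive an escaped URI from a node's database
   location instead of calling base-uri(). It deliberately ignores any
   @xml:base attribute on the node or its ancestors. :)
declare function local:db-uri($node as node()) as xs:string {
  iri-to-uri('/' || db:name($node) || '/' || db:path($node))
};

(: Assumes exactly one document is stored at this path in the 'temp' database. :)
local:db-uri(db:get('temp', 'doc1_with space.xml'))
```

Because the value is built from the database name and path and then passed through iri-to-uri(), spaces in the path are escaped before the value is ever treated as a URI.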

On Mon, Mar 4, 2024 at 11:49 AM Eliot Kimber <eliot.kim...@servicenow.com> wrote:
Using BaseX 11 (but I think the code is the same in BaseX 10).

I’m trying to understand how base-uri() behaves relative to how it should 
behave when the database path of a document is not a valid URI, i.e., it has a 
space in it.

First I have this test:

let $doc as document-node() := document {
  <root><child xml:base="temp/child-uri%20with%20space.xml">child</child></root>
}
return $doc/*/child ! base-uri(.)

Which produces:

file:///data/basex/data/.dba/temp/child-uri%20with%20space.xml

Which is the correct result: it’s the value of @xml:base and the escaped spaces 
make it a valid URI.

Replacing %20 with “ “ in the @xml:base value results in this error:

 Invalid URI: Illegal character in path at index 14: temp/child-uri with 
space.xml.

Also correct as the spaces have to be escaped.

This verifies that base-uri() applied to nodes with explicit @xml:base 
attributes works per the spec. But this test does not involve database paths.

To try to test things with database paths I then created this pair of test 
scripts:

Script to put docs in a database:

let $db := 'temp'
let $filename as xs:string := 'with space.xml'
let $doc1 as document-node() := document {
  <root><child>No xml:base</child></root>
}
let $doc2 as document-node() := document {
  <root><child xml:base="{'/temp/xmlbase/doc2_' || $filename}">With xml:base unescaped</child></root>
}
let $doc3 as document-node() := document {
  <root><child xml:base="{iri-to-uri('/temp/xmlbase/doc3_' || $filename)}">With xml:base escaped</child></root>
}
return (()
,db:put($db, $doc1, 'doc1_' || $filename)
,db:put($db, $doc2, 'doc2_' || $filename)
,db:put($db, $doc3, 'doc3_' || $filename)
)

Script to report on them:

let $db := 'temp'
let $filenameBase as xs:string := 'with space.xml'
return
for $i in 1 to 3
  let $filename := 'doc' || $i  || '_' || $filenameBase
  let $doc := db:get($db, $filename)
  let $child as element() := $doc/*/child
  let $dbPath := db:path($doc)
  let $baseUriDoc := base-uri($doc)
  let $baseUriChild :=
      try {
        base-uri($child)
      } catch * {
        $err:description
      }
  return (()
   ,``[
Doc "`{$dbPath}`":]``
   ,$doc
   ,``[xml:base att:  "`{$child/@xml:base}`"]``
   ,``[base URI of doc:  "`{$baseUriDoc}`"]``
   ,``[base URI of child: "`{$baseUriChild}`"]``
  )

Which returns this result:
Doc "doc1_with space.xml":
<root>
  <child>No xml:base</child>
</root>
xml:base att:  ""
base URI of doc:  "/temp/doc1_with space.xml"
base URI of child: "/temp/doc1_with space.xml"

Doc "doc2_with space.xml":
<root>
  <child xml:base="/temp/xmlbase/doc2_with space.xml">With xml:base 
unescaped</child>
</root>
xml:base att:  "/temp/xmlbase/doc2_with space.xml"
base URI of doc:  "/temp/doc2_with space.xml"
base URI of child: "Invalid URI: Illegal character in path at index 23: 
/temp/xmlbase/doc2_with space.xml."

Doc "doc3_with space.xml":
<root>
  <child xml:base="/temp/xmlbase/doc3_with%20space.xml">With xml:base 
escaped</child>
</root>
xml:base att:  "/temp/xmlbase/doc3_with%20space.xml"
base URI of doc:  "/temp/doc3_with space.xml"
base URI of child: "Invalid URI: Illegal character in path at index 15: 
/temp/doc3_with space.xml."

Note the result for doc3: It’s reporting the base URI of the document 
(/temp/doc3_with space.xml), not the base URI of the child 
(/temp/xmlbase/doc3_with%20space.xml). Why? I think the answer is that under the 
covers it’s doing resolve-uri(), which also checks the validity of both the 
base and relative parts.

One observation is that base-uri() is treating the db-provided base URI 
differently from an xml:base-provided base URI, but only when there is no 
@xml:base attribute.

In doc 1, the database path has a space but base-uri() does not fail when 
returning it even though it’s not a valid URI. Why not?

In doc 2, the xml:base-supplied base URI is correctly reported as invalid, but 
the database-supplied base URI of the root is not reported as invalid.

My expectation would be that the behavior is consistent: Either all URIs must 
be valid, including those coming from database paths, or all are automatically 
escaped (as though iri-to-uri() had been applied).

Finally, why do I get the result for doc 3, where it’s reporting the database 
path as the base URI of the child rather than the @xml:base-defined base URI 
(which is correctly escaped)?

In my code, which depends on the use of @xml:base to do DITA link resolution 
for “resolved” DITA maps, I’ve adjusted my code to escape URIs in @xml:base 
values and as far as I can tell everything works as it should. But I’m still 
concerned about the inconsistency in the behavior of base-uri().
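
As a rough illustration of that adjustment (a sketch only, not the actual project code), the escaping can be applied at the point where @xml:base is written. local:with-base is a hypothetical helper name and the path is made up for the example:

```xquery
(: Hypothetical helper: build an element whose @xml:base value is always
   passed through iri-to-uri(), so spaces are stored as %20. :)
declare function local:with-base(
  $name as xs:string,
  $base as xs:string
) as element() {
  element { $name } {
    attribute xml:base { iri-to-uri($base) }
  }
};

(: The stored attribute value is /temp/xmlbase/map%20with%20space.ditamap :)
local:with-base('submap', '/temp/xmlbase/map with space.ditamap')/@xml:base ! string(.)
```

iri-to-uri() leaves existing %-escapes alone, so applying it to an already escaped value does not double-escape it.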

I tried to trace through the code that handles base-uri() but it’s pretty 
twisty and does different things for files and nodes.

It would obviously be very disruptive to have base-uri() start failing on 
database paths with spaces (I think the current behavior dates back to the very 
start of BaseX), but it’s still an inconsistency that can lead to trouble for 
the unwary.

For example, consider this code:

let $topicref := db:get('maps', 'map with space.ditamap')/*/topicref[@href][1]
let $target as element()? := local:resolve-href($topicref)
let $baseUri as xs:string := base-uri($target) ! string(.)
let $newElem as element() := <submap xml:base="{$baseUri}"/>
return base-uri($newElem)

The value of $baseUri will be “map with space.ditamap”, not 
“map%20with%20space.ditamap”, making the value of @xml:base on $newElem: 
xml:base="map with space.ditamap", meaning that base-uri($newElem) will throw 
an invalid URI exception.

My expectation would be either that base-uri($target) also throws an exception 
or, more usefully, that it returns the iri-to-uri() result, ensuring that the 
values will always be treated as valid URIs, consistent with how the database 
paths are treated.
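
Until something like that exists, the second option can be approximated in user code. This is a minimal sketch, assuming base-uri() returns the unescaped database path without raising an error (the doc1 case above); local:escaped-base-uri is a hypothetical name, not a BaseX function:

```xquery
(: Hypothetical wrapper: return the base URI of a node with any spaces or
   other non-URI characters escaped, as though iri-to-uri() had been
   applied. Returns the empty sequence if the node has no base URI. :)
declare function local:escaped-base-uri($node as node()) as xs:anyURI? {
  base-uri($node) ! xs:anyURI(iri-to-uri(string(.)))
};

(: Assumes a single document at this path in the 'temp' database. :)
let $doc := db:get('temp', 'doc1_with space.xml')
return <submap xml:base="{local:escaped-base-uri($doc)}"/>
```

This does not help in the doc3 case, where base-uri() itself raises the error before the wrapper can escape anything.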

Cheers,

E.
