…just a quick reply: That’s probably related to [1], an ancient issue, in
which I tended to recommend using db:path. I wish we’d finally find the
time and resources to tackle this.
[1] https://github.com/BaseXdb/basex/issues/1172
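
A rough sketch of what the db:path route could look like, in case it helps. This is an assumption on my side, not an official recipe: local:safe-base-uri is a hypothetical helper (not part of BaseX) that joins db:name and db:path and passes the path through iri-to-uri, so that spaces in the database path are escaped. The database name 'temp' is taken from the scripts quoted below.

```xquery
(: Hypothetical helper, not a BaseX function: builds an escaped,
   database-relative base URI for a node stored in a database. :)
declare function local:safe-base-uri($node as node()) as xs:string {
  (: db:name returns the database name, db:path the (unescaped) resource path;
     iri-to-uri escapes characters such as spaces that are not allowed in URIs :)
  '/' || db:name($node) || '/' || iri-to-uri(db:path($node))
};

(: List the escaped base URI for every document in the 'temp' database :)
for $doc in db:get('temp')
return local:safe-base-uri($doc)
```

The result can then be used wherever the code would otherwise call base-uri() on the document node, which sidesteps the unescaped database path.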
On Mon, Mar 4, 2024 at 11:49 AM Eliot Kimber wrote:
> Using BaseX 11 (but I think the code is the same in BaseX 10).
>
>
>
> I’m trying to understand how base-uri() behaves relative to how it should
> behave when the database path of a document is not a valid URI, i.e., it
> has a space in it.
>
>
>
> First I have this test:
>
>
>
> let $doc as document-node() := document { xml:base="temp/child-uri%20with%20space.xml">child }
>
> return $doc/*/child ! base-uri(.)
>
>
>
> Which produces:
>
> file:///data/basex/data/.dba/temp/child-uri%20with%20space.xml
>
>
>
> Which is the correct result: it’s the value of @xml:base and the escaped
> spaces make it a valid URI.
>
>
>
> Replacing %20 with “ ” in the @xml:base value results in this error:
>
> * Invalid URI: Illegal character in path at index 14: temp/child-uri with
> space.xml.*
>
>
>
> Also correct as the spaces have to be escaped.
>
>
>
> This verifies that base-uri() applied to nodes with explicit @xml:base
> attributes works per the spec. But this test does not involve database paths.
>
>
>
> To try to test things with database paths I then created this pair of test
> scripts:
>
> Script to put docs in a database:
>
> let $db := 'temp'
> let $filename as xs:string := 'with space.xml'
> let $doc1 as document-node() := document {No
> xml:base}
> let $doc2 as document-node() := document { xml:base="{'/temp/xmlbase/doc2_' || $filename}">With xml:base
> unescaped }
> let $doc3 as document-node() := document { xml:base="{iri-to-uri( '/temp/xmlbase/doc3_' || $filename)}">With xml:base
> escaped }
> return (()
>   ,db:put($db, $doc1, 'doc1_' || $filename)
>   ,db:put($db, $doc2, 'doc2_' || $filename)
>   ,db:put($db, $doc3, 'doc3_' || $filename)
> )
>
>
>
> Script to report on them:
>
> let $db := 'temp'
> let $filenameBase as xs:string := 'with space.xml'
> return
> for $i in 1 to 3
> let $filename := 'doc' || $i || '_' || $filenameBase
> let $doc := db:get($db, $filename)
> let $child as element() := $doc/*/child
> let $dbPath := db:path($doc)
> let $baseUriDoc := base-uri($doc)
> let $baseUriChild :=
>   try {
>     base-uri($child)
>   } catch * {
>     $err:description
>   }
> return (()
>   ,``[
> Doc "`{$dbPath}`":]``
>   ,$doc
>   ,``[xml:base att: "`{$child/@xml:base}`"]``
>   ,``[base URI of doc: "`{$baseUriDoc}`"]``
>   ,``[base URI of child: "`{$baseUriChild}`"]``
> )
>
>
>
> Which returns this result:
>
> Doc "doc1_with space.xml":
>
>
>
> No xml:base
>
>
>
> xml:base att: ""
>
> base URI of doc: "/temp/doc1_with space.xml"
>
> base URI of child: "/temp/doc1_with space.xml"
>
>
>
> Doc "doc2_with space.xml":
>
>
>
> With xml:base
> unescaped
>
>
>
> xml:base att: "/temp/xmlbase/doc2_with space.xml"
>
> base URI of doc: "/temp/doc2_with space.xml"
>
> base URI of child: "Invalid URI: Illegal character in path at index 23:
> /temp/xmlbase/doc2_with space.xml."
>
>
>
> Doc "doc3_with space.xml":
>
>
>
> With xml:base
> escaped
>
>
>
> xml:base att: "/temp/xmlbase/doc3_with%20space.xml"
>
> base URI of doc: "/temp/doc3_with space.xml"
>
> base URI of child: "Invalid URI: Illegal character in path at index 15:
> /temp/doc3_with space.xml."
>
>
>
> Note the result for doc3: It’s reporting the base URI of the document
> (/temp/doc3_with space.xml), not the base URI of the child
> (/temp/xmlbase/doc3_with%20space.xml). Why? I think the answer is that under
> the covers it’s doing resolve-uri(), which also checks the validity of both
> the base and relative parts.
>
>
>
> One observation is that base-uri() is treating the db-provided base URI
> differently from an xml:base-provided base URI, but only when there is no
> @xml:base attribute.
>
>
>
> In doc 1, the database path has a space but base-uri() does not fail when
> returning it even though it’s not a valid URI. Why not?
>
>
>
> In doc 2, the xml:base-supplied base URI is correctly reported as invalid,
> but the database-supplied base URI of the root is not reported as invalid.
>
>
>
> My expectation would be that the behavior is consistent: Either all URIs
> must be valid, including those coming from database paths, or all are
> automatically escaped (as though iri-to-uri() had been applied).
>
>
>
> Finally, why do I get the result for doc 3, where it’s reporting the
> database path as the base URI of the child rather than the
> @xml:base-defined base URI (which is correctly escaped)?
>
>
>
> In my code, which depends on the use of @xml:base to do DITA link
> resolution for “resolved” DITA maps, I now escape URIs in @xml:base
> values, and as far as I can tell everything works as it should.
> But I’m still concerned about the