While moving the URI parameter to the query string seems like an acceptable workaround, I, too, suggest that if *reserved* URI characters such as '/' appear percent-encoded, they should not be converted to their decoded character prior to analyzing the URI, in line with Sect. 2.2 of RFC 3986 [1].

If I enter an escaped colon (%3A) in a path segment, it will be kept as %3A by BaseX, rather than converted to the reserved character ':'.

The RESTXQ specification [2] doesn’t seem to contain detailed instructions on how to decode the submitted URI before extracting path parameters; therefore, I think RFC 3986 should prevail.

So I agree, BaseX should not interpret escaped slashes as if they were regular slashes, thereby disallowing them as part of RESTXQ path parameters.
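
To illustrate the order of operations that Sect. 2.2 of RFC 3986 implies, here is a minimal JavaScript sketch. Both function names are hypothetical and this is not BaseX code; it only contrasts splitting the raw path before decoding with decoding before splitting.

```javascript
// Hypothetical helpers for illustration only; not taken from BaseX.
// Empty segments are dropped for brevity, although RFC 3986 permits them.

// Split on the literal "/" delimiters first, then decode each segment.
function splitThenDecode(rawPath) {
  return rawPath
    .split("/")
    .filter((segment) => segment !== "")
    .map((segment) => decodeURIComponent(segment));
}

// Decode the whole path first, then split: an escaped "%2F" becomes
// indistinguishable from a real path delimiter.
function decodeThenSplit(rawPath) {
  return decodeURIComponent(rawPath)
    .split("/")
    .filter((segment) => segment !== "");
}

console.log(splitThenDecode("/search/tea%2Ftime")); // [ 'search', 'tea/time' ]
console.log(decodeThenSplit("/search/tea%2Ftime")); // [ 'search', 'tea', 'time' ]
```

With the first order, the escaped slash survives as data inside a single segment; with the second, it changes the number of segments.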

Gerrit

[1] https://tools.ietf.org/html/rfc3986#section-2.2
[2] http://exquery.github.io/exquery/exquery-restxq-specification/restxq-1.0-specification.html

On 24.01.2020 13:54, Ivan Kanakarakis wrote:
Hi Christian,

thanks for the quick reply. It definitely helps, but it still keeps
this behaviour in the "weird" domain.
I do not see a reason to be decoding the URI before it gets to match a
route. What is the reason for this?

What you propose works, but if I have a route like
"/search/{$query=.+}/page/{$page}", then the query will match
everything including "/page/...". If the path was not decoded, I do
not think I would need the regex, nor any other special operation
on the route. It should work with "/search/{$query}/page/{$page}" and
it should return "tea%2Ftime". Why do I have to make workarounds to
try to guess how a part of the URL was encoded, when the URL I hit has
that part encoded?
I don't think it makes sense, and I don't see a use case for this.

When the framework receives the request, it is responsible for matching a
route. By matching the route, it will provide me with the bound parts of
the route that I requested.
Then, *I* am responsible for decoding those parts as I see fit and
handling the request as I need.

If the framework decodes the URL before matching a route, that is a
problem to me - I do not have the control I need.
If the framework decodes the URL parts before binding the route
variables, this is fine - it saves me an operation.
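
A minimal JavaScript sketch of the second behaviour (match on the raw path, decode only when binding the variables) could look as follows. The function matchRoute is hypothetical and is not how BaseX is implemented; it handles only plain {$name} variables, not regex variants such as {$name=.+}.

```javascript
// Hypothetical route matcher for illustration only; not the BaseX implementation.
// Literal template segments are compared against the raw (still encoded) path
// segments, which is a simplification. decodeURIComponent throws a URIError on
// malformed escapes; that case is not handled here.
function matchRoute(template, rawPath) {
  const templateParts = template.split("/").filter((part) => part !== "");
  const pathParts = rawPath.split("/").filter((part) => part !== "");
  if (templateParts.length !== pathParts.length) return null;

  const bindings = {};
  for (let i = 0; i < templateParts.length; i++) {
    const variable = /^\{\$([A-Za-z_][\w-]*)\}$/.exec(templateParts[i]);
    if (variable) {
      // Decode only after the segment has been matched and bound.
      bindings[variable[1]] = decodeURIComponent(pathParts[i]);
    } else if (templateParts[i] !== pathParts[i]) {
      return null;
    }
  }
  return bindings;
}

console.log(matchRoute("/search/{$query}/page/{$page}", "/search/tea%2Ftime/page/2"));
// { query: 'tea/time', page: '2' }
```

Under these assumptions, no regex is needed in the route: the escaped slash never takes part in segment matching, and an unescaped "/search/tea/time/page/2" simply does not match.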

While I have now refactored the endpoint handlers to work with query
params, and this is no longer a problem for me, it is a problem in
general.


Cheers,



On Mon, 20 Jan 2020 at 19:36, Christian Grün <christian.gr...@gmail.com> wrote:

Hi Ivan,

A more common approach is to supply search terms as query parameters
(URL?query=...); in that case, your path won’t have new segments. If
you prefer paths, you can use a regular expression in your RESTXQ path
pattern [1]:

   "search/{$query=.+}"

In both cases, encodeURIComponent should be the appropriate function
to encode special characters.
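
As a rough client-side sketch of both variants in JavaScript: the base URL is the placeholder host used elsewhere in this thread, the parameter name "query" follows the URL?query=... form above, and both function names are hypothetical.

```javascript
// Placeholder host and hypothetical function names, for illustration only.
const base = "https://example.com";

// Variant 1: the search term travels as a query parameter.
function searchUrlWithQueryParam(term) {
  return `${base}/search?query=${encodeURIComponent(term)}`;
}

// Variant 2: the search term travels as a single path segment.
function searchUrlWithPathSegment(term) {
  return `${base}/search/${encodeURIComponent(term)}`;
}

console.log(searchUrlWithQueryParam("tea/time"));
// https://example.com/search?query=tea%2Ftime
console.log(searchUrlWithPathSegment("tea time"));
// https://example.com/search/tea%20time
```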

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/RESTXQ#Paths





On Mon, Jan 20, 2020 at 10:54 AM Ivan Kanakarakis
<ivan.kanak+basex-t...@gmail.com> wrote:

Hello everyone,

I am using BaseX 8.44 and the REST XQ interface (ie,
http://docs.basex.org/wiki/RESTXQ). I have an endpoint that, when
invoked with GET, does a full-text search (using "$db-nodes[text()
contains text { $term } all]"), gets the results, constructs a JSON
response and sends it back.

That's all fine and works great. However, I am not sure how I should
be doing the queries I describe below.

_Note: the query is initiated by a SPA javascript client, thus when I
say encode/uri-escape, what I mean is that I invoke the
encodeURIComponent function
(https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent).
_Note 2: for the sake of conversation let's consider the example
endpoint declared as:

     %rest:GET
     %rest:path("/search/{$term}")


1. I want to search for "tea". That is the basic query. A single term,
no problem.

     curl -s "https://example.com/search/tea"


2. I want to search for "tea time". Now, this query has a space in
between the two words. What I expect to get back, is any node that
contains both words (thus I have used "contains text" with "all"),
even if they may be a few words apart.
- Should I be sending an encoded/URI-escaped version of this, ie, "tea%20time"?
- Or, should I be replacing the space with "+", ie "tea+time"?
- Or, some other advice?

     curl -s "https://example.com/search/tea%20time"
     curl -s "https://example.com/search/tea+time"


3. I want to search for "tea/time". This is even trickier. What I
expect to get back, is any node that contains "tea/time", ie a search
result for a single term. How do I do this?
- If I do not do anything, the slash is treated as part of the URL,
thus not matching a route.
- If I encode/URI-escape this term, I get "tea%2Ftime". But, when I
invoke the endpoint I get the same as if there was a slash.
- I am not sure how I should deal with the slash. How should I
escape/encode this?

     curl -s "https://example.com/search/tea/time"
     curl -s "https://example.com/search/tea%2Ftime"
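
For reference, this is how the standard JavaScript functions treat the three candidate encodings. It is only a sketch of generic percent-decoding, with hypothetical helper names; how BaseX itself handles "+" in a path is not established here.

```javascript
// Hypothetical helper names; the built-ins used are standard JavaScript.

// Generic percent-decoding of a path segment: "+" has no special meaning.
function decodePathSegment(segment) {
  return decodeURIComponent(segment);
}

// Form-style decoding of a query string: here "+" is read as a space.
function decodeQueryValue(queryString, name) {
  return new URLSearchParams(queryString).get(name);
}

console.log(decodePathSegment("tea%20time")); // tea time
console.log(decodePathSegment("tea+time"));   // tea+time
console.log(decodePathSegment("tea%2Ftime")); // tea/time
console.log(decodeQueryValue("query=tea+time", "query")); // tea time
```

In other words, "+" stands for a space only in form-encoded query strings; in a path segment, "%20" is the unambiguous encoding of a space.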


Thank you,
