Hello Jody,

I have been looking into this, but I do not think that throwing an exception with absolute paths is a viable option. It is too problematic for backwards compatibility. For one it seems to completely break the "file chooser" in web-core, and it causes problems all around.

The problem is the confusion of the term "absolute path" which can mean two things: absolute for the file system or absolute for the resource store's structure. The thing is, it has always been so that the resource store "root" has been treated as a linux style root: "/". It is normal to write ResourceStore.get("/a/b/c") or Resources.fromURL("resource:/a/b/c"). It is even much more natural than to write this as a relative path. That is because it actually is not relative but absolute: absolute for the resource store, separate from the file system.

Also, checking for absolute paths would mean very different behaviour on windows than linux. On windows resource store style "/a/b/c/" paths would not be recognised as absolute and thus allowed, but they would be forbidden on linux. But the syntax of resource store paths is always the same, and is not dependent on the syntax of file system paths.

This is exactly what is so problematic with the 'theoryRootIsAbsoluteTest': it confirms whether a resource store path is absolute for the operating system file system. But as I said, the resource store path syntax does not depend on the operating system! So this is essentially a test that forces the resource store to be the file system, and not a custom implementation.

We need to get rid of this test and get back to the old test that confirms that "/a/b/c" and "a/b/c" are one and the same thing, since the resource store has no concept of either a "current directory" or a file system root. It has its own root and it always starts from there, for any path submitted to it. This is very clear and lenient at the same time.

Kind Regards

Niels

On 21/07/2023 16:06, Jody Garnett wrote:
I think we make the change, so resource store fails with absolute path, and learn what other areas of the code assume resource store can access any location on disk.

Then we will know how widespread this problems is.

I am happy to split up the work, and we can add to the documentation a very clear sentence about resource store not supporting absolute paths.

Jody

On Fri, Jul 21, 2023 at 4:01 AM Niels Charlier <ni...@scitus.be> wrote:

    Hey Jody,

    How do we proceed practically?

    I am happy to make a PR to fix resource store and make the agreed
    upon core changes.

    Will you change the docs or would you like this to be part of my work?

    Kind Regards

    Niels

    On 13/07/2023 16:19, Jody Garnett wrote:
    I already tried to catch a lot of those when adjusting for the
    windows absolute paths.

    We could add some logic and provide a warning if any absolute
    paths get through? And the call Resources.fromURL(base, url) …

    Still we may be getting into a technical detail here.  What needs
    to be changed or clarified; then we can adjust the code to march.
    On Thu, Jul 13, 2023 at 1:42 AM Niels Charlier <ni...@scitus.be>
    wrote:

        In theory I agree but there is always a risk to changing
        lenient behavior to throwing exceptions, if there is code or
        production configs that have relied on paths starting with
        '/' just passing as relative...

        On 11/07/2023 22:27, Jody Garnett wrote:
        Thanks:

        Reading the docs, it is /okay/ but not /great/.

        I think we we made it so the ResourceStore implementations
        fail when provided an absolute path it would be best; and
        remove the test case that is causing problem.

        That way only Resources.fromURL(base, url) would support
        "absolute" paths and life would be more clear?
        --
        Jody Garnett


        On Jul 9, 2023 at 10:53:22 PM, Niels Charlier
        <ni...@scitus.be> wrote:

        Hello Jody,

        My comment about deprecating the file() was in response to
        what Gabriel said about that.

        Resource: urls are indeed for instance supported by SLD, if
        you want to link to a resource inside your data directory /
        resource store. But I think they can be used about anywhere
        where file: or other URLs are supported. As the javadoc for
        Resources.fromURL says: relative file: URLs are deprecated,
        use resource: instead. I don't know why they were never
        documented.

        Kind Regards

        Niels

        On 07/07/2023 16:48, Jody Garnett wrote:
        Niels,

        I do not understand the comment about deprecating the
        file() method? I agree with everything you said about how
        it is used and useful.

        I am not sure when resource URLs were added, I thought
        they were only added to pass information over to geotools
        side of things for SLD validation and did not expect to
        see them used directly in any user configuration.  I guess
        we could talk resource: URL here in the developers docs
        explicitly. Do we know who wrote that (I hope it was not me).

        Still let's take this discussion and update the docs and API.
        --
        Jody Garnett


        On Jul 5, 2023 at 11:48:50 PM, Niels Charlier
        <ni...@scitus.be> wrote:

        Hello Gabriel and Jody,

        I agree with most of what Gabriel says, it is mainly the
        same point as mine except that I think it is unrealistic
        to deprecate the file() method. First of all we would
        need to to move the resource API to geotools because SLD
        files are stored in the resource store, but there are
        still tertiary tools that require files like template and
        image processors. The idea is that the file created on
        disk is only a cache for these tertiary tools and
        manipulating it will have no effect.

        Jody, most of the docs seem fine, except that the second
        blue box explanation for 'path' is confusing paths with
        URLs: 'file:/' URLs are not part of the resourcestore and
        resource rest API, and the other problems is that docs
        are missing any mention of the "resource:" URLs.  We must
        first of all distinguish between (1) resourcestore API
        and (2) Resources.fromURL method, I think there might be
        some confusion there. Then the distinction between file:
        and resource: urls need to be in the docs.

        The (1) Resourcestore does not deal with URLs or absolute
        paths, only relative paths. Internally, geoserver modules
        load their configuration files in the resource store
        directly from this API.

        The (2) Resources.fromURL deals with URLS and can handle
        both absolute file paths and resources from the
        resourcestore using the 'resource:' protocol prefix. This
        is for external use: the purpose is so that people can
        refer to both files on disk and resources in the resource
        store wherever they can specify a URL, for instance
        inside style files or in the parameters of a data store.
        File: URLs is for the entire file system, not the data
        directory and get 'converted' to a resource using the
        Files.asResource wrapping method. Resource: URLs refer to
        anything in the data directory or alternative resource store.

        As legacy from pre-resourcestore times, geoserver
        supports (supported?) 'file:' URLs relative to the data
        directory (this is a little bit 'dodgy' since 'file:'
        urls are standardized and do not paths without leading
        slash - https://en.wikipedia.org/wiki/File_URI_scheme ).
        Note that the Resources.fromURL javadoc says this is
        deprecated and should be replaced with resource: URLs.

        Kind Regards

        Niels

        On 05/07/2023 15:06, Jody Garnett wrote:
        Hey folks, I was tired and not in position to act
        clearly when these gaps were noticed last year.

        We can tighten up the api definition, at least with the
        changes made we can now notice when an absolute path was
        used.

        I never quite managed to communicate the separation
        between using a file URL and File access (when
        DataStores wish to access local files) as distinct from
        use of a resource for managing configuration files.

         I suspect you both (Niels and Gabe) have met this
        distinction first hand - but it is hard to explain the
        value.

        Documentation is as added here:
        
https://docs.geoserver.org/latest/en/developer/programming-guide/config/resource.html

        This could your review and input.

        Jody

        On Wed, Jul 5, 2023 at 2:34 PM Gabriel Roldan
        <gabriel.rol...@gmail.com> wrote:

            If "the codebase had drifted away from the intended
            use over time" I think it's even more important to
            stick to the contract and not the other way around.
            As far as I can see, there are two abstractions,
            ResourceStore and Resource, the former clearly says
            "This abstraction assumes a unix like file system,
            all paths are relative and use forward slash {@code
            /} as the separator",
            the latter "Resource used for configuration storage.".
            Only by adhering to this contract, we could provide
            alternative implementations.
            So if the changes were made to accommodate "common
            usage" on the specific case of a filesystem-based
            ResourceStore
            implementation, it should be the other way around,
            to find out usages that don't adhere to the spec and
            fix them.

            Concretely, I _think_ the only place where absolute
            URI's would be requested to the ResourceStore, is
            where a data, not configuration,
            file, is resolved, expecting the ResourceStore to be
            smart and resolve "file:data/roads.shp" relative to
            the datadir, and "file:/data/roads.shp"
            relative to the file system.
            Now, in doing so, we're asking the ResourceStore to
            do what it's not intended to. The test case Niels
            mentioned used to check that a leading "/"
            had no meaning for the resource store (i.e.
            /data/roads.shp == data/roads.shp), and that was
            changed to mean the opposite.
            So, IMHO, the responsibility of resolving files
            outside the datadir shouldn't be on ResourceStore,
            but on the client code. Something like:

            String path = ...
            File shp;
            if(Paths.isAbsolute(path))
              shp = new File(path);
            else
             shp = resourceStore.get(path).file();

            As a matter of fact, Resource.file():java.io.File
            should be deprecated and eventually removed.
            Resource is for config contents and
            has Resource.in():InputStream and
            Resource.out():OutputStream.

            It is hard enough already to adhere to lax interface
            contracts (catalog's default workspace hello) as to
            keep on making it more and more difficult.

            So what do we do? can we get to an agreement that
            the contract in the interfaces is mandatory and work
            from there?

            cheers,
            On Tue, 4 Jul 2023 at 17:49, Niels Charlier via
            Geoserver-devel
            <geoserver-devel@lists.sourceforge.net> wrote:

                Hello Jody,

                Yes, I admit it's my own fault for missing this
                discussion at the time.

                I think it would be a shame to let the codebase
                further drift away from the intended use of the
                resource store. We have done the work to make
                the entirety of geoserver generic with respect
                to the implementation of the data directory and
                it is only a minimal effort to keep it this way.
                Even though jdbc store is still a community
                module and the Redis based geoserver project has
                been discontinued, it has been proven that
                alternative implementations for the data
                directory work and the jdbc store module is 
                actually being used in production successfully.

                Now, interestingly, I have discovered that there
                are contradictions in how it works / is
                documented now.

                - I discovered that while 'theoryRootIsAbsolute'
                test is successful, it is actually very
                misleading. At least on linux, paths starting
                with '/' *still create a file relative to the
                data directory*!. The only difference is the
                string returned by the path() method. So while
                the path might seem absolute, the file you are
                accessing is not. So the path() method is
                misleading and in reality the leading slash is
                still being ignored (again, at least on linux).

                - Note that the javadoc of ResourceStore still says:

                This abstraction assumes a unix like file
                system, *all paths are relative*

                There it says it: all paths are relative for the
                resourcestore. Absolute paths have no meaning or
                function when it comes to the resource store.

                - I think this contradiction could be resolved
                in two possible ways:

                1) the test 'theoryIsRootSlashIsIgnored' should
                come back instead of 'theoryRootIsAbsolute'. The
                root slash is ignored and removed from the path.

                2) The resource store throws an exception when
                you ask for an absolute path.

                Either way, all absolute paths should be handled
                _outside of_ the ResourceStore, for instance by
                calling Files.asResource(). So problems with
                absolute paths in the rest service should be
                resolved in the rest service, or some other
                layer that is used by the rest service.

                Kind Regards
                Niels

                On 04/07/2023 21:59, Jody Garnett wrote:
                Hello Niels,

                The PRs for this activity contain extensive
                discussion.

                The fundamental issue was the handling of
                absolute paths which was done differently by
                different sections of code. Specifically we
                found that the REST API endpoint was allowing
                paths *//data* and */data *to both reference
                content in the data directory, rather than
                treating the first one as an absolute path. In
                response we tightened up the javadocs and added
                test cases including the one you mentioned.

                I agree that this goes against the goal of
                resource store, but the codebase had drifted
                away from the intended use over time.

                Now that you are present in the discussion it
                would be a good opportunity to discuss this
                with the parties involved.
                --
                Jody Garnett


                On Jul 3, 2023 at 2:50:02 PM, Niels Charlier
                <ni...@scitus.be> wrote:

                Hello Jody and others,

                I am having trouble understanding the changes
                that were made about 6 months ago to the
                ResourceStore's expected behaviour.

                In particular, in the class
                'org.geoserver.platform.resource.ResourceTheoryTest',
                the unit test 'theoryRootSlashIsIgnored' was
                replaced by 'theoryRootIsAbsolute'. I cannot
                make sense out of this theory test at all.


                 This seems to be entirely contradictory to
                the whole reason that the ResourceStore API
                was created, that is to make an abstraction of
                the /Data Directory/, so that it can be
                replaced by something else (such as jdbc store
                or other implementations that have been made).
                There was already support for absolute file
                paths in all circumstances by using "file:"
                URLs. This will bypass the resource store and
                call Files.asResource instead. But resource:
                URLs are for the data directory or alternative
                resource store only. How does it make sense to
                get absolute paths from the resource store?


                In order to make jdbc-config pass the tests, I
                will have to turn off this particular method.
                But why should the test even be there if the
                file resource store is the only one that could
                ever support it? Programmers and users will
                rely on this behaviour and support for all
                alternative implementations of ResourceStore
                will be broken. In this case we may as well do
                away with the API and just use the file system
                directly again.


                Kind Regards

                Niels

                _______________________________________________
                Geoserver-devel mailing list
                Geoserver-devel@lists.sourceforge.net
                https://lists.sourceforge.net/lists/listinfo/geoserver-devel




-- Gabriel Roldán

-- --
        Jody Garnett

-- --
    Jody Garnett

--
--
Jody Garnett
_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Reply via email to