Jody,

You were asking me about the status of this. Point (1) of this email is definitely still a blocker.
Unless I missed something, I believe there are two possibilities:

(1) Change the ResourceStore API to add a getBaseDirectory() method. This way GeoServerResourceLoader doesn't have to call dir() any more for the base directory and force an immediate caching of the whole data directory.

(2) Create a method GeoServerResourceLoader.getDataDirectory() that ignores whichever ResourceStore is used and gets the data directory directly from the Spring context. In that case I think we must remove the GeoServerResourceLoader.getBaseDirectory() method completely, because it should _never_ be used if it is not implemented as above.

My preference would be for option 1.
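
To make option 1 concrete, here is a minimal sketch of what I have in mind. All types below are hypothetical stand-ins written for this email (SketchResourceStore, SketchJdbcStore, SketchResourceLoader), not the real GeoServer classes, and the counter only simulates the cost of staging the data directory.

```java
import java.io.File;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the ResourceStore API, NOT the real GeoServer interface.
interface SketchResourceStore {
    /** Stages the directory at this path (and its children) on disk, then returns it. */
    File dir(String path);

    /** Proposed addition (option 1): report the base directory without staging anything. */
    File getBaseDirectory();
}

// Hypothetical database-backed store that counts how often staging is triggered.
class SketchJdbcStore implements SketchResourceStore {
    private final File cacheRoot;
    final AtomicInteger stagingRuns = new AtomicInteger();

    SketchJdbcStore(File cacheRoot) {
        this.cacheRoot = cacheRoot;
    }

    @Override
    public File dir(String path) {
        // Stands in for unpacking every child of the directory to the file system.
        stagingRuns.incrementAndGet();
        return new File(cacheRoot, path);
    }

    @Override
    public File getBaseDirectory() {
        // Path only, nothing is unpacked.
        return cacheRoot;
    }
}

// Hypothetical loader mirroring the GeoServerResourceLoader(ResourceStore) constructor.
class SketchResourceLoader {
    private final File baseDirectory;

    SketchResourceLoader(SketchResourceStore store) {
        // Ask the store for the path instead of calling dir() on the root.
        this.baseDirectory = store.getBaseDirectory();
    }

    File getBaseDirectory() {
        return baseDirectory;
    }
}
```

With something like this, constructing the loader from a store never triggers staging; dir() would only be called by code that really needs the files on disk.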

Kind Regards
Niels


-------- Forwarded Message --------
Subject:        Re: Resources port
Date:   Tue, 27 Oct 2015 14:28:47 +0100
From:   Niels Charlier <ni...@scitus.be>
To:     Jody Garnett <jody.garn...@gmail.com>
CC: Gabriel Roldan <grol...@boundlessgeo.com>, Kevin Smith <ksm...@boundlessgeo.com>, Geoserver-devel <geoserver-devel@lists.sourceforge.net>



Jody,

Small addition. With respect to point (1), I know about GeoServerResourceLoader.lookupGeoServerDataDirectory(servletContext), but then we are bypassing the ResourceStore API and losing some of its generic purpose. The point is: if we're making a GeoServerResourceLoader from a ResourceStore, should it not take the baseDirectory from it somehow?

Niels

On 27-10-15 14:21, Niels Charlier wrote:
Hello Jody,

Thanks for your email! That clarifies at least which direction we should be going with some of these issues. A few remaining important points:

1. Can you fill me in on a way to get the path to the DataDirectory without calling dir()? I'll have to make a patch for that then, but I really did not see a way to do that in the current API if you are working with a ResourceStore. See the constructor GeoServerResourceLoader(ResourceStore resourceStore). We'll have to change the ResourceStore API to make this possible, no?

2. The problem with the GEOSERVER_DATA_DIRECTORY/data directory, or any other raster/vector data, is slightly more complicated than you think.
* The REST API uploads both configuration files and data files, and it uses the same methods for both. I converted the whole module to use resources instead of files. This results (for now) in data files being uploaded to the database and then cached when the store is created.
* The distinction is not always simple to make: app-schema has configuration files (usually located in the workspaces dir) that are treated by geoserver in the same way as data files, and they are read by geotools.

Is there a reason why using the database to store and distribute the data files is not recommended? Is it a matter of performance/space?

Otherwise, I would indeed recommend allowing the user to specify in the jdbcstore configuration file which dirs to ignore. The jdbcstore would then skip these folders on import and return a file-based resource whenever they are queried. Does that sound good?

3. I like the idea of deleting the data directory after import. But then point (1) _absolutely_ needs to be resolved, because otherwise the data directory will immediately be cached completely, repeatedly.

4. In my opinion, dir() should _always_ be avoided. I would recommend using resources as much as possible and as long as possible, and only caching when absolutely necessary (usually for a 3rd party lib), which means dir() is rarely necessary and file() is usually sufficient. The issue with the usage of dir() is that it could encourage people to use the file system directly, forgetting that changes to the file system have no lasting effect when using the jdbcstore!
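
The ignore list suggested under point 2 could be sketched roughly as follows. This is only an illustration of the path check, assuming the configured dirs are given relative to the data directory root; IgnoreList, Backend and BackendChooser are hypothetical names, not existing jdbcstore classes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a configurable ignore list, not the jdbcstore implementation.
class IgnoreList {
    private final List<String> ignoredDirs;

    IgnoreList(List<String> configuredDirs) {
        // Normalise entries such as "/data/" and "data" to the same form.
        this.ignoredDirs =
                configuredDirs.stream().map(IgnoreList::normalise).collect(Collectors.toList());
    }

    static String normalise(String path) {
        String p = path.replace('\\', '/');
        while (p.startsWith("/")) {
            p = p.substring(1);
        }
        while (p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }

    /** True when the path is an ignored directory itself or lies beneath one. */
    boolean isIgnored(String resourcePath) {
        String p = normalise(resourcePath);
        for (String dir : ignoredDirs) {
            // Compare whole path segments so "database" does not match "data".
            if (p.equals(dir) || p.startsWith(dir + "/")) {
                return true;
            }
        }
        return false;
    }
}

// Where a resource would live: on the local file system or in the database.
enum Backend {
    FILE_SYSTEM,
    DATABASE
}

class BackendChooser {
    /** Ignored paths stay file-based; everything else goes through the database. */
    static Backend backendFor(IgnoreList ignore, String resourcePath) {
        return ignore.isIgnored(resourcePath) ? Backend.FILE_SYSTEM : Backend.DATABASE;
    }
}
```

The same check would be used twice: once by the import to skip those folders, and once when a resource is requested, to decide whether to return a file-based resource.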

Kind Regards
Niels


On 26-10-15 22:28, Jody Garnett wrote:
Thanks Niels, some comments inline, assume this is for GSIP-132 <https://github.com/geoserver/geoserver/wiki/GSIP-132> (unless that is completed already).

On 7 October 2015 at 05:28, Niels Charlier <ni...@scitus.be> wrote:

    Hi Jody, Gabriel, Kevin

    I have been porting all modules to use the resources system
    consistently and only use files when necessary (usually external
    library). I still stumbled upon two minor questions/issues I
    wanted to discuss.

    1. Usage of the "data" directory. At the moment the import from
    data directory -> jdbc store ignores the "data" directory. In a
    clustered environment, this directory thus remains instance
    specific, and it would be up to the user to refer to shared files.


Are you talking about GEOSERVER_DATA_DIRECTORY/data? If so that is only a convention, I have made data directories that used "raster" and "vector" folders for example.

For storing spatial data (GeoTIFF, Shapefile, Image Mosaic here) I had the idea of doing something like JNDI but for referencing an external folder used for this purpose. This could both provide an "ignore" list (so "data" was not hard coded) and allow for a cluster with RAID storage mapped to a specific mount.

    At this moment, there is no reason why we couldn't include the
    data dir in the jdbcstore and cache it before loading the
    geotools datastore. This is actually what my modified version of
    the rest service already does because it uses resources everywhere.

For configuration files this is what we want.

    Another idea was to program the jdbcstore to return file based
    resources only when the "data" directory is used, so that it
    definitely will never store those files in the database
    unnecessarily.


Okay pretty sure you are talking about GEOSERVER_DATA_DIRECTORY/data now.

Q: Is it worth removing the files that have been imported into JDBCConfig from GEOSERVER_DATA_DIRECTORY? This would prevent confusion, and allow GEOSERVER_DATA_DIRECTORY to work strictly as a cache (for the few things that require a file to be unpacked on to disk).

    2. In the jdbcstore, should the children of a directory be cached
    when dir() is called?


Caching happens on import (so yes). Should the resources be unpacked (staged) to the file system when dir() is called? Yes.

    The DataDirectory class uses the dir() method to know the root of
    the data directory, causing the whole data directory to be cached
    at once multiple times unnecessarily, since the root dir is
    usually requested just to know the path for some reason (all code
    that actually needs files in the data dir has been replaced
    by resources).


This is a bug; such logic should be replaced. There is another method to get the root of the GEOSERVER_DATA_DIRECTORY. While we may hard code some things now, it would be wise to have an extension point for modules (such as geowebcache) to mark off working directories that should not be cached.

Using dir() to determine the root of GEOSERVER_DATA_DIRECTORY is a bad idea: in addition to breaking the design of dir(), we are trying to avoid duplicating code that contains data directory structure logic.

    We now always want to use resources as long as possible, only
    calling file() at the last moment if necessary. As a consequence
    the dir() method is actually hardly used for the purpose of
    getting all the files inside that dir. I would suggest calling
    dir() only to create the dir if it doesn't exist yet and not
    cache its children. There is only one part of code left where
    that would pose a problem, the community module "validation",
    which passes on a whole dir to its geotools counterpart. This
    however could be changed in the geotools module to pass on a
    collection of files instead.


Validation only needed one "validation" folder, so that code could be changed to use resource("validation").dir().

    After this change, I wonder if we should make a doc page on the
    proper practices of using the Resource API in order to be
    clustering-safe.


Yep, could add to the developers guide under "file access".



