Chris, correct me if I'm wrong, but we got the go-ahead to work on this, yeah?
I guess that means we should have a new git branch for this?

On Wed, Jun 23, 2010 at 2:41 PM, Gabriel Roldan <[email protected]> wrote:
> This is gonna be awesome. Some comments inline.
>
> On 6/23/10 12:08 PM, David Winslow wrote:
> > shooting from the hip with some feedback on these ideas
> >
> > On 06/22/2010 06:18 PM, Chris Holmes wrote:
> >> I've been thinking a bit about how we can bring GeoWebCache in to
> >> GeoNode, to get at some of the great performance enhancements it can
> >> bring. Ideally we seamlessly cache all layers viewed in GeoNode, both
> >> local and remote, even when those change. There are twists with each,
> >> and both revolve around stale caches.
> >>
> >> With local caches we need a way to truncate the cache if the style
> >> changes. Ideally when one is in style edit mode we don't use GWC at
> >> all, only when someone does a final 'save' does it start caching the
> >> change.
> >>
> >> With remote caches we need a way for a user to manage the cache, to
> >> invalidate it when the remote server changes, either data or style.
> >> Ideally it would have a GeoRSS feed of changes that GWC automatically
> >> truncates based on. Less ideally there's a manual way to restart the
> >> caching.
> >>
> >> A rough roadmap of how we might achieve the end goal:
> >>
> >> * Start with just caching remote layers. So when anyone puts in a
> >> remote WMS it automatically gets added as a GWC layer.
> The GWC REST API is definitely clunky currently and would highly benefit
> if we do this
>
> >> Gabriel is about
> >> to commit a Least Recently Used cache to GWC, which will allow an admin
> >> to set a total max for the cache.
> Right now the diskquota is an opt-in process meaning there's no global
> cache size cap, but you need to set the limit on a layer by layer basis.
> I think it would be easy to add a global limit so any non explicitly
> configured layer gets evenly capped to cope up with the global limit.
> How does that sound?
>
> >> So we could let people add any layer,
> >> but the admin of GeoNode can configure it to just cache the most used
> >> tiles, up to a limit they set, be it 100 megs or 2 terabytes. For this
> >> first step the caches may just get invalid, but the admin would have the
> >> ability to truncate them in the GWC admin.
> >>
> > LRU worries me a bit; if we set the disk limit too low we may just end
> > up with a lot of cache churn for little/no performance benefit.
> In my mind, GeoWebCache is "incomplete" as a product until we add the
> following enhancements:
> - configuration option to cache layers only up to a certain zoom
>   level, and from that level on, defer to pure proxy mode
> - diskquota, which is kind of in beta testing now
> - Identify and avoid seeding empty tiles. This can be easily done with
>   the JAI Extrema operation (or even Histogram) or the user might
>   configure a no-data color for the layer?
> - Definition of an area of interest, so that a geometry defines the
>   allowed seeding area for a layer
> > And the
> > disk requirements can grow with minimal warning, since anyone can add a
> > layer. There's also an easy DoS attack - anyone can fetch 18 zoom levels
> > of some layer nobody uses and trash the cache (not a huge deal, how long
> > would it take an attacker to do that anyway?).
> That would put the LRU diskquota enforcement job to work and hence wipe
> out those tiles that are least used. This plus the ability to set a
> limit on the number of zoom levels to actually cache would bring us
> closer to the safe zone?
> > I'm not saying an LRU is
> > a bad idea. I think caching will be a great improvement. It's just
> > that there is a lot of room for refinement here (probably once we have
> > better usage tracking we can use that to prioritize tilesets, for
> > example.)
> Wouldn't the LRU stats be enough for that?
> Note we also have an LFU
> (Least Frequently Used) expiration policy for diskquota enforcement,
> which looks closer to the kind of usage tracking you mention?
>
> >
> >> * Cache local layers, coordinating with Style changes. I think Arne may
> >> have coded this up, at least for the embedded GWC.
> Yes. The problem with the embedded GWC is that it completely wipes out
> the entire layer cache upon _any_ modification, including WFS
> transactions, resulting in too heavily truncated caches. You
> add/remove/edit a single feature, the whole layer cache is discarded.
> There's room to improve that based on bounding box/bounding polygon with
> some stuff created for the GeoRSS module though.
> >> We could perhaps
> >> start with just doing the cache on the embedded maps, since those won't
> >> have people switching to 'style mode'. Maybe that intermediary step
> >> isn't necessary, but when we're in the map composer view we want to be
> >> sure that when people are styling they're not seeing GWC tiles.
> Related: I've been wondering for some time now if it wouldn't make
> sense to also integrate the WMS service endpoints for WMS and GWC, like
> in GWC being a front barrier for /geoserver/wms instead of having to
> explicitly go through /geoserver/gwc?service=WMS...
>
> Back to topic: couldn't the styles just use a CGI flag to indicate when
> to ignore the cache and go straight to the WMS? AFAIK tiled=false would
> do the trick.
>
> >> When
> >> they finish styling we should then truncate the existing cache and start
> >> over. Another simplifying assumption we could also consider making is
> >> only cache on the default style. Not sure how much that actually helps.
> >>
> > I don't think we need to avoid caching alternative styles.
> I think right now GWC only seeds on the default style, and lazily caches
> non-default styles. Are we talking about preseeding here or just lazy
> caching?
>
> >
> > I do think we need to skip the cache while editing styles.
> >
> > It would be nice if we could use cached layers everywhere, and have only
> > the layer being styled switch to "straight" WMS when styling is active.
> >> * Remote layer management. This is sort of more general, I think in the
> >> future we should figure out some more full representation in each
> >> GeoNode of a remote layer. Right now remote layers can be added, but no
> >> metadata can be found out about it. This is another whole topic, but
> >> the implication for here is that such a page should/could have a way to
> >> manage the cache of the local GeoNode. So you could truncate the cache
> >> there (maybe just the person who added? Maybe you can set permissions
> >> of who can truncate?). And then possibly also add a GeoRSS location to
> >> automatically truncate from.
> >>
> > Yeah, it would be awesome if adding a WMS to the composer application
> > got that service added to the GeoNode's GeoNetwork index, complete with
> > metadata pages in the Django web app. And GeoNode can periodically scan
> > the capabilities for added/removed layers, updated descriptions, new
> > styles. These would be reflected in GeoNetwork and GeoWebCache as well
> > as the Django database.
> >
> > It might be nice to also provide a listing of indexed services so users
> > can track down the originating WMS services if they want.
> >> The cool thing this set of things should lead to is to give a benefit to
> >> people adding remote servers. They get increased speed and reliability
> >> if they just add it to a map on a GeoNode. So we can come in with a
> >> GeoNode to an existing nice SDI implementation that already has a bunch
> >> of WMS services, and then people can start creating maps on top of it,
> >> and those maps perform even faster than the straight WMS.
> >>
> >> Thoughts? I think this could be a nice performance win, as most all our
> >> maps are tiled.
> >> Should obviously be complemented by other
> >> optimizations, like on the javascript side, but the two together should
> >> make things quite zippy.
> >>
> > Having the WMS capabilities handled on the server side (and cached
> > there) would probably be a nice win for loading services. We could do
> > away with reading capabilities entirely until the user pulls up the add
> > layers dialog (which is not available in the embedded viewers).
> >
> > We don't do GFI requests now but it might be worth thinking about how
> > they interact with the cache. I also don't see this map caching doing
> > much for offline/distributed data management, which seems like caching
> > of another sort. It would be good to work out some answers related to
> > that.
> I don't get it. Could you elaborate?
>
> Cheers,
> Gabriel
>
> >
> > --
> > David Winslow
> > OpenGeo - http://opengeo.org/
> >
> --
> Gabriel Roldan
> OpenGeo - http://opengeo.org
> Expert service straight from the developers.
>

--
Sebastian Benthall
OpenGeo - http://opengeo.org
