we can figure out the workflow details later. What I'd want is for us to lay down a clear scope, something we can easily track progress against. How should we proceed? Use cases -> feature specs -> tech spec?
Gabriel

On 6/25/10 1:56 PM, David Winslow wrote:
> We don't have a build of GeoWebCache in the GeoNode source tree right now, so if we are going to start customizing it we will need more than a branch. Gabriel, let me know if you think we need to do something about this.
>
> --
> David Winslow
> OpenGeo - http://opengeo.org/
>
> On 06/25/2010 12:51 PM, Sebastian Benthall wrote:
>> Chris, correct me if I'm wrong, but we got the go-ahead to work on this, yeah?
>>
>> I guess that means we should have a new git branch for this?
>>
>> On Wed, Jun 23, 2010 at 2:41 PM, Gabriel Roldan <[email protected]> wrote:
>>
>> This is gonna be awesome. Some comments inline.
>>
>> On 6/23/10 12:08 PM, David Winslow wrote:
>> > shooting from the hip with some feedback on these ideas
>> >
>> > On 06/22/2010 06:18 PM, Chris Holmes wrote:
>> >> I've been thinking a bit about how we can bring GeoWebCache in to GeoNode, to get at some of the great performance enhancements it can bring. Ideally we seamlessly cache all layers viewed in GeoNode, both local and remote, even when those change. There are twists with each, and both revolve around stale caches.
>> >>
>> >> With local caches we need a way to truncate the cache if the style changes. Ideally when one is in style edit mode we don't use GWC at all, only when someone does a final 'save' does it start caching the change.
>> >>
>> >> With remote caches we need a way for a user to manage the cache, to invalidate it when the remote server changes, either data or style. Ideally it would have a GeoRSS feed of changes that GWC automatically truncates based on. Less ideally there's a manual way to restart the caching.
>> >>
>> >> A rough roadmap of how we might achieve the end goal:
>> >>
>> >> * Start with just caching remote layers. So when anyone puts in a remote WMS it automatically gets added as a GWC layer.
>> The GWC REST API is definitely clunky currently and would highly benefit if we do this
>>
>> >> Gabriel is about to commit a Least Recently Used cache to GWC, which will allow an admin to set a total max for the cache.
>> Right now the diskquota is an opt-in process, meaning there's no global cache size cap, but you need to set the limit on a layer by layer basis. I think it would be easy to add a global limit so any non explicitly configured layer gets evenly capped to cope with the global limit. How does that sound?
>>
>> >> So we could let people add any layer, but the admin of GeoNode can configure it to just cache the most used tiles, up to a limit they set, be it 100 megs or 2 terabytes. For this first step the caches may just get invalid, but the admin would have the ability to truncate them in the GWC admin.
>> >>
>> > LRU worries me a bit; if we set the disk limit too low we may just end up with a lot of cache churn for little/no performance benefit.
>> In my mind, GeoWebCache is "incomplete" as a product until we add the following enhancements:
>> - configuration option to cache layers only up to a certain zoom level, and from that level on, defer to pure proxy mode
>> - diskquota, which is kind of in beta testing now
>> - Identify and avoid seeding empty tiles. This can be easily done with the JAI Extrema operation (or even Histogram) or the user might configure a no-data color for the layer?
>> - Definition of an area of interest, so that a geometry defines the allowed seeding area for a layer
>>
>> > And the disk requirements can grow with minimal warning, since anyone can add a layer. There's also an easy DOS attack - anyone can fetch 18 zoom levels of some layer nobody uses and trash the cache (not a huge deal, how long would it take an attacker to do that anyway?).
>> That would put the LRU diskquota enforcement job to work and hence wipe out those tiles that are least used. This plus the ability to set a limit on the number of zoom levels to actually cache would bring us closer to the safe zone?
>>
>> > I'm not saying an LRU is a bad idea. I think caching will be a great improvement. It's just that there is a lot of room for refinement here (probably once we have better usage tracking we can use that to prioritize tilesets, for example.)
>> Wouldn't the LRU stats be enough for that? Note we also have an LFU (Least Frequently Used) expiration policy for diskquota enforcement, which looks closer to the kind of usage tracking you mention?
>> >
>> >> * Cache local layers, coordinating with Style changes. I think Arne may have coded this up, at least for the embedded GWC.
>> Yes. The problem with the embedded GWC is that it completely wipes out the entire layer cache upon _any_ modification, including WFS transactions, resulting in too heavily truncated caches. You add/remove/edit a single feature, the whole layer cache is discarded. There's room to improve that based on bounding box/bounding polygon with some stuff created for the GeoRSS module though.
>> >> We could perhaps start with just doing the cache on the embedded maps, since those won't have people switching to 'style mode'. Maybe that intermediary step isn't necessary, but when we're in the map composer view we want to be sure that when people are styling they're not seeing GWC tiles.
>> Related: I've been wondering for some time now if it wouldn't make sense to also integrate the WMS service endpoints for WMS and GWC, like in GWC being a front barrier for /geoserver/wms instead of having to explicitly go through /geoserver/gwc?service=WMS...
>>
>> Back to topic: couldn't the styles just use a CGI flag to indicate when to ignore the cache and go straight to the WMS? AFAIK tiled=false would do the trick.
>>
>> >> When they finish styling we should then truncate the existing cache and start over. Another simplifying assumption we could also consider making is only cache on the default style. Not sure how much that actually helps.
>> >>
>> > I don't think we need to avoid caching alternative styles.
>> I think right now GWC only seeds on the default style, and lazily caches non default styles. Are we talking about preseeding here or just lazy caching?
>>
>> >
>> > I do think we need to skip the cache while editing styles.
>> >
>> > It would be nice if we could use cached layers everywhere, and have only the layer being styled switch to "straight" WMS when styling is active.
>> >> * Remote layer management. This is sort of more general, I think in the future we should figure out some more full representation in each GeoNode of a remote layer. Right now remote layers can be added, but no metadata can be found out about it. This is another whole topic, but the implication for here is that such a page should/could have a way to manage the cache of the local GeoNode. So you could truncate the cache there (maybe just the person who added? Maybe you can set permissions of who can truncate?). And then possibly also add a GeoRSS location to automatically truncate from.
>> >>
>> > Yeah, it would be awesome if adding a WMS to the composer application got that service added to the GeoNode's GeoNetwork index, complete with metadata pages in the Django web app. And GeoNode can periodically scan the capabilities for added/removed layers, updated descriptions, new styles. These would be reflected in GeoNetwork and GeoWebCache as well as the Django database.
>> >
>> > It might be nice to also provide a listing of indexed services so users can track down the originating WMS services if they want.
>> >> The cool thing this set of things should lead to is to give a benefit to people adding remote servers. They get increased speed and reliability if they just add it to a map on a GeoNode. So we can come in with a GeoNode to an existing nice SDI implementation that already has a bunch of WMS services, and then people can start creating maps on top of it, and those maps perform even faster than the straight WMS.
>> >>
>> >> Thoughts? I think this could be a nice performance win, as most all our maps are tiled. Should obviously be complemented by other optimizations, like on the javascript side, but the two together should make things quite zippy.
>> >>
>> > Having the WMS capabilities handled on the server side (and cached there) would probably be a nice win for loading services. We could do away with reading capabilities entirely until the user pulls up the add layers dialog (which is not available in the embedded viewers).
>> >
>> > We don't do GFI requests now but it might be worth thinking about how they interact with the cache. I also don't see this map caching doing much for offline/distributed data management, which seems like caching of another sort. It would be good to work out some answers related to that.
>> I don't get it. Could you elaborate?
>>
>> Cheers,
>> Gabriel
>> >
>> > --
>> > David Winslow
>> > OpenGeo - http://opengeo.org/
>>
>> --
>> Gabriel Roldan
>> OpenGeo - http://opengeo.org
>> Expert service straight from the developers.
>>
>> --
>> Sebastian Benthall
>> OpenGeo - http://opengeo.org
>>
>

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.
