Jody Garnett wrote:
> Hi Andrea; just a quick follow up ... I viewed the priorities as:
> - feature collection (where most of the fun is in including the schema 
> used by the feature collection)
> - support for the store/status use case (where content is uploaded to an 
> FTP site; and you can check on the status of a long-running application)

I share the priority of feature collection support, but I definitely
won't implement upload to an FTP server as part of store; the
specification just says that you have to give back a URL at which
the result can be accessed.
From where I stand, that means just putting the GML file in the
data directory and providing access to it through a URL, just as we
do for WCS store support. Also, the file is only stored there
temporarily, and will be removed after a timeout (which, I fear, is
not configurable for the moment... waiting for the new UI to be
completed before adding that config option).
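To make that concrete, here is a minimal sketch of what I have in mind;
StoredResultManager, dataDirectory and baseUrl are made-up names, not
existing GeoServer API, and the timeout handling is simplified:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Timer;
import java.util.TimerTask;
import java.util.UUID;

// Hypothetical sketch: store a process result in the data directory and
// hand back a URL for it; dataDirectory and baseUrl are illustrative.
public class StoredResultManager {
    private final File dataDirectory;  // e.g. $GEOSERVER_DATA_DIR/temp/wps
    private final String baseUrl;      // e.g. http://host:8080/geoserver/temp/wps
    private final Timer cleanupTimer = new Timer(true);
    private final long timeoutMillis = 30 * 60 * 1000L; // hardcoded for now

    public StoredResultManager(File dataDirectory, String baseUrl) {
        this.dataDirectory = dataDirectory;
        this.baseUrl = baseUrl;
    }

    /** Stores the GML stream and returns the URL the client can fetch it from. */
    public String store(InputStream gml) throws IOException {
        String fileName = UUID.randomUUID().toString() + ".xml";
        final File target = new File(dataDirectory, fileName);
        OutputStream out = new FileOutputStream(target);
        try {
            byte[] buf = new byte[8192];
            int read;
            while ((read = gml.read(buf)) != -1) {
                out.write(buf, 0, read);
            }
        } finally {
            out.close();
        }
        // remove the file after the timeout, like the WCS store support does
        cleanupTimer.schedule(new TimerTask() {
            public void run() {
                target.delete();
            }
        }, timeoutMillis);
        return baseUrl + "/" + fileName;
    }
}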

> Other comments inline.
>> - the transmuters API seems overkill: there is a need to create a
>>    class per type handled, and class-level javadoc is missing.
>>    Justin has implemented on his notebook a tentative replacement that
>>    does the same job as the existing 12 classes in 3 simple classes
>>    and still leaves the door open to handle raster data (whilst the
>>    current complex data handler simply assumes the data to be parsed is
>>    XML, but the input could be anything, such as a GeoTIFF or a
>>    compressed shapefile)
>>   
> What is the transmuters API you are talking about here?

The one contained in the org.geoserver.wps.transmute package in the wps module.

>> AVOID THE MID MAN IF POSSIBLE
>>   
> This was a nice to have; your idea seems fine. Please keep in mind the 
> "simple chaining" examples from the WPS spec.
>> SCALE UP WITH REAL REMOTE REQUESTS
>> If the request is really remote, we have to be prepared to parse 
>> whatever comes in.
>>   
> We may need to make a new kind of API here - for the code that is in the 
> uDig project. I am thinking along the lines of Jesse's "provider" 
> classes. So we could have a FeatureCollectionProvider that is passed 
> into a Process; if its answer is a FeatureCollection, perhaps it could 
> give us a FeatureCollectionProvider which we could hook up to the next 
> process in the "chain"? If you wanted to "wrap" this in something that 
> would lazily save the contents to memory or disk in order to prevent 
> "reprocessing", that would be transparent? The providers could be 
> considered the "edges" and the processes the nodes in a graph producing 
> the answer.

I need some help understanding the gain here.
Generally speaking, we want the processes to depend on
some well known input and output types to ensure that it's possible
to easily build chains. In my mind those would be FeatureCollection
and Coverage. Well performing chaining should allow for middle man
avoidance. What I proposed in fact looks like a fc provider managed
by GeoServer, with only three possible behaviours:
- provide the collection as is
- store the collection in memory (as a stop-gap measure; I don't want
   to keep this around in the long term, as it's obviously a scalability
   killer)
- store the collection on disk
The logic used to apply the first or the third option (or the second,
until the third is available) is simply to look at the streaming
requirements of the process that will consume the collection during
chaining, as sketched below. The processes would know nothing about
providers; adding that would just increase the difficulty of
implementing one. Generally speaking, if something has to deal with
providers, it should be whatever orchestrates the execution of a
chain of processes.
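A minimal sketch of the orchestrator-side logic, assuming made-up names
(DeliveryMode, ProcessDescriptor.canStreamInput); the point is that the
orchestrator picks the behaviour while the processes stay provider-free:

// Hypothetical sketch; DeliveryMode and ProcessDescriptor.canStreamInput
// are illustrative names, not existing GeoServer API.
enum DeliveryMode {
    AS_IS,     // hand the collection straight to the next process
    IN_MEMORY, // stop-gap measure, obviously a scalability killer
    ON_DISK    // store and re-read, for processes that scan an input twice
}

// Stand-in for whatever describes a process in the chain.
interface ProcessDescriptor {
    /** True if the process scans the named input only once. */
    boolean canStreamInput(String inputName);
}

class ChainOrchestrator {
    // whether the disk-backed store has been implemented yet
    private final boolean diskStoreAvailable;

    ChainOrchestrator(boolean diskStoreAvailable) {
        this.diskStoreAvailable = diskStoreAvailable;
    }

    /**
     * Picks how to hand a feature collection to the next process by
     * looking at its streaming requirements; the processes themselves
     * know nothing about providers.
     */
    DeliveryMode chooseDelivery(ProcessDescriptor next, String inputName) {
        if (next.canStreamInput(inputName)) {
            // single scan declared: provide the collection as is
            return DeliveryMode.AS_IS;
        }
        // multiple scans needed: store on disk, or in memory until the
        // disk-backed option is available
        return diskStoreAvailable ? DeliveryMode.ON_DISK : DeliveryMode.IN_MEMORY;
    }
}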

If you foresee a pluggable, extensible API, you need to create
extension points both for the providers themselves and for the logic
that decides which one to use. That sounds like a lot of
work for something that I'm not really excited about, but I may be
mistaken. Can you provide more details?

>> streaming parser and just scan over the remote input stream once.
>> We could create a marker interface to identify those processes and act
>> accordingly.
>>   
> The processes have a data structure describing their parameter 
> requirements; it includes a Map that you can use for hints like what you 
> describe. So you could have a process that expects a feature collection 
> and you could include a key in the metadata map that is true if the 
> feature collection will be used more than once.

Yeah, that works better: this way a process can have multiple fc inputs
and declare that only some of them will be scanned just once (think of
inner loop like processing: the outer collection is scanned just once,
the inner collection once for each feature of the outer one).
I like it.
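Something along these lines, then; the SINGLE_SCAN key and the
ParameterInfo shape below are my own paraphrase of the parameter
description structure you mention, not the actual API:

import java.util.HashMap;
import java.util.Map;
import org.geotools.feature.FeatureCollection;

// Hypothetical sketch of the metadata-map hint; the key name and the
// ParameterInfo shape are illustrative, not the real parameter API.
class ParameterInfo {
    /** Hint key: true if the process scans this input only once. */
    static final String SINGLE_SCAN = "singleScan";

    final String name;
    final Class<?> type;
    final Map<String, Object> metadata = new HashMap<String, Object>();

    ParameterInfo(String name, Class<?> type) {
        this.name = name;
        this.type = type;
    }
}

// An inner-loop-like process: the outer collection is streamed once,
// the inner one is scanned once per outer feature and must be stored.
class InnerLoopProcessParameters {
    static ParameterInfo outer() {
        ParameterInfo p = new ParameterInfo("outer", FeatureCollection.class);
        p.metadata.put(ParameterInfo.SINGLE_SCAN, Boolean.TRUE);
        return p;
    }

    static ParameterInfo inner() {
        ParameterInfo p = new ParameterInfo("inner", FeatureCollection.class);
        p.metadata.put(ParameterInfo.SINGLE_SCAN, Boolean.FALSE);
        return p;
    }
}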

>> STORE=TRUE
>> For long running processes it makes a lot of sense to actually
>> support storing. Yet just storing the result is kind of unsatisfactory;
>> my guess is that most of the client code would be interested in
>> being able to access the results of the computation using WMS/WFS.
>>   
> This is not documented in the spec; they refer to making results 
> available on FTP sites. 

The FTP server is just an example of how the behaviour could be
implemented; nowhere does it say it has to be that.
The specification says that if an output has been requested to be
published "asReference" then "it should be stored by the process as a
web-accessible resource", meaning you just have to provide a URL
allowing the resource to be retrieved, without enforcing a specific
protocol or location.

> I could see doing this for processes that are scheduled to occur at a 
> set time (make a "daily" layer for example). However, why not make a 
> simple "publish" process - and people can use that at the end of their 
> chain if they want the result made available via WFS.

Good idea! That allows cherry-picking which of the outputs of
a complex process get stored on the server.
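Roughly what I'd picture, with the caveat that the Catalog interface
below is an invented stand-in for the real GeoServer catalog API:

import org.geotools.feature.FeatureCollection;

// Hypothetical sketch of a "publish" process at the end of a chain.
class PublishProcess {
    // stand-in for the real GeoServer catalog API
    interface Catalog {
        void addFeatureLayer(String workspace, String layerName,
                FeatureCollection features);
    }

    private final Catalog catalog;

    PublishProcess(Catalog catalog) {
        this.catalog = catalog;
    }

    /**
     * Registers the output of the previous process as a layer so that
     * clients can then access it via WFS (or WMS).
     */
    String execute(FeatureCollection result, String layerName) {
        catalog.addFeatureLayer("wps", layerName, result);
        // the chain output is just a pointer to the published layer
        return layerName;
    }
}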

>> Well, that's it. Thoughts?
> The per session stuff is only okay; the specification provides some 
> guidance about how long a result will be available and so forth.

Then I must have missed something in the spec. Can you clarify how an
interactive client can execute processes and then access the
outputs via WMS/WFS, without other users seeing them, and without
requiring permanent storage?
Let me draw a simple use case that I've seen in action at FOSS4G: an OL
client allows a user to select an origin and a destination on a road
network, a process is invoked to compute the route, and the result is
then drawn in OL. Suppose the result can be so big that it's not a good
idea to just return the GML (just change the kind of computation and
you'll easily find an example that can kill the OL vector drawing
abilities).
How would you handle that without using per-session catalog storage?

Cheers
Andrea

-- 
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.
