Re: [Geotools-devel] Versioning WFS-T and protocol extensions

Andrea Aime Wed, 29 Nov 2006 01:48:41 -0800

Jody Garnett wrote:
> Hi Andrea,
...
> You are clear on your scope (and yes everyone's hopes ask for more, but 
> I respect your decision to start small).
> 
> Datastore Design:
> 
> Data table:
> - I was not going to assume the revisionCreated - revisionExpired 
> columns; instead I used a single "revision" column that only becomes 
> non-0 when the record is replaced (so revision==0 always represents the 
> "live" data). Having two columns is not bad; does having both help you 
> ask for data in a specific range? Or could we get by with just a single 
> column?


I could get by with a single column, but at a high performance price.
Let's assume that instead of 0 I use maxint, since it simplifies queries
(if I used 0, the query below would need a special case to handle
features that do not have a min for the subquery...).
In order to extract what was live at version x I would have
to query for the record with the lowest revision number greater than x,
something like:

select *
from data d1
where d1.revision = (
   select min(revision)
   from data d2
   where d2.id = d1.id
   and d2.revision > x
)

which is a lot slower than the single table access I propose.
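
To make the comparison concrete, here is a minimal sketch using Python's
sqlite3 module. The table name data and the id and revision columns come
from the query above, and revisionCreated/revisionExpired are the two
columns under discussion. The data_single table, the name column, the
sample rows and the helper function names are assumptions made up for
illustration, and SQLite merely stands in for whatever database is
actually used.

```python
import sqlite3

MAXINT = 2147483647  # stands in for "not replaced yet", as suggested above

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- two-column layout: each record knows when it appeared and when it was replaced
    CREATE TABLE data (id INTEGER, name TEXT,
                       revisionCreated INTEGER, revisionExpired INTEGER);
    -- single-column layout: each record only knows when it was replaced
    CREATE TABLE data_single (id INTEGER, name TEXT, revision INTEGER);
""")

# Hypothetical sample history: (id, name, revisionCreated, revisionExpired)
rows = [
    (1, "fred v1", 1, 3),
    (1, "fred v2", 3, MAXINT),
    (2, "wilma v1", 1, 2),
    (2, "wilma v2", 2, MAXINT),
]
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)
conn.executemany(
    "INSERT INTO data_single VALUES (?, ?, ?)",
    [(i, n, expired) for (i, n, created, expired) in rows],
)


def live_at_two_columns(x):
    """Single table access: a plain range check on the two revision columns."""
    cur = conn.execute(
        "SELECT name FROM data "
        "WHERE revisionCreated <= ? AND revisionExpired > ? ORDER BY id",
        (x, x),
    )
    return [r[0] for r in cur]


def live_at_single_column(x):
    """Single-column variant: needs the correlated subquery shown above."""
    cur = conn.execute(
        "SELECT d1.name FROM data_single d1 "
        "WHERE d1.revision = ("
        "   SELECT min(d2.revision) FROM data_single d2 "
        "   WHERE d2.id = d1.id AND d2.revision > ?) "
        "ORDER BY d1.id",
        (x,),
    )
    return [r[0] for r in cur]


for x in (1, 2, 3):
    print(x, live_at_two_columns(x), live_at_single_column(x))
```

On this sample data both helpers return the same records for revisions
1 to 3; the two-column version gets there with a plain range check and no
subquery. The sketch says nothing about actual timings, which depend on
the database and its indexes.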

I'm doing performance tests now, to see how much performance I give
up by using my schema, especially on extracting the last revision, which
is the most common operation anyways.
I'll keep you posted on the results.

> GetFeature
> - so out of the box it returns the latest
> 
> I am a bit concerned that making the revision columns available messes 
> up the original schema (this simply will not work in the case where the 
> schema is provided by a third party authority for example). Although 
> this is not your use case (I recognize that) I am going to work through 
> how it can be done:
> 
> Use getFeature with a vendor specific parameter describing the revision 
> range.
> 
> The result of which is a GML document where the revision is either:
> - part of the feature identifier
> <Feature fid="people.fred.432456">..</Feature><Feature 
> fid="people.wilma.432455">...</Feature>
> - separate attribute on each feature (ie not element)
> <Feature fid="people.fred" revision="432456">..</Feature><Feature 
> fid="people.wilma" revision="432455">...</Feature>  (preferred approach?)
> 
> By making the concept of revision available as an attribute the normal 
> describeFeatureType method can provide the correct description as part 
> of the schema - ie the exact revision range that will work.

Hmm... I hear ya, yet there are downsides:
* I would no longer be able to query the feature type for a specific
   revision using a plain GetFeature. This could be done in an extra
   GetFeatureVersioning method instead (something we are thinking
   about anyway, to expand what we can ask of a version-based system),
   but it forces an API change in the version datastore as well (since
   what I'm looking for is no longer in the gt2 filters, unless we expand
   the filter and expression sets to cope... hum, what would you do in
   this regard?).
* the first approach would make it hard to build a checkout: how do
   you know how to parse the revision out of the identifier?
* the second approach would require a different GML producer, and
   a place in DefaultFeature to hold the revision (there's
   none at the moment).
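
The parsing problem with the first approach can be sketched in a few
lines of Python. The split_revision helper and the "last numeric token is
the revision" convention are assumptions for illustration only, not
anything GeoTools or the WFS specification defines; the first fid is the
one from the example above, the second is a made-up unversioned fid.

```python
def split_revision(fid):
    """Naively treat the last dot-separated token as the revision, if numeric.

    Hypothetical convention: "<typeName>.<id>.<revision>".
    """
    base, sep, tail = fid.rpartition(".")
    if sep and tail.isdigit():
        return base, int(tail)
    return fid, None


# A versioned fid parses as hoped...
print(split_revision("people.fred.432456"))  # ('people.fred', 432456)

# ...but a plain fid whose id happens to be numeric is misread as
# revision 15 of a feature called "roads".
print(split_revision("roads.15"))  # ('roads', 15)
```

Without some out-of-band knowledge of which fids carry a revision, a
client building a checkout cannot tell the two cases apart.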

So, I'm really wondering, if the schema is mandated by an external 
authority, could we avoid messing with versioning and use the complex
data store instead to get the same result?

> GetLog
> - making it available as a normal feature is fine, collections support 
> can be done if you need it.

Indeed, it's true... I just don't have any idea of how complex that 
would be and which GML producer we would need to use...

> Transaction:
> - throwing errors out of Transaction is cool; consider any conflict to 
> be the same as a locking conflict (ie the modification has been made by 
> another so that feature is "locked")
> - leave revision columns out of the describe feature type so that you do 
> not have to worry about users supplying the details...

See above, I would like to avoid that.

> The Transaction "handle" is where your changelog message comes from. No 
> additional extra attribute is needed from the Transaction element.

I had not thought about it, but this would be a way to bend the 
specification... The WFS 1.1 schema, which is commented, says:

The handle attribute allows a client application
to assign a client-generated request identifier
to a WFS request.  The handle is included to
facilitate error reporting.  A WFS may report the
handle in an exception report to identify the
offending request or action.  If the handle is not
present, then the WFS may employ other means to
localize the error (e.g. line numbers).

Forcing handle to be used as a commit message would be wrong in my 
opinion...


> GetDiff - GetFeature option
> Good summary; allowing each attribute to be optional is tough. One 
> way to consider this is the WFS 1.1 idea of different formats; define a 
> "diff" format that produces an xml document such as you describe; you 
> can always include the modified XSD information inline as part of the 
> document (since in this case the amount of data is small, and the 
> modifications to the original XSD are known).

Yeah, indeed that's what I was thinking (but failed to communicate :-) )

> GetDiff - GetTransaction Request option
> It would be *nice* if the result was *not* a GetFeature extension but 
> instead the exact Transaction request document required to make the 
> change; no messing around or inventing new xml schema is required here.
> 
> This is consistent with Galdos cascading WFS-T approach and would be a 
> *great* benefit for keeping servers in sync.
> (Please consider this idea).
> 
> Rollback - don't do it, use GetDiff
> 
> No comment on the SQL - as I am out of time. Except that after we 
> prototype it would be smart to roll this stuff into the database side; 
> although not as Paul suggests straight into PostGIS (smarter to do it as 
> part of DataStore initialization, so the java code "owns" the SQL).

I have the same objection as to Paul's suggestion... this would turn into
a maintenance nightmare. I really do want this to be database independent
so that I don't have to fix bugs three times in three different database
languages (that's why I'm using really plain queries too; they are
standard SQL supported by every database I know, besides maybe old
versions of MySQL that did not support sub-queries).

When a customer chooses a db, you don't have any way to make him change
his mind. We see it with GeoServer: people do complain about the Oracle
datastore, but this does not make them switch to PostGIS; they simply
cannot. It's easier for them to drop GeoServer than Oracle.

So, if someone likes the idea and wants to do a PostGIS-integrated thing
that can be reused from PHP, Python and whatnot clients,
then he should step up, do it, and maintain it too, because once you do
that, implementations do drift apart and each starts doing its own thing
(it's just a matter of time).

Oh, having Postgres-integrated versioning would be great, btw; 
I would not be happy myself if PostGIS had been a middleware 
extension written in Python.

Cheers
Andrea
