Well, equality here was meant as a stand-in for "compatibility" - a
reasonable expectation that copying the uploaded features into the
pre-existing store is not going to cause problems.  As I was looking into
it I realized I was writing a lot of code, hence the question about whether
this sort of check is already implemented somewhere that I could be taking
advantage of.
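
As a rough illustration of "compatibility" rather than equality, here is a
minimal sketch in plain Java with no GeoTools types. A schema is modelled as
a map from attribute name to Java binding class, and the names SchemaCompat,
isCompatible and widensTo are made up for this example; they are not
existing GeoTools or GeoServer API. The type name is never consulted, and
the list of integer widenings is only an assumption about what counts as
safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SchemaCompat {

    /** True if a value bound to 'from' can be stored as 'to' (assumed-safe cases only). */
    static boolean widensTo(Class<?> from, Class<?> to) {
        if (to.isAssignableFrom(from)) {
            return true; // same class, or the target binding is a supertype
        }
        if (from == Short.class) {
            return to == Integer.class || to == Long.class;
        }
        if (from == Integer.class) {
            return to == Long.class;
        }
        return false;
    }

    /** Compares attribute names and bindings only; the type name plays no part. */
    static boolean isCompatible(Map<String, Class<?>> source, Map<String, Class<?>> target) {
        if (!source.keySet().equals(target.keySet())) {
            return false; // this sketch requires the same set of attribute names
        }
        for (Map.Entry<String, Class<?>> e : source.entrySet()) {
            if (!widensTo(e.getValue(), target.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> uploaded = new LinkedHashMap<>();
        uploaded.put("name", String.class);
        uploaded.put("pop", Integer.class);

        Map<String, Class<?>> existing = new LinkedHashMap<>();
        existing.put("name", String.class);
        existing.put("pop", Long.class);

        // Integer into a Long column is a widening; the reverse is not.
        System.out.println(isCompatible(uploaded, existing));
        System.out.println(isCompatible(existing, uploaded));
    }
}
```

The assignability test would also let a concrete geometry binding go into a
generic geometry column, provided the two classes are related by
inheritance.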

The "pull" approach you suggest seems to actually modify the data, which
isn't where I was thinking things would go.  I can definitely see some
utility in adjusting field types (in safe ways - widening integers, etc.)
but I think throwing out unexpected fields is going a bit too far.  In the
extreme, if I upload a layer to the wrong table entirely and NO fields are
common between the source and target schema, wouldn't the approach you
advocate result in a bunch of rows with all fields set to NULL being
inserted into the target table?  That doesn't seem to me like it would be a
good default behavior.  Maybe if we take the "pull" approach without this
condition: "any extra attributes in the source that don't exist in the
destination should also be ignored" it would be less likely to insert junk
data.  Put succinctly, the uploaded data would need to contain a subset of
the fields in the target (and omitted fields would default to NULL).
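
A minimal sketch of that stricter rule, again in plain Java without GeoTools
types. Schemas are modelled as sets of attribute names and features as maps;
StrictPull, unexpectedFields and pullRow are hypothetical names used only
for illustration. An upload is rejected if it carries any field the target
lacks; otherwise each target field is pulled from the source row, and
omitted fields are left as null.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class StrictPull {

    /** Source fields that the target does not declare; empty means the upload is acceptable. */
    static Set<String> unexpectedFields(Set<String> sourceFields, Set<String> targetFields) {
        Set<String> extra = new LinkedHashSet<>(sourceFields);
        extra.removeAll(targetFields);
        return extra;
    }

    /** Builds one target row: each target field is pulled from the source row, or null if omitted. */
    static Map<String, Object> pullRow(Map<String, Object> sourceRow, Set<String> targetFields) {
        Set<String> extra = unexpectedFields(sourceRow.keySet(), targetFields);
        if (!extra.isEmpty()) {
            throw new IllegalArgumentException("Fields not in target schema: " + extra);
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (String field : targetFields) {
            row.put(field, sourceRow.get(field)); // get() yields null for omitted fields
        }
        return row;
    }

    public static void main(String[] args) {
        Set<String> target = new LinkedHashSet<>(Arrays.asList("name", "pop", "area"));

        Map<String, Object> partial = new LinkedHashMap<>();
        partial.put("name", "Oregon");
        System.out.println(pullRow(partial, target));

        Map<String, Object> wrongLayer = new LinkedHashMap<>();
        wrongLayer.put("road_id", 7);
        try {
            pullRow(wrongLayer, target);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a target of {name, pop, area}, a row carrying only name is accepted and
gets null for pop and area, while a row from an unrelated layer is rejected
instead of being inserted as all NULLs.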

--
David Winslow
OpenGeo - http://opengeo.org/

On Mon, Apr 30, 2012 at 8:36 PM, Justin Deoliveira <[email protected]> wrote:

>
>
> On Mon, Apr 30, 2012 at 10:15 PM, David Winslow <[email protected]> wrote:
>
>> Looking deeper into identifying conflicts ahead of time, it seems that
>> the "update=append" option makes things a bit more complicated - we really
>> should avoid appending when the schemas for the source and target store
>> don't match up (typename differences should be ok though.)  I tried to
>> implement a schema equality check ignoring the name by simply overwriting
>> the name on one of the featuretypes:
>>
>>     SimpleFeatureType sourceSchema =
>>         sourceDataStore.getSchema(featureTypeName);
>>     SimpleFeatureType targetSchema =
>>         (SimpleFeatureType) ((FeatureTypeInfo) resource).getFeatureType();
>>     SimpleFeatureTypeBuilder ftBuilder = new SimpleFeatureTypeBuilder();
>>     ftBuilder.init(sourceSchema);
>>     ftBuilder.setName(targetSchema.getName());
>>     sourceSchema = ftBuilder.buildFeatureType();
>>     sameSchema = sourceSchema.equals(targetSchema);
>>
>>
>> However, I'm still not getting the expected value (true) from this for a
>> Shapefile that I'm attempting to upload multiple times.
>>
>> Am I barking up the wrong tree? Is there a GeoTools method I should be
>> using instead of trying to roll my own?
>>
>
> Well... I am not sure I 100% agree that the source and target have to
> match up exactly, especially given that specific format differences might
> lead to situations where this is unwanted. For example, consider Oracle.
> Unless you have a spatial index on a column I believe Oracle will simply
> return "GEOMETRY" as the type. But say you are uploading a shapefile that
> has a concrete type for the geometry? Should the transaction be rejected?
> I would say probably not.
>
> The strategy I usually take when dealing with this sort of thing is a
> "pull" approach. For every attribute in the destination type (the table
> being updated), look for an attribute in the source type (the file being
> uploaded). If an attribute doesn't exist in the source type, ignore it;
> similarly, any extra attributes in the source that don't exist in the
> destination should also be ignored.
>
>>
>> Also, from reviewing this thread, it's not clear whether we were talking
>> about removing the ability to overwrite/append altogether, or just avoiding
>> munging when the requested type is not available.  Just to double check, we
>> do want to keep the "update=" parameter and append or overwrite when the
>> desired resource is already present in the target store, right?
>>
>
> No, I think we are just talking about avoiding the strange cases that occur
> when there is potential for name clashes, by doing some pre-checks and not
> allowing the user to create resources when names clash. Not removing
> functionality like updating or appending to an existing table.
>
>
>
>> --
>> David Winslow
>> OpenGeo - http://opengeo.org/
>>
>> On Tue, Apr 24, 2012 at 1:02 PM, Gabriel Roldan <[email protected]> wrote:
>>
>>> On Mon, Apr 23, 2012 at 7:23 PM, Justin Deoliveira <[email protected]> wrote:
>>> > Hmmm... some subtle issues indeed. I agree that the most sane thing would
>>> > be to just send back an error when a name conflict occurs, giving the
>>> > client the ability to specify a different name.
>>> +1. Simpler, cleaner.
>>>
>>> >
>>> > On Mon, Apr 23, 2012 at 11:21 AM, David Winslow <[email protected]> wrote:
>>> >>
>>> >> Hi all,
>>> >>
>>> >> I'm investigating an issue I came across with respect to importing data
>>> >> into existing datastores when data (shapefiles etc.) is uploaded through
>>> >> the REST API.  Currently the behavior is a little complicated to explain:
>>> >>
>>> >> 1) If no name conflict is detected, then the data is imported into a new
>>> >> physical resource (say, database table) with a name derived from the name
>>> >> of the uploaded file (so foo.shp => CREATE TABLE foo)
>>> >> 2) If a physical resource of the same name is present in the target
>>> >> datastore, then the data is put into that resource (either replacing or
>>> >> appending to the existing data, depending on request parameters.)
>>> >> Actually the name conflict check is done *after* this step, so the
>>> >> resource is always modified.
>>> >> 3a) If a featuretype of the same name already exists in the same
>>> >> datastore, then the existing featuretype is used
>>> >> 3b) If a featuretype of the same name exists in a different datastore in
>>> >> the same workspace, a numeric suffix is appended to the native name to
>>> >> derive a name for the GeoServer ResourceInfo that gets created.  If this
>>> >> suffix would need to be greater than 9, then GeoServer just gives up and
>>> >> uses the _9 suffix, throwing an error when it tries to save.
>>> >> 3c) If a coverage of the same name exists in the same workspace, then
>>> >> GeoServer doesn't detect the conflict and errors when trying to save the
>>> >> ResourceInfo again.
>>> >>
>>> >> http://jira.codehaus.org/browse/GEOS-5057
>>> >>
>>> >> I think the new name conflict adjusting code I talked about last week[1]
>>> >> can help with issue 3(a-c), but I think maybe some adjustment to the data
>>> >> import behavior is in order as well.  I think a simple, less confusing
>>> >> behavior would be to never import data when there is a name conflict, and
>>> >> simply error out in this case.
>>> >>
>>> >> A more complicated option would be to rearrange things so that the name
>>> >> resolution happens before the data import, so that the name always
>>> >> matches up with the created table.  Why is this more complicated? It
>>> >> raises the issue of what to do when a table exists that appears to have
>>> >> had its name resolved previously: Say I have topp:states (a shapefile)
>>> >> and topp:states_1 (a postgis table) and I try to import a shapefile into
>>> >> the postgis store through the REST API.  Should the shapefile be appended
>>> >> to topp:states_1 or added as a new featuretype in topp:states_2?
>>> >>
>>> >> [1]: http://comments.gmane.org/gmane.comp.gis.geoserver.devel/16512
>>> >>
>>> >> --
>>> >> David Winslow
>>> >> OpenGeo - http://opengeo.org/
>>> >>
>>> >>
>>> >>
>>> ------------------------------------------------------------------------------
>>> >> For Developers, A Lot Can Happen In A Second.
>>> >> Boundary is the first to Know...and Tell You.
>>> >> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
>>> >> http://p.sf.net/sfu/Boundary-d2dvs2
>>> >>
>>> >> _______________________________________________
>>> >> Geoserver-devel mailing list
>>> >> [email protected]
>>> >> https://lists.sourceforge.net/lists/listinfo/geoserver-devel
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Justin Deoliveira
>>> > OpenGeo - http://opengeo.org
>>> > Enterprise support for open source geospatial.
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Gabriel Roldan
>>> OpenGeo - http://opengeo.org
>>> Expert service straight from the developers.
>>>
>>
>>
>
>
> --
> Justin Deoliveira
> OpenGeo - http://opengeo.org
> Enterprise support for open source geospatial.
>
>
_______________________________________________
Geoserver-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geoserver-devel
