Creating a schema is actually not that big a problem: assuming that all
datastores create tables (or whatever) that can *accept* features
conforming to the passed-in schema, the REST API importer can handle the
importing without worrying too much about the munging that's going on
"behind the scenes."  That's how things work today, and it seems to
generally work fine.

The complication arises when the schema is already present and we are
attempting to insert some new data into it -- the destination schema might
not be suitable for the uploaded data.  So modifying createSchema as you
suggest wouldn't actually solve the problem that I'm encountering (although
in general I think it is good to get some feedback when GeoTools has had to
modify your schema to realize it in a particular store).

For the problem I originally encountered, I suppose it would just be a
boolean method taking two schemas:

public static boolean canInsertFrom(SimpleFeatureType source,
        SimpleFeatureType sink);


This would implement the logic described in Justin's mail (or something
similar).  Of course, this API wouldn't address the concerns you raise about
the Geometry column having a different name, so maybe instead it should
return some kind of FeatureAdjuster object that can make that sort of change.

interface FeatureAdjuster {
    public void adjust(SimpleFeature f);
}

public static FeatureAdjuster deriveAdjuster(SimpleFeatureType source,
        SimpleFeatureType sink) throws NoSafeAdjustmentException;

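To make the idea a bit more concrete, here is a rough, self-contained sketch of
how the two methods might behave. It is only an illustration under loud
assumptions: a schema is modelled as a plain map from attribute name to Java
binding rather than a real SimpleFeatureType, a feature is a map from name to
value, and the matching rules (same name ignoring case, sink binding assignable
from the source binding) are my guess at a reasonable starting point, not
anything GeoTools provides. It does not handle a geometry column whose name
differs by more than case.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in only: schemas are name -> binding maps,
// NOT GeoTools SimpleFeatureType instances.
final class SchemaCompat {

    /** Thrown when no safe mapping from source to sink can be derived. */
    public static class NoSafeAdjustmentException extends Exception {
        public NoSafeAdjustmentException(String message) {
            super(message);
        }
    }

    /** Rewrites a feature (attribute name -> value) in place to fit the sink. */
    public interface FeatureAdjuster {
        void adjust(Map<String, Object> feature);
    }

    /**
     * True if every source attribute has an identically named sink attribute
     * whose binding is the same as, or more general than, the source binding.
     */
    public static boolean canInsertFrom(Map<String, Class<?>> source,
                                        Map<String, Class<?>> sink) {
        for (Map.Entry<String, Class<?>> attr : source.entrySet()) {
            Class<?> sinkBinding = sink.get(attr.getKey());
            if (sinkBinding == null || !sinkBinding.isAssignableFrom(attr.getValue())) {
                return false;
            }
        }
        return true;
    }

    /**
     * Like canInsertFrom, but matches names case-insensitively and returns an
     * adjuster that renames attributes to the spelling used by the sink.
     */
    public static FeatureAdjuster deriveAdjuster(Map<String, Class<?>> source,
                                                 Map<String, Class<?>> sink)
            throws NoSafeAdjustmentException {
        Map<String, String> renames = new LinkedHashMap<>();
        for (Map.Entry<String, Class<?>> attr : source.entrySet()) {
            String target = null;
            for (String sinkName : sink.keySet()) {
                if (sinkName.equalsIgnoreCase(attr.getKey())) {
                    target = sinkName;
                    break;
                }
            }
            if (target == null || !sink.get(target).isAssignableFrom(attr.getValue())) {
                throw new NoSafeAdjustmentException(
                        "No compatible sink attribute for " + attr.getKey());
            }
            if (!target.equals(attr.getKey())) {
                renames.put(attr.getKey(), target);
            }
        }
        return feature -> {
            // Iterate over the rename table, not the feature, so the feature
            // map can be modified safely.
            for (Map.Entry<String, String> rename : renames.entrySet()) {
                if (feature.containsKey(rename.getKey())) {
                    feature.put(rename.getValue(), feature.remove(rename.getKey()));
                }
            }
        };
    }
}
```

With a source of {name: String, count: Integer} and a sink of {NAME: String,
COUNT: Number}, canInsertFrom would say no (the names differ), while
deriveAdjuster would hand back an adjuster that renames both attributes. A
real version would need to work on AttributeDescriptors and decide what to do
about lengths, nillability and the default geometry.
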

Either way, we could let it "incubate" in the GeoServer REST API for a
generation or two instead of having it go straight to GeoTools, making it
less of a blocker for the release.

--
David Winslow
OpenGeo - http://opengeo.org/

On Tue, May 1, 2012 at 8:49 AM, Andrea Aime <[email protected]>wrote:

> On Tue, May 1, 2012 at 2:36 AM, Justin Deoliveira <[email protected]>wrote:
>
>>
>>
>> On Mon, Apr 30, 2012 at 10:15 PM, David Winslow <[email protected]>wrote:
>>
>>> Looking deeper into identifying conflicts ahead of time, it seems that
>>> the "update=append" option makes things a bit more complicated - we really
>>> should avoid appending when the schemas for the source and target store
>>> don't match up (typename differences should be ok though.)  I tried to
>>> implement a schema equality check ignoring the name by simply overwriting
>>> the name on one of the featuretypes:
>>>
>>>     SimpleFeatureType sourceSchema =
>>>             sourceDataStore.getSchema(featureTypeName);
>>>     SimpleFeatureType targetSchema =
>>>             (SimpleFeatureType) ((FeatureTypeInfo) resource).getFeatureType();
>>>     SimpleFeatureTypeBuilder ftBuilder = new SimpleFeatureTypeBuilder();
>>>     ftBuilder.init(sourceSchema);
>>>     ftBuilder.setName(targetSchema.getName());
>>>     sourceSchema = ftBuilder.buildFeatureType();
>>>     sameSchema = sourceSchema.equals(targetSchema);
>>>
>>>
>>> However, I'm still not getting the expected value (true) from this for a
>>> Shapefile that I'm attempting to upload multiple times.
>>>
>>> Am I barking up the wrong tree? Is there a GeoTools method I should be
>>> using instead of trying to roll my own?
>>>
>>
>> Well... I am not sure I 100% agree that the source and target have to
>> match up exactly, especially given that specific format differences might
>> lead to situations where this is unwanted. For example, consider oracle.
>> Unless you have a spatial index on a column I believe oracle will simply
>> return "GEOMETRY" as the type. But say you are uploading a shapefile that
>> has a concrete type for the geometry? Should the transaction be rejected?
>> I would say probably not.
>>
>
> Indeed it's in general not possible to get an exact match between the
> original feature type and the
> native feature type:
> - attributes in Oracle are always uppercase
> - Oracle does not have the concept of "boolean"
> - shapefiles have only a single geometry column, it's always the first
> attribute, it's always called "the_geom"
> - shapefile dbf attributes have severe length limitations
> - and so on
>
> Generally speaking we'd need the createSchema to return some form of map
> from the original
> attribute names to the ones actually created (they could be properties in
> the AttributeDescriptor
> user map).
>
> In the OGR data store it gets even worse, you cannot call createSchema and
> expect the output to be
> created at all, you actually have to do both the schema creation and data
> appending in a single shot,
> or you won't get any output created by OGR.
>
> The latter made me roll the following extra method in the OGRDataStore:
>
>     public void createSchema(SimpleFeatureCollection data,
>             boolean approximateFields, String[] options) throws IOException {
>
> Now, the above will dump the data into the target storage as best it can,
> doing all the attribute
> mapping internally, which I believe is even better than creating the
> mapping I described
> above, and lets the store do whatever is best.
> It opens the road to recognize, at the db level, that one can use some
> bulk loading method
> to add data, to create the indexes _after_ the table is loaded, and
> generally speaking
> be free to do whatever type and name mapping is deemed necessary given the
> target
> storage tech constraints.
>
> The generic DataAccess api addition could look as follows:
>
>     public FeatureType createSchema(FeatureCollection data, Hints hints)
>             throws IOException;
>
> where hints allows passing down some data-store-specific extras, e.g.,
> databases
> could use it to create some extra indexes on the attributes, WFS data
> store could
> get the admin credentials to create the schema via RESTConfig on a remote
> GeoServer, and so on.
>
> This would be new API, for which we'd need a new trunk... stuff seems to
> start piling
> up for a new trunk to be available, time to cut a 2.3.x branch?
>
> Cheers
> Andrea
>
>
>
> --
> -------------------------------------------------------
> Ing. Andrea Aime
> GeoSolutions S.A.S.
> Tech lead
>
> Via Poggio alle Viti 1187
> 55054  Massarosa (LU)
> Italy
>
> phone: +39 0584 962313
> fax:      +39 0584 962313
> mob:    +39 339 8844549
>
> http://www.geo-solutions.it
> http://geo-solutions.blogspot.com/
> http://www.youtube.com/user/GeoSolutionsIT
> http://www.linkedin.com/in/andreaaime
> http://twitter.com/geowolf
>
> -------------------------------------------------------
>
_______________________________________________
Geoserver-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geoserver-devel