Good to know about the OGC/ISO interfaces.

It would make sense to apply processing to NetCDF, Shapefile, MBTiles
files, etc. I can set that up in a separate code repo on GitHub. The reason
I want to work on that concurrently is to stress test the existing library
with lots of data, to find bugs that may not appear with simple unit tests.
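As a rough sketch of the stress-test driver I have in mind (ShapeFile and
readAll below are placeholder names, not the actual SIS API):

    import java.io.File;
    import java.io.IOException;

    public class StressTest {
        public static void main(String[] args) throws IOException {
            // Walk a directory of shapefiles and fully read each one, so
            // that real-world data exercises code paths that small
            // unit-test fixtures never reach.
            File[] files = new File(args[0]).listFiles();
            if (files == null) {
                System.err.println("Not a directory: " + args[0]);
                return;
            }
            for (File file : files) {
                if (file.getName().endsWith(".shp")) {
                    ShapeFile shp = new ShapeFile(file.getPath()); // placeholder class
                    shp.readAll();                                 // placeholder: parse all records
                }
            }
        }
    }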



Thanks,
Travis





On Thu, Jun 20, 2013 at 7:42 AM, Martin Desruisseaux <
[email protected]> wrote:

> On 20/06/13 12:47, Travis L Pinney wrote:
>
>> The java.util.Map is fairly basic now. An improvement could be a feature
>> class that has a map of <String, DataType>, where DataType corresponds to
>> the appropriate DataType
>> (http://www.clicketyclick.dk/databases/xbase/format/data_types.html).
>> Currently I am converting everything to strings.
>>
>
> Actually Feature, FeatureType and related interfaces derived from OGC/ISO
> standards (in particular the GML - Geography Markup Language - schemas)
> are already provided in GeoAPI:
>
> http://www.geoapi.org/snapshot/pending/org/opengis/feature/package-summary.html
>
> This is in the "pending" part of GeoAPI, so we have room for revising
> them, in particular to make sure that they are still in agreement with the
> latest OGC/ISO standards. Then we would need to provide an implementation
> in SIS, porting Geotk classes when possible or appropriate. However, there
> is a somewhat long road before we reach that point, so it seems to me that
> your current approach (String values in a java.util.Map) is good in the
> meantime.
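>
> For illustration, such an interim typed-attribute representation could
> look like the following sketch (DBaseDataType and SimpleFeature are
> hypothetical names, not existing GeoAPI or SIS classes):
>
>     import java.util.LinkedHashMap;
>     import java.util.Map;
>
>     /** dBase III field types, following the xBase format page cited above. */
>     enum DBaseDataType {
>         CHARACTER('C'), DATE('D'), FLOAT('F'), LOGICAL('L'), MEMO('M'), NUMERIC('N');
>
>         final char code;
>
>         DBaseDataType(char code) {
>             this.code = code;
>         }
>
>         /** Returns the constant for a dBase field-descriptor type code. */
>         static DBaseDataType forCode(char c) {
>             for (DBaseDataType type : values()) {
>                 if (type.code == c) {
>                     return type;
>                 }
>             }
>             throw new IllegalArgumentException("Unknown dBase type: " + c);
>         }
>     }
>
>     /** Minimal feature holder: values keyed by column name, plus declared types. */
>     class SimpleFeature {
>         final Map<String, Object>        values = new LinkedHashMap<String, Object>();
>         final Map<String, DBaseDataType> types  = new LinkedHashMap<String, DBaseDataType>();
>     }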
>
>
>
>> The bulk ingests would be an API where you call a JAR file from Hadoop,
>> give it the appropriate directory of shapefiles in HDFS, and it processes
>> each shapefile per mapper. The first ingest I am working on is a
>> transformation of points to a 2D histogram, to get an idea of the density
>> of features across all the shapefiles. This could be extended to support
>> different types of outputs (store in a database, or in a more efficient
>> format on HDFS).
>>
>
> I would suggest separating the two tasks. I think that the above is what
> we call a "processing", which is the subject of (yet another) OGC
> standard. Processing and DataStore should be independent, i.e. someone may
> want to apply the above processing to NetCDF files too... Maybe we can
> focus on ShapefileStore first, and revisit processing later? Any
> processing will need a DataStore first in order to perform its work
> anyway...
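>
> To make the idea concrete, here is a hypothetical sketch (PointSource and
> GridHistogram are invented names, not a proposed API) of a processing
> that is independent of the data store:
>
>     /** Hypothetical minimal abstraction over any DataStore (shapefile, NetCDF, ...). */
>     interface PointSource {
>         /** Returns the next (x,y) point, or null when exhausted. */
>         double[] next();
>     }
>
>     /** A "processing" that works on any PointSource: bins points into a 2D histogram. */
>     class GridHistogram {
>         private final long[][] bins;
>         private final double minX, minY, cellWidth, cellHeight;
>
>         GridHistogram(double minX, double minY, double maxX, double maxY, int nx, int ny) {
>             this.minX = minX;
>             this.minY = minY;
>             this.cellWidth  = (maxX - minX) / nx;
>             this.cellHeight = (maxY - minY) / ny;
>             this.bins = new long[ny][nx];
>         }
>
>         /** Drains the source and increments the cell containing each point. */
>         void accumulate(PointSource source) {
>             double[] p;
>             while ((p = source.next()) != null) {
>                 int col = (int) ((p[0] - minX) / cellWidth);
>                 int row = (int) ((p[1] - minY) / cellHeight);
>                 if (col >= 0 && col < bins[0].length && row >= 0 && row < bins.length) {
>                     bins[row][col]++;
>                 }
>             }
>         }
>     }
>
> A ShapefileStore and a NetCDF store could then both feed the same
> GridHistogram, which is the point of keeping the two concerns separate.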
>
>     Martin
>
>
