+1 on the functionalities.

The java.util.Map is fairly basic right now. An improvement could be a
feature class that holds a map of <String, DataType>, where DataType
corresponds to the appropriate xBase data type
(http://www.clicketyclick.dk/databases/xbase/format/data_types.html).
Currently I am converting everything to strings.
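
To make the idea concrete, here is a rough sketch of what such a feature
class might look like. The names (TypedFeatureSketch, Feature, DataType) and
the four field types are only assumptions for illustration, not anything that
exists in SIS today. It relies on xBase dates being stored as YYYYMMDD text.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

final class TypedFeatureSketch {
    /** Hypothetical subset of the xBase field types; names are illustrative. */
    enum DataType { CHARACTER, NUMBER, LOGICAL, DATE }

    /** Hypothetical feature: each field name maps to a declared type and raw DBF text. */
    static final class Feature {
        private final Map<String, DataType> types = new HashMap<>();
        private final Map<String, String> raw = new HashMap<>();

        void put(String name, DataType type, String rawValue) {
            types.put(name, type);
            raw.put(name, rawValue);
        }

        DataType typeOf(String name) {
            return types.get(name);
        }

        /** Converts the raw text to a Java object according to the declared type. */
        Object get(String name) {
            String value = raw.get(name);
            DataType type = types.get(name);
            if (value == null || type == null) {
                return null;
            }
            String v = value.trim();
            switch (type) {
                case NUMBER:
                    return Double.valueOf(v);
                case LOGICAL:
                    // xBase logical fields use T/t/Y/y for true.
                    return v.length() == 1 && "TtYy".contains(v);
                case DATE:
                    // xBase dates are stored as YYYYMMDD.
                    return LocalDate.parse(v, DateTimeFormatter.BASIC_ISO_DATE);
                default:
                    return value; // CHARACTER: keep the text as-is
            }
        }
    }
}
```

A caller would then get a Double, Boolean or LocalDate back from get(...)
instead of a String, while typeOf(...) still tells it what the DBF header
declared.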

Another improvement may be to preserve the order of the fields, because
fields have an intrinsic order in the file.

Maybe use something like this?
https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections/map/ListOrderedMap.html
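
As a quick illustration of the ordering idea without any extra dependency,
the sketch below uses the JDK's java.util.LinkedHashMap, which iterates in
insertion order. This is a stand-in for the ListOrderedMap linked above, not
the same class, and the names OrderedFieldsSketch, record and fieldNames are
made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OrderedFieldsSketch {
    /** Builds a record whose iteration order follows the order of the field names given. */
    static Map<String, String> record(String[] names, String[] values) {
        if (names.length != values.length) {
            throw new IllegalArgumentException("names and values must have the same length");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            fields.put(names[i], values[i]);
        }
        return fields;
    }

    /** Returns the field names in the order they were inserted. */
    static List<String> fieldNames(Map<String, String> record) {
        return new ArrayList<>(record.keySet());
    }
}
```

With a plain HashMap the order of fieldNames(...) would be unspecified; here
it follows the order in which the fields were put.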

The bulk ingest would be an API where you call a jar file from Hadoop, give
it a directory in HDFS to pull shapefiles from, and it would process one
shapefile per mapper. The first ingest I am working on is a transformation
of points into a 2D histogram, to get an idea of the density of features
across all the shapefiles. This could be extended to produce different types
of output (store in a database, or in a more efficient format on HDFS).
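
For what it is worth, the histogram part could look roughly like the sketch
below. It is plain Java with no Hadoop classes, so the mapper/reducer wiring
is left out; the class name, the fixed bounding box and the merge step are
assumptions about how the per-mapper results might be combined, not existing
code.

```java
final class DensityHistogramSketch {
    private final double minX, minY, maxX, maxY;
    private final int cols, rows;
    private final long[][] counts;

    /** Grid of rows x cols equal-sized cells covering the given bounding box. */
    DensityHistogramSketch(double minX, double minY, double maxX, double maxY,
                           int cols, int rows) {
        if (!(maxX > minX) || !(maxY > minY) || cols <= 0 || rows <= 0) {
            throw new IllegalArgumentException("empty extent or grid");
        }
        this.minX = minX; this.minY = minY;
        this.maxX = maxX; this.maxY = maxY;
        this.cols = cols; this.rows = rows;
        this.counts = new long[rows][cols];
    }

    /** Counts one point; points outside the box (or NaN) are ignored. */
    void add(double x, double y) {
        if (!(x >= minX && x <= maxX && y >= minY && y <= maxY)) {
            return;
        }
        // Points exactly on the upper edges fall into the last cell.
        int col = Math.min(cols - 1, (int) ((x - minX) / (maxX - minX) * cols));
        int row = Math.min(rows - 1, (int) ((y - minY) / (maxY - minY) * rows));
        counts[row][col]++;
    }

    long count(int row, int col) {
        return counts[row][col];
    }

    long total() {
        long sum = 0;
        for (long[] r : counts) {
            for (long c : r) {
                sum += c;
            }
        }
        return sum;
    }

    /** Adds another histogram of the same shape, e.g. one produced by another mapper. */
    void merge(DensityHistogramSketch other) {
        if (other.rows != rows || other.cols != cols) {
            throw new IllegalArgumentException("grid shapes differ");
        }
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                counts[r][c] += other.counts[r][c];
            }
        }
    }
}
```

Each mapper would fill one histogram from the points of its shapefile, and
the partial histograms would then be summed cell by cell.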

Thanks,
Travis

On Thu, Jun 20, 2013 at 6:11 AM, Martin Desruisseaux <
[email protected]> wrote:

> Hello Travis
>
> On 20/06/13 11:13, Travis L Pinney wrote:
>
>> Could the sis-storage be a "module" as well as have the ability to be
>> compiled to a sis-shapefile.jar that has fewer dependencies for people that
>> only want to use shapefile functionality? Maybe it can have two outputs
>> and generate a standalone artifact as well as be included in the larger
>> package.
>>
>
> I think that it depends on what we call using only the Shapefile
> functionality. Some functionalities that would probably require other SIS
> modules are:
>
>  * Allow people to know what is inside the shapefiles without relying
>    on ShapefileStore-specific API (requires sis-metadata module).
>  * Parse the map projection definition (will require sis-referencing
>    module, after completion).
>  * Leverage the index for faster access (may require sis-utility).
>
>
> Maybe more important, the current ShapefileStore exposes the features as a
> java.util.Map. I think that it is okay as a temporary solution since SIS
> does not yet implement the Feature interface. But once a real Feature
> framework is provided in SIS, we should probably leverage it in the
> ShapefileStore class if we want a consistent API for the whole project...
>
> Furthermore, in a future SIS version we will start to implement Filters
> (i.e. allow the ShapefileStore to read only the data having some
> characteristics, for example only the data in some area of interest). Some
> filtering can be applied on-the-fly at reading time by the ShapefileStore,
> especially the filtering that can leverage the index. So ShapefileStore
> would depend on the filter classes (some filters may imply map projection,
> thus depending on sis-referencing, etc.).
>
> For all those reasons, it seems to me that a ShapefileStore without SIS
> dependencies would have very limited functionality in the medium/long term...
>
>
>
>> I want to write a shapefile input format for hadoop for doing bulk ingests
>> of shapefiles. Where would be the best place to add this functionality?
>>
>
> Could you give more details about what the bulk ingests would perform
> exactly?
>
>     Martin
>
>
