Guys, I did GDAL bindings for Tika in TIKA-605 by building GDAL's Java
bindings JAR. I think it's a good route, but the problem is that the
JAR isn't in Maven Central.
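
For illustration only, here is a minimal sketch of the "pure Java first,
GDAL as a fallback" lookup that Martin describes in the quoted thread
below. Everything in it is hypothetical: FormatReaderRegistry,
FormatReader, and the extension-based lookup are made-up names for this
example, not SIS, Tika, or GDAL API, and the fallback reader is only a
stand-in for whatever JNI-backed GDAL wrapper would actually be used.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: prefer a pure-Java reader, else fall back to a GDAL-backed one. */
public class FormatReaderRegistry {

    /** Hypothetical reader abstraction; not an SIS, Tika, or GDAL interface. */
    public interface FormatReader {
        String describe();
    }

    // Pure-Java readers, keyed by lower-case file extension.
    private final Map<String, FormatReader> pureJavaReaders = new LinkedHashMap<>();

    // Stand-in for a JNI-backed GDAL reader; null when the native library is unavailable.
    private final FormatReader fallback;

    public FormatReaderRegistry(FormatReader fallback) {
        this.fallback = fallback;
    }

    /** Registers a pure-Java reader for the given file extension (case-insensitive). */
    public void register(String extension, FormatReader reader) {
        pureJavaReaders.put(extension.toLowerCase(Locale.ROOT), reader);
    }

    /** Returns the pure-Java reader if one is registered, otherwise the fallback, if any. */
    public Optional<FormatReader> readerFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = (dot < 0) ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        FormatReader reader = pureJavaReaders.get(ext);
        if (reader != null) {
            return Optional.of(reader);
        }
        return Optional.ofNullable(fallback);
    }

    public static void main(String[] args) {
        FormatReaderRegistry registry =
                new FormatReaderRegistry(() -> "GDAL fallback (via JNI)");
        registry.register("shp", () -> "pure-Java Shapefile reader");
        registry.register("nc", () -> "pure-Java NetCDF reader");

        for (String name : new String[] {"roads.shp", "dem.hgt"}) {
            String chosen = registry.readerFor(name)
                    .map(FormatReader::describe)
                    .orElse("unsupported");
            System.out.println(name + " -> " + chosen);
        }
    }
}
```

Passing null as the fallback models an install without the native
library: formats with a pure-Java reader keep working and everything
else comes back empty, which keeps the GDAL dependency optional.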

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Travis L Pinney <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Sunday, August 25, 2013 6:47 PM
To: dev <[email protected]>
Subject: Re: Proposal for merging Shapefile branch to trunk

>Hi Adam and Martin,
>
>Would it be ok to leave it as is because there are a small number of data
>storage modules currently? I think of storage as something that holds
>common formats that run across all the different storage formats, like a
>Feature. Eventually it will get to the point where you will not want to
>have a multitude of jar files. I see the sis-shapefile as a fairly distinct
>file driver because of the complex format of a shapefile (not necessarily
>good complexity).
>
>Adding GDAL bindings for common formats would be very useful. This would
>make it easier to do large bulk processing of geospatial data with Hadoop
>like the presentation in the following video:
>
>https://www.youtube.com/watch?v=_JCPf89s-NI
>
>
>Thanks,
>Travis
>
>On Sun, Aug 25, 2013 at 8:06 PM, Adam Estrada
><[email protected]> wrote:
>
>> Hey Martin,
>>
>> Regarding where to put all the file format modules, I am just concerned
>> that it might be difficult to keep things straight if there is a mixture
>> of "complex" formats and everything else. I think we all trust your
>> opinion on where to put things but we really just need to keep the end
>> user and other potential committers in mind when moving forward in the
>> development process. For example, I take a look at the directory
>> structure in SVN [1] and I automatically think that each format should be
>> in its own module like sis-netcdf because of the way it's organized.
>>
>> Just my 2 cents at this point and feedback from other folks is certainly
>> welcome :)
>>
>> Adam
>>
>> [1] https://svn.apache.org/repos/asf/sis/trunk/storage/
>>
>>
>> On Sun, Aug 25, 2013 at 4:39 PM, Martin Desruisseaux <
>> [email protected]> wrote:
>>
>> > Hello Adam
>> >
>> > On 25/08/13 21:34, Adam Estrada wrote:
>> >
>> >> It is true that the Shapefile is very widely used but it has lots and
>> >> lots of limitations. The main one that I can think of is that it can't
>> >> handle UTF-encoded characters in the attribute table. Can I suggest
>> >> maybe working towards something like an "interchange" module where all
>> >> the file formats live?
>> >>
>> >
>> > I agree with all the above, and in the current SIS state the
>> > "interchange" module is actually the "storage" group of modules. This
>> > group of modules currently contains:
>> >
>> >  * sis-storage: provides the basis common to all formats.
>> >  * sis-netcdf: for the NetCDF format.
>> >
>> >
>> > My concern is about whether we should put the Shapefile code in its own
>> > "sis-shapefile" module (which would depend on "sis-storage"), or put it
>> > straight in "sis-storage".
>> >
>> > One extreme view is to adopt a "one format == one module" policy. But in
>> > Geotoolkit.org, this policy resulted in more than 120 modules, some of
>> > them with very few classes. In security-constrained environments, where
>> > every JAR file requires its own SecurityManager policies, this is very
>> > tedious.
>> >
>> > Consequently, I would like to group some formats in the same JAR files
>> > in order to keep the number of modules reasonable. Then, the question
>> > would be which granularity to choose. My proposal is to not put every
>> > format in its own module, but put a format in its own module if it meets
>> > some of the following conditions:
>> >
>> >  * The format is not widely used.
>> >  * The format is complex, so it requires a large number of classes or
>> >    resources.
>> >  * The format depends on an external library or on native code.
>> >
>> >
>> > The NetCDF format is proposed in its own module because it is complex
>> > (the classes currently in "sis-netcdf" are just scratching the surface)
>> > and may have a dependency on a large library (though I would like to keep
>> > that dependency optional). Shapefile on the contrary is relatively
>> > simple and needs no external dependency.
>> >
>> > Given that "sis-storage" would be the basis of all formats in SIS, my
>> > proposal is to also put in "sis-storage" some formats considered
>> > "fundamental", meaning formats so widespread that any user is very
>> > likely to meet them. They would not be the only or "main" SIS formats;
>> > they would rather be the "minimal requirements". Other modules like
>> > "sis-netcdf" would provide more elaborate formats.
>> >
>> >
>> >
>> >> For vector data, there are quite a few of them out there. OGR
>> >> references many of them [1] but that opens the debate on whether or not
>> >> to just use GDAL. I suppose we could just have GDAL support as a module
>> >> which would require some sort of JNI bindings to work in a pure Java
>> >> library like SIS. What are your thoughts on this?
>> >>
>> >
>> > Yes, this is also the plan :-). We already used GDAL through JNI on our
>> > side, and that code is also part of the proposed migration to SIS. The
>> > approach that I would recommend is to use pure Java code for many
>> > formats (Shapefile, ASCII grid, GeoTIFF, NetCDF, PNG), and fall back on
>> > GDAL as a complement for other formats.
>> >
>> > A similar argument applies to Coordinate Transformation Services. We
>> > have pure Java code (their port to SIS started last week, beginning with
>> > WKT), but we plan to support Proj.4 through JNI even for map projections
>> > available in pure Java, because in some situations a user may need the
>> > guarantee of getting the exact same results as PostGIS or MapServer, for
>> > instance (those products are built on top of Proj.4).
>> >
>> >     Martin
>> >
>> >
>>
