Guys, I did GDAL bindings for Tika in TIKA-605 by building the Java JAR
bindings. I think it's a good route; the one problem is that the JAR isn't
in Maven Central.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Travis L Pinney <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Sunday, August 25, 2013 6:47 PM
To: dev <[email protected]>
Subject: Re: Proposal for merging Shapefile branch to trunk

>Hi Adam and Martin,
>
>Would it be ok to leave it as is because there are a small number of data
>storage modules currently? I think of storage as something that holds
>common formats that run across all the different storage formats, like a
>Feature. Eventually it will get to the point where you will not want to
>have a multitude of jar files. I see the sis-shapefile as a fairly
>distinct file driver because of the complex format of a shapefile (not
>necessarily good complexity).
>
>Adding GDAL bindings for common formats would be very useful. This would
>make it easier to do large bulk processing of geospatial data with Hadoop
>like the presentation in the following video:
>
>https://www.youtube.com/watch?v=_JCPf89s-NI
>
>Thanks,
>Travis
>
>On Sun, Aug 25, 2013 at 8:06 PM, Adam Estrada <[email protected]> wrote:
>
>> Hey Martin,
>>
>> Regarding where to put all the file format modules, I am just concerned
>> that it might be difficult to keep things straight if there is a
>> mixture of "complex" formats and everything else.
>> I think we all trust your opinion on where to put things but we really
>> just need to keep the end user and other potential committers in mind
>> when moving forward in the development process. For example, I take a
>> look at the directory structure in SVN [1] and I automatically think
>> that each format should be in its own module like sis-netcdf because of
>> the way it's organized.
>>
>> Just my 2 cents at this point and feedback from other folks is certainly
>> welcome :)
>>
>> Adam
>>
>> [1] https://svn.apache.org/repos/asf/sis/trunk/storage/
>>
>> On Sun, Aug 25, 2013 at 4:39 PM, Martin Desruisseaux
>> <[email protected]> wrote:
>>
>> > Hello Adam
>> >
>> > On 25/08/13 21:34, Adam Estrada wrote:
>> >
>> >> It is true that the Shapefile is very widely used but it has lots and
>> >> lots of limitations. The main one that I can think of is that it
>> >> can't handle UTF-encoded characters in the attribute table. Can I
>> >> suggest maybe working towards something like an "interchange" module
>> >> where all the file formats live?
>> >
>> > I agree with all the above, and in the current SIS state the
>> > "interchange" module is actually the "storage" group of modules. This
>> > group of modules currently contains:
>> >
>> > * sis-storage: provides the basis common to all formats.
>> > * sis-netcdf: for the NetCDF format.
>> >
>> > My concern is about whether we should put the Shapefile code in its
>> > own "sis-shapefile" module (which would depend on "sis-storage"), or
>> > put it straight in "sis-storage".
>> >
>> > One extreme view is to adopt a "one format == one module" policy. But
>> > in Geotoolkit.org, this policy resulted in more than 120 modules, some
>> > of them with very few classes. In security-constrained environments,
>> > where every JAR file requires its own SecurityManager policies, this
>> > is very tedious.
>> >
>> > Consequently, I would like to group some formats in the same JAR files
>> > in order to keep the number of modules reasonable. Then, the question
>> > would be which granularity to choose. My proposal is to not put every
>> > format in its own module, but put a format in its own module if it
>> > meets some of the following conditions:
>> >
>> > * The format is not widely used.
>> > * The format is complex, so it requires a large number of classes or
>> >   resources.
>> > * The format depends on an external library or on native code.
>> >
>> > The NetCDF format is proposed in its own module because it is complex
>> > (the classes currently in "sis-netcdf" are just scratching the
>> > surface) and may have a dependency on a large library (while I would
>> > like to keep that dependency optional). Shapefile on the contrary is
>> > relatively simple and needs no external dependency.
>> >
>> > Given that "sis-storage" would be the basis of all formats in SIS, my
>> > proposal is to also put in "sis-storage" some formats considered as
>> > "fundamental ones", I mean some formats so widespread that any user is
>> > very likely to encounter them. They would not be the only or "main"
>> > SIS formats - they would rather be the "minimal requirements". Other
>> > modules like "sis-netcdf" would provide more elaborate formats.
>> >
>> >> For vector data, there are quite a few of them out there. OGR
>> >> references many of them [1] but that opens the debate on whether or
>> >> not to just use GDAL. I suppose we could just have GDAL support as a
>> >> module which would require some sort of JNI bindings to work in a
>> >> pure Java library like SIS. What are your thoughts on this?
>> >
>> > Yes, this is also the plan :-). We already used GDAL through JNI on
>> > our side, and that code is also part of the proposed migration to SIS.
>> > The approach that I would recommend is to use pure Java code for many
>> > formats (Shapefile, ASCII grid, GeoTIFF, NetCDF, PNG), and fall back
>> > on GDAL as a complement for other formats.
>> >
>> > A similar argument applies to Coordinate Transformation Services. We
>> > have pure Java code (their port to SIS started last week, beginning
>> > with WKT), but we plan to support Proj.4 through JNI even for map
>> > projections available in pure Java, because in some situations a user
>> > may need the guarantee of getting the exact same results as PostGIS or
>> > MapServer, for instance (those products are built on top of Proj.4).
>> >
>> > Martin
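
The "pure Java first, GDAL as a complement" approach Martin describes in the quoted thread can be sketched as a simple selection rule. This is only an illustration under stated assumptions: the class and method names are hypothetical (they are not SIS or GDAL APIs), and treating GDAL as able to handle every remaining format is a simplification.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

public class FormatFallbackSketch {
    // Formats listed in the thread as candidates for pure Java readers.
    static final Set<String> PURE_JAVA_FORMATS = new HashSet<>(
            Arrays.asList("Shapefile", "ASCII grid", "GeoTIFF", "NetCDF", "PNG"));

    /**
     * Hypothetical selection rule: prefer a pure Java reader when one exists,
     * otherwise fall back on GDAL if the native library is available.
     * Assumption: GDAL is treated as supporting any other format.
     */
    public static Optional<String> selectBackend(String format, boolean gdalAvailable) {
        if (PURE_JAVA_FORMATS.contains(format)) {
            return Optional.of("pure-java");
        }
        if (gdalAvailable) {
            return Optional.of("gdal");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A pure Java format never needs the native library.
        System.out.println(selectBackend("Shapefile", false).orElse("unsupported"));
        // Any other format depends on GDAL being installed.
        System.out.println(selectBackend("MrSID", true).orElse("unsupported"));
        System.out.println(selectBackend("MrSID", false).orElse("unsupported"));
    }
}
```

Keeping the native dependency behind an availability check like this means the pure Java formats keep working on systems where GDAL is not installed.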
