Hi Chris,

I will check that out.

Thanks!
Travis

On Sun, Aug 25, 2013 at 11:13 PM, Mattmann, Chris A (398J) <[email protected]> wrote:

> Guys, I did GDAL bindings for Tika in TIKA-605 by building the Java
> JAR bindings -- I think it's a good route (but the problem is that the
> JAR isn't in Maven Central).
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Travis L Pinney <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Sunday, August 25, 2013 6:47 PM
> To: dev <[email protected]>
> Subject: Re: Proposal for merging Shapefile branch to trunk
>
> >Hi Adam and Martin,
> >
> >Would it be OK to leave it as is, because there are currently only a small
> >number of data storage modules? I think of storage as something that holds
> >common formats that run across all the different storage formats, like a
> >Feature. Eventually it will get to the point where you will not want to
> >have a multitude of JAR files. I see sis-shapefile as a fairly distinct
> >file driver because of the complex format of a shapefile (not necessarily
> >good complexity).
> >
> >Adding GDAL bindings for common formats would be very useful.
> >This would make it easier to do large bulk processing of geospatial data
> >with Hadoop, like the presentation in the following video:
> >
> >https://www.youtube.com/watch?v=_JCPf89s-NI
> >
> >Thanks,
> >Travis
> >
> >On Sun, Aug 25, 2013 at 8:06 PM, Adam Estrada <[email protected]> wrote:
> >
> >> Hey Martin,
> >>
> >> Regarding where to put all the file format modules, I am just concerned
> >> that it might be difficult to keep things straight if there is a
> >> mixture of "complex" formats and everything else. I think we all trust
> >> your opinion on where to put things, but we really just need to keep the
> >> end user and other potential committers in mind when moving forward in
> >> the development process. For example, I take a look at the directory
> >> structure in SVN [1] and I automatically think that each format should
> >> be in its own module, like sis-netcdf, because of the way it's organized.
> >>
> >> Just my 2 cents at this point, and feedback from other folks is
> >> certainly welcome :)
> >>
> >> Adam
> >>
> >> [1] https://svn.apache.org/repos/asf/sis/trunk/storage/
> >>
> >> On Sun, Aug 25, 2013 at 4:39 PM, Martin Desruisseaux <[email protected]> wrote:
> >>
> >> > Hello Adam
> >> >
> >> > On 25/08/13 21:34, Adam Estrada wrote:
> >> >
> >> >> It is true that the Shapefile is very widely used but it has lots
> >> >> and lots of limitations. The main one that I can think of is that it
> >> >> can't handle UTF-encoded characters in the attribute table. Can I
> >> >> suggest maybe working towards something like an "interchange" module
> >> >> where all the file formats live?
> >> >
> >> > I agree with all the above, and in the current SIS state the
> >> > "interchange" module is actually the "storage" group of modules. This
> >> > group of modules currently contains:
> >> >
> >> >   * sis-storage: provides the basis common to all formats.
> >> >   * sis-netcdf: for the NetCDF format.
> >> >
> >> > My concern is about whether we should put the Shapefile code in its
> >> > own "sis-shapefile" module (which would depend on "sis-storage"), or
> >> > put it straight in "sis-storage".
> >> >
> >> > One extreme view is to adopt a "one format == one module" policy. But
> >> > in Geotoolkit.org, this policy resulted in more than 120 modules, some
> >> > of them with very few classes. In security-constrained environments,
> >> > where every JAR file requires its own SecurityManager policies, this
> >> > is very tedious.
> >> >
> >> > Consequently, I would like to group some formats in the same JAR
> >> > files in order to keep the number of modules reasonable. The question
> >> > is then which granularity to choose. My proposal is not to put every
> >> > format in its own module, but to put a format in its own module if it
> >> > meets some of the following conditions:
> >> >
> >> >   * The format is not widely used.
> >> >   * The format is complex, so it requires a large number of classes
> >> >     or resources.
> >> >   * The format depends on an external library or on native code.
> >> >
> >> > The NetCDF format is proposed in its own module because it is complex
> >> > (the classes currently in "sis-netcdf" are just scratching the
> >> > surface) and may have a dependency on a large library (although I
> >> > would like to keep that dependency optional). Shapefile, on the
> >> > contrary, is relatively simple and needs no external dependency.
> >> >
> >> > Given that "sis-storage" would be the basis of all formats in SIS, my
> >> > proposal is to also put in "sis-storage" some formats considered
> >> > "fundamental", meaning formats so widespread that any user is very
> >> > likely to encounter them. They would not be the only or "main" SIS
> >> > formats - they would rather be the "minimal requirements".
> >> > Other modules like "sis-netcdf" would provide more elaborate formats.
> >> >
> >> >> For vector data, there are quite a few of them out there. OGR
> >> >> references many of them [1] but that opens the debate on whether or
> >> >> not to just use GDAL. I suppose we could just have GDAL support as a
> >> >> module which would require some sort of JNI bindings to work in a
> >> >> pure Java library like SIS. What are your thoughts on this?
> >> >
> >> > Yes, this is also the plan :-). We already used GDAL through JNI on
> >> > our side, and that code is also part of the proposed migration to SIS.
> >> > The approach that I would recommend is to use pure Java code for many
> >> > formats (Shapefile, ASCII grid, GeoTIFF, NetCDF, PNG), and fall back
> >> > on GDAL as a complement for other formats.
> >> >
> >> > A similar argument applies to Coordinate Transformation Services. We
> >> > have pure Java code (its port to SIS started last week, beginning
> >> > with WKT), but we plan to support Proj.4 through JNI even for map
> >> > projections available in pure Java, because in some situations a user
> >> > may need the guarantee of getting exactly the same results as PostGIS
> >> > or MapServer, for instance (those products are built on top of
> >> > Proj.4).
> >> >
> >> > Martin
