Hi, Chris,
I agree that we should probably come up with a more comprehensive solution
for this with respect to the metadata object and the resulting XHTML.  That
would make the geospatial data feel more like a first-class citizen in the
metadata hierarchy.

We will probably need to support more coordinate systems than just WGS 84, as
there are a number of systems that have no transformation to WGS 84.
The encoding of the WKT is also pretty important.  Would you rather break it
down into its component parts (probably datum and projection, for starters),
or leave it whole?  Obviously, the more metadata we have, the more powerful
Tika becomes, but past a certain point additional data stops being useful.
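
To make the trade-off concrete, here is a minimal sketch that keeps the whole
WKT and also pulls out the datum and projection names into a key-to-multi-value
map shaped like Tika's metadata object. The "geo:crs:*" key names and the class
are hypothetical, and the regexes only understand WKT1-style DATUM[...] and
PROJECTION[...] nodes; a real implementation would more likely ask GDAL/OSR for
these values than parse the text itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: store a CRS both as the full WKT string and as a few
 * component parts, in a key -> multi-value map similar to Tika's Metadata.
 * The "geo:crs:*" key names are illustrative assumptions, not Tika keys.
 */
public class WktMetadataSketch {

    // Captures the quoted name in a WKT1 node such as DATUM["WGS_1984", ...]
    private static final Pattern DATUM = Pattern.compile("DATUM\\[\"([^\"]*)\"");
    // Captures the quoted name in a WKT1 node such as PROJECTION["Transverse_Mercator"]
    private static final Pattern PROJECTION = Pattern.compile("PROJECTION\\[\"([^\"]*)\"");

    static void add(Map<String, List<String>> met, String key, String value) {
        met.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    static Map<String, List<String>> describe(String wkt) {
        Map<String, List<String>> met = new LinkedHashMap<>();
        add(met, "geo:crs:wkt", wkt); // always keep the whole WKT
        Matcher d = DATUM.matcher(wkt);
        if (d.find()) {
            add(met, "geo:crs:datum", d.group(1));
        }
        Matcher p = PROJECTION.matcher(wkt);
        if (p.find()) {
            add(met, "geo:crs:projection", p.group(1));
        }
        // Geographic CRSs have no PROJECTION node, so they get no projection key.
        return met;
    }

    public static void main(String[] args) {
        String utm = "PROJCS[\"WGS 84 / UTM zone 11N\","
                + "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563]],"
                + "PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433]],"
                + "PROJECTION[\"Transverse_Mercator\"],"
                + "PARAMETER[\"central_meridian\",-117],UNIT[\"metre\",1]]";
        Map<String, List<String>> met = describe(utm);
        System.out.println(met.get("geo:crs:datum"));      // [WGS_1984]
        System.out.println(met.get("geo:crs:projection")); // [Transverse_Mercator]
    }
}
```

Keeping the full WKT alongside a couple of derived keys means nothing is lost
for systems that have no WGS 84 transformation, while the common fields stay
easy to query.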

On another note, I took a look at the code for your TIKA-605 patch, and I have
a suggestion.  Reading the notes on the check-ins for the patch, I noticed that
no one had suggested using an in-memory Dataset as the default dataset type.
There is no reason why the stream passed to the Tika parser could not be used
to fill a buffer with the file data, which could then be used to create the
dataset.
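
A rough sketch of that idea, assuming GDAL's /vsimem/ in-memory file system is
what we would use (rather than the MEM raster driver, which builds a dataset
from pixel arrays instead of from file bytes): drain the parser's InputStream
into a byte[] and give it a unique /vsimem/ name. The class and method names
are hypothetical, and the GDAL calls are left as comments because they assume
the GDAL Java bindings (gdal.FileFromMemBuffer, gdal.Open, gdal.Unlink), which
should be checked against the version we actually build with.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/**
 * Hypothetical sketch: buffer the stream handed to a Tika parser so it can back
 * an in-memory GDAL dataset instead of a temporary file on disk.
 */
public class StreamBufferSketch {

    /** Reads the whole stream into memory; the caller still owns closing it. */
    static byte[] toBuffer(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Builds a unique file name under GDAL's /vsimem/ in-memory file system. */
    static String vsimemPath(String extension) {
        return "/vsimem/tika-" + UUID.randomUUID() + extension;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the InputStream Tika passes to Parser.parse(...)
        InputStream stream = new ByteArrayInputStream(
                "not really a GeoTIFF".getBytes(StandardCharsets.US_ASCII));
        byte[] data = toBuffer(stream);
        String path = vsimemPath(".tif");
        System.out.println(data.length + " bytes -> " + path);

        // Assumption, not executed here: with the GDAL Java bindings on the
        // classpath, the buffer could be registered and opened roughly like:
        //   org.gdal.gdal.gdal.AllRegister();
        //   org.gdal.gdal.gdal.FileFromMemBuffer(path, data);
        //   org.gdal.gdal.Dataset ds = org.gdal.gdal.gdal.Open(path);
        //   ... read the geospatial metadata ...
        //   ds.delete();
        //   org.gdal.gdal.gdal.Unlink(path);
    }
}
```

The trade-off is memory: a large raster is held entirely in RAM, so a size
threshold with a fallback to a temporary file may still be worth keeping.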

At the moment, I'm trying to get GDAL to cooperate with me on my Mac.  Being
new to the Mac seems to be a drawback when trying to be productive; it just
takes a little more of a fight to get the bits to do what I want.

In any case, once I get GDAL whipped into shape, I'll see if I can get the
parser to recognize geospatial data in a test file, and then we will be off and
running.

Thanks

Joe 
On Feb 26, 2012, at 1:10 PM, Mattmann, Chris A (388J) wrote:

> Hi Joe,
> 
> Awesome! Thanks for picking this up and getting interested in this work. 
> So far, our only use case has been representing lats and lons (WGS84). It
> would be great to extract more information and come up with a policy for
> representing more WKTs and so forth. We should probably start by coming up
> with a scheme for encoding the extracted information in the Tika metadata
> object and in its output XHTML. Do you have any ideas about how to do that?
> In the existing patch on TIKA-605, I simply intended to use the met object
> and its key-multi-value structure to represent the extracted information,
> but to take advantage of streaming and of content handlers, we ought to
> encode this information in the output XHTML.
> 
> Thoughts?
> 
> Cheers,
> Chris
> 
> On Feb 26, 2012, at 9:39 AM, Joe White wrote:
> 
>> Hi,
>> I'm looking into implementing a bridge/link between Tika and GDAL so that
>> geospatial information can be extracted from georeferenced images and
>> vector data.  One thing I have noticed while going through the code is that
>> it only defines geographic coordinate types, using latitudes and
>> longitudes.  Is this by design?  If GDAL is wrapped into Tika and a
>> projected image is imported, are the geospatial extents meant to be held in
>> the metadata as geographic points, possibly in WGS 84?
>> 
>> Thanks
>> 
>> Joe White
> 
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
