Hey Steve,

> Chris, do you have a good reference for ODL files?

Chapter 12 (on ODL) of the NASA Planetary Data System (PDS) Standards
Reference is the best one I know of:

http://pds.nasa.gov/tools/standards-reference.shtml

> It sounds like the MinODL
> parser will allow you to traverse from Group to Group Data Fields to
> dimensions and the variables in an HDF-EOS file

+1, yep.

> 
> and to dimensions and variables in netCDF land, true?

+1, yep.

That's the goal!
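
As a rough illustration of that kind of traversal, here is a minimal Python
sketch. It is not the MinODL parser (which is only planned in this thread) and
not Tika code: the function names parse_odl and walk, the hand-written sample
text, and the choice to keep every value as a string are all assumptions made
for the example. It handles only a small ODL subset (GROUP/OBJECT nesting and
simple keyword = value lines) and reports each leaf by a /Group/SubGroup/Property
style path, like the one mentioned further down the thread.

```python
# Hand-written sample in the style of HDF-EOS structural metadata
# (illustrative only; not taken from a real file).
SAMPLE = """
GROUP=SwathStructure
  GROUP=SWATH_1
    SwathName="Swath1"
    GROUP=Dimension
      OBJECT=Dimension_1
        DimensionName="GeoTrack"
        Size=45
      END_OBJECT=Dimension_1
    END_GROUP=Dimension
  END_GROUP=SWATH_1
END_GROUP=SwathStructure
END
"""


def parse_odl(text):
    """Parse a small ODL subset into nested dicts (hypothetical helper)."""
    root = {}
    stack = [root]  # stack[-1] is the innermost open GROUP/OBJECT
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "END":  # ODL end-of-label marker
            break
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip anything this sketch does not understand
        key, value = key.strip(), value.strip()
        if key in ("GROUP", "OBJECT"):
            child = {}
            stack[-1][value] = child
            stack.append(child)
        elif key in ("END_GROUP", "END_OBJECT"):
            if len(stack) == 1:
                raise ValueError("unbalanced " + key)
            stack.pop()
        else:
            # Plain keyword: keep the value as a string, minus any quotes.
            stack[-1][key] = value.strip('"')
    return root


def walk(node, path=""):
    """Yield (path, value) for every leaf keyword, depth first."""
    for name, value in node.items():
        child_path = path + "/" + name
        if isinstance(value, dict):
            yield from walk(value, child_path)
        else:
            yield child_path, value


tree = parse_odl(SAMPLE)
paths = dict(walk(tree))
for path, value in paths.items():
    print(path, "=", value)
```

A real parser would also need the rest of the ODL grammar (comments, units,
sequences and sets, multi-line values), which this sketch deliberately ignores.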

Cheers,
Chris

> 
> On Fri, May 27, 2011 at 12:31 PM, Mattmann, Chris A (388J) <
> [email protected]> wrote:
> 
>> Hey Steve!
>> 
>> Nice to see you show up on the list :-) Yep, I totally agree, I have a
>> couple of useful additions I'm going to create issues for and contribute
>> back to Tika:
>> 
>> 1. MinODL parser for ODL files themselves and also used in 2 below;
>> 2. ParseContext properties identifying:
>>  - groups that are in fact ODL values, that need to be parsed with the
>> MinODL parser (useful for NetCDF and for HDF)
>>  - what groups to select out (e.g., in HDF, by Path
>> /Group1/SubGroup1/Property, and in NetCDF just by name)
>> 
>> I think the combination of those will help the HDF and NetCDF parsers to
>> become more robust, and configurable. Also, GDAL is high on my priority
>> list. I've already built the Java bindings, but am working through some
>> trickery with GDAL since it doesn't like the fact that Tika isn't file
>> based, and when we use TikaInputStream, it creates a file of arbitrary
>> extension (which ticks off GDAL as it's looking for something specific). I
>> have a work-around though in the works...
>> 
>> Cheers,
>> Chris
>> 
>> 
>> On May 26, 2011, at 4:20 AM, Steve Aulenbach wrote:
>> 
>>> Hi Chris,
>>> 
>>> I think your plan to improve the netCDF and HDF parsing is a great one. The
>>> richness of a full ncdump of netCDF metadata and a full ncdump of HDF-EOS
>>> metadata would be an excellent addition to the 1.0 release of Tika. I have
>>> discussed Tika with several science data users, and they usually ask about
>>> netCDF and HDF-EOS metadata capabilities. A GDAL parser is also a great
>>> idea.
>>> 
>>> Thanks,
>>> Steve
>>> 
>>> On Fri, May 20, 2011 at 12:22 PM, Mattmann, Chris A (388J) <
>>> [email protected]> wrote:
>>> 
>>>> Hey Jukka et al.,
>>>> 
>>>>> It's a few months since 0.9 and our Tika in Action book is soon ready
>>>>> for print, so I think it's a good time to start planning for the 1.0
>>>>> release.
>>>> 
>>>> Looking forward to not writing anything for a while :-) I doubt it'll
>>>> happen knowing how things go, but also really really happy with where the
>>>> book is (and banging on those last revisions! :-) ).
>>>> 
>>>>> 
>>>>> There are a few odds and ends that I'd still like to sort out in the
>>>>> trunk, but overall I think we're pretty much ready for the switch
>>>>> from 0.x to 1.x.
>>>> 
>>>> +1.
>>>> 
>>>>> 
>>>>> One major issue to be decided is whether we want to follow up with the
>>>>> earlier intention of dropping deprecated functionality (like the
>>>>> three-argument parse() method) before the 1.0 release.
>>>> 
>>>> +1, I'd be fine with this. I'm a fan of following through on things that we
>>>> say we're going to do if for no other good reason than we said we're going
>>>> to do it.
>>>> 
>>>> +1 to dropping the 3 arg parse method.
>>>> 
>>>>> I think we
>>>>> should do that and also make some other backwards-incompatible
>>>>> cleanups while we're at it. That way we'll have less old baggage to
>>>>> carry as we evolve through the 1.x release cycle.
>>>> 
>>>> +1, my biggest thing to work on is improving the NetCDF and HDF parsing,
>>>> adding an ODL parser (I'll create an issue for this), adding some spatial
>>>> parsers (working on the GDAL one right now), and maybe some documentation on
>>>> how to use the science data file formats. I should have time over the next
>>>> month or so to complete these.
>>>> 
>>>>> 
>>>>> Another thing to think about is whether we want to do a formal Apache
>>>>> press release about Tika reaching 1.0 status.
>>>> 
>>>> +1. I'd be happy to work with Jukka, as Nick suggested, to draft this, and
>>>> then from there to work with Sally to make it happen.
>>>> 
>>>> Thanks!
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Senior Computer Scientist
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 171-266B, Mailstop: 171-246
>>>> Email: [email protected]
>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Assistant Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 
>>>> 


