Hey Steve,

> Chris, do you have a good reference for ODL files?

The NASA Planetary Data System (PDS) Standards Reference, in particular
Chapter 12 on ODL, is the best one I know:

http://pds.nasa.gov/tools/standards-reference.shtml

> It sounds like MinODL
> parser will allow you to traverse from Group to Group Data Fields to
> dimensions and the variables in an HDF-EOS file

+1, yep.

>
> and to dimensions and variables in netCDF land, true?

+1, yep. That's the goal!

Cheers,
Chris

>
> On Fri, May 27, 2011 at 12:31 PM, Mattmann, Chris A (388J) <
> [email protected]> wrote:
>
>> Hey Steve!
>>
>> Nice to see you show up on the list :-) Yep, I totally agree. I have a
>> couple of useful additions I'm going to create issues for and contribute
>> back to Tika:
>>
>> 1. MinODL parser for ODL files themselves and also used in 2 below;
>> 2. ParseContext properties identifying:
>> - groups that are in fact ODL values, that need to be parsed with the
>> MinODL parser (useful for NetCDF and for HDF)
>> - what groups to select out (e.g., in HDF, by Path
>> /Group1/SubGroup1/Property, and in NetCDF just by name)
>>
>> I think the combination of those will help the HDF and NetCDF parsers to
>> become more robust and configurable. Also, GDAL is high on my priority
>> list. I've already built the Java bindings, but am working through some
>> trickery with GDAL since it doesn't like the fact that Tika isn't file
>> based, and when we use TikaInputStream, it creates a file of arbitrary
>> extension (which ticks off GDAL as it's looking for something specific). I
>> have a work-around in the works, though...
>>
>> Cheers,
>> Chris
>>
>>
>> On May 26, 2011, at 4:20 AM, Steve Aulenbach wrote:
>>
>>> Hi Chris,
>>>
>>> I think your plan to improve the netCDF and HDF parsing is a great one. The
>>> richness of a full ncdump of netCDF metadata and a full ncdump HDF-EOS
>>> metadata would be an excellent addition to the 1.0 release of Tika.
>>> I have discussed Tika with several science data users and they usually ask
>>> about netCDF and HDF-EOS metadata capabilities. A GDAL parser is also a
>>> great idea.
>>>
>>> Thanks,
>>> Steve
>>>
>>> On Fri, May 20, 2011 at 12:22 PM, Mattmann, Chris A (388J) <
>>> [email protected]> wrote:
>>>
>>>> Hey Jukka et al.,
>>>>
>>>>> It's a few months since 0.9 and our Tika in Action book is soon ready
>>>>> for print, so I think it's a good time to start planning for the 1.0
>>>>> release.
>>>>
>>>> Looking forward to not writing anything for a while :-) I doubt it'll
>>>> happen knowing how things go, but I'm also really, really happy with
>>>> where the book is (and banging on those last revisions! :-) ).
>>>>
>>>>>
>>>>> There are a few odds and ends that I'd still like to sort out in the
>>>>> trunk, but overall I think we're pretty much ready for the switch
>>>>> from 0.x to 1.x.
>>>>
>>>> +1.
>>>>
>>>>>
>>>>> One major issue to be decided is whether we want to follow up with the
>>>>> earlier intention of dropping deprecated functionality (like the
>>>>> three-argument parse() method) before the 1.0 release.
>>>>
>>>> +1, I'd be fine with this. I'm a fan of following through on things that
>>>> we say we're going to do, if for no other good reason than that we said
>>>> we're going to do it.
>>>>
>>>> +1 to dropping the 3 arg parse method.
>>>>
>>>>> I think we
>>>>> should do that and also make some other backwards-incompatible
>>>>> cleanups while we're at it. That way we'll have less old baggage to
>>>>> carry as we evolve through the 1.x release cycle.
>>>>
>>>> +1, my biggest thing to work on is improving the NetCDF and HDF parsing,
>>>> adding an ODL parser (I'll create an issue for this), adding some
>>>> spatial parsers (working on the GDAL one right now), and maybe some
>>>> documentation on how to use the science data file formats. I should
>>>> have time over the next month or so to complete these.
>>>>
>>>>>
>>>>> Another thing to think about is whether we want to do a formal Apache
>>>>> press release about Tika reaching 1.0 status.
>>>>
>>>> +1. I'd be happy to work with Jukka, as Nick suggested, to draft this,
>>>> and then from there to work with Sally to make it happen.
>>>>
>>>> Thanks!
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Senior Computer Scientist
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 171-266B, Mailstop: 171-246
>>>> Email: [email protected]
>>>> WWW: http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Assistant Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
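
For anyone following along who hasn't worked with ODL, here is a minimal Python sketch of the Group-to-Group traversal and path-based selection (/Group1/SubGroup1/Property) discussed in this thread. It is not the MinODL parser. It assumes a simplified subset of ODL: one KEY = VALUE statement per line, GROUP/OBJECT blocks closed by END_GROUP/END_OBJECT, a final END, and /* */ comments. It ignores multi-line values, units, and sequences. The names parse_odl, select, and the sample label are hypothetical and exist only for illustration.

```python
import re

# Matches a single-line "KEY = VALUE" statement. This is a simplified subset of
# ODL (an assumption for this sketch): no multi-line values, units, or sequences.
_ASSIGN = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$")


def parse_odl(text):
    """Parse simplified ODL text into nested dicts (hypothetical helper)."""
    root = {}
    stack = [root]  # stack[-1] is the group/object currently being filled
    for raw in text.splitlines():
        line = raw.split("/*", 1)[0].strip()  # drop /* ... */ comments
        if not line:
            continue
        upper = line.upper()
        if upper == "END":
            break  # end of the label
        if upper.startswith(("END_GROUP", "END_OBJECT")):
            if len(stack) > 1:
                stack.pop()  # close the innermost group (names not checked)
            continue
        match = _ASSIGN.match(line)
        if match is None:
            continue  # skip anything outside the simplified subset
        key, value = match.group(1), match.group(2)
        if key.upper() in ("GROUP", "OBJECT"):
            child = {}
            stack[-1][value] = child  # the value is the group's name
            stack.append(child)
        else:
            stack[-1][key] = value.strip('"')  # unquote string values
    return root


def select(tree, path):
    """Look up a value by a path such as /Group1/SubGroup1/Property."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]  # raises KeyError if a group or key is missing
    return node


sample = """
GROUP = Group1
  GROUP = SubGroup1
    Property = "42"   /* a quoted string value */
  END_GROUP = SubGroup1
END_GROUP = Group1
END
"""

tree = parse_odl(sample)
print(select(tree, "/Group1/SubGroup1/Property"))  # prints 42
```

Nested dicts keep the sketch short; a real parser inside Tika would more likely map each selected path onto metadata keys, and would need the full grammar from the PDS Standards Reference.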
