Hi Chris,

Do you have a good reference for ODL files? It sounds like the MinODL
parser will let you traverse from a Group, to its "Data Fields" group, and
on to the dimensions and variables in an HDF-EOS file:
netcdf /Users/saulenbach/dev/tikaModisTest/MOD15A2.A2010209.h09v04.005.2010219082157.hdf {
 variables:
   char StructMetadata.0(32000);
   char CoreMetadata.0(16125);
   char ArchiveMetadata.0(5336);
   char ENGINEERING_DATA(8337);

 Group MOD_Grid_MOD15A2 {
   Group Data Fields {
     dimensions:
       XDim = 1200;
       YDim = 1200;
     variables:
       double Fpar_1km(YDim=1200, XDim=1200);
         :scale_factor_err = 0.0; // double
         :add_offset_err = 0.0; // double
         :calibrated_nt = 21; // int
         :long_name = "MOD15A2 MODIS/Terra Gridded 1KM FPAR (8-day composite)";
         :units = "Percent";
         :MOD15A2_FILLVALUE_DOC = "MOD15A2 FILL VALUE LEGEND\n255 =
           _Fillvalue, assigned when:\n * the MODAGAGG suf. reflectance for
           channel VIS, NIR was assigned its _Fillvalue, or\n * land cover
           pixel itself was assigned _Fillvalus 255 or 254.\n254 = land cover
           assigned as perennial salt or inland fresh water.\n253 = land cover
           assigned as barren, sparse vegetation (rock, tundra, desert.)\n252 =
           land cover assigned as perennial snow, ice.\n251 = land cover assigned
           as \"permanent\" wetlands/inundated marshlands.\n250 = land cover
           assigned as urban/built-up.\n249 = land cover assigned as
           \"unclassified\" or not able to determine.\n";
and to the dimensions and variables in netCDF land too, true? (I've put a
rough sketch of the traversal I mean below, after this second dump.)
netcdf file:/Users/saulenbach/src/tika/tika-site/tika-parsers/target/test-classes/test-documents/sresa1b_ncar_ccsm3_0_run1_200001.nc {
 dimensions:
   lat = 128;
   lon = 256;
   bnds = 2;
   plev = 17;
   time = UNLIMITED;   // (1 currently)
 variables:
   float area(lat=128, lon=256);
     :long_name = "Surface area";
     :units = "meter2";
   double lat_bnds(lat=128, bnds=2);
   double lon_bnds(lon=256, bnds=2);
   int msk_rgn(lat=128, lon=256);
     :long_name = "Mask region";
     :units = "bool";
   float pr(time=1, lat=128, lon=256);
     :comment = "Created using NCL code CCSM_atmm_2cf.ncl on\n machine eagle163s";
     :missing_value = 1.0E20f; // float
     :_FillValue = 1.0E20f; // float
     :cell_methods = "time: mean (interval: 1 month)";
     :history = "(PRECC+PRECL)*r[h2o]";
     :original_units = "m-1 s-1";
     :original_name = "PRECC, PRECL";
     :standard_name = "precipitation_flux";
     :units = "kg m-2 s-1";
     :long_name = "precipitation_flux";
     :cell_method = "time: mean";
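
For reference, here's roughly the kind of traversal I have in mind, using
the ucar.nc2 NetCDF-Java API that the NetCDF parser already depends on.
Treat it as a sketch from memory; the method names may need adjusting
against whatever version of the library Tika pulls in:

  import java.io.IOException;
  import ucar.nc2.Attribute;
  import ucar.nc2.Dimension;
  import ucar.nc2.Group;
  import ucar.nc2.NetcdfFile;
  import ucar.nc2.Variable;

  public class GroupWalker {

      // Recursively walk a Group: print its variables (with dimensions and
      // attributes), then descend into sub-groups, e.g.
      // MOD_Grid_MOD15A2 -> Data Fields -> Fpar_1km.
      static void walk(Group group, String indent) {
          for (Variable var : group.getVariables()) {
              StringBuilder dims = new StringBuilder();
              for (Dimension dim : var.getDimensions()) {
                  dims.append(dim.getName()).append("=").append(dim.getLength()).append(" ");
              }
              System.out.println(indent + var.getName() + " (" + dims.toString().trim() + ")");
              for (Attribute att : var.getAttributes()) {
                  System.out.println(indent + "  " + att);
              }
          }
          for (Group child : group.getGroups()) {
              System.out.println(indent + "Group " + child.getName() + ":");
              walk(child, indent + "  ");
          }
      }

      public static void main(String[] args) throws IOException {
          // NetCDF-Java reads both plain .nc files and HDF-EOS .hdf files
          // through the same NetcdfFile API, so one walk covers both dumps.
          NetcdfFile ncFile = NetcdfFile.open(args[0]);
          try {
              walk(ncFile.getRootGroup(), "");
          } finally {
              ncFile.close();
          }
      }
  }

Running that over the two files above should print essentially the same
Group/variable nesting as the dumps.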
Thanks,
Steve
On Fri, May 27, 2011 at 12:31 PM, Mattmann, Chris A (388J) <
[email protected]> wrote:
> Hey Steve!
>
> Nice to see you show up on the list :-) Yep, I totally agree, I have a
> couple of useful additions I'm going to create issues for and contribute
> back to Tika:
>
> 1. A MinODL parser for the ODL files themselves, also used in 2 below;
> 2. ParseContext properties identifying:
> - groups that are in fact ODL values, that need to be parsed with the
> MinODL parser (useful for NetCDF and for HDF)
> - what groups to select out (e.g., in HDF, by Path
> /Group1/SubGroup1/Property, and in NetCDF just by name)
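>
> Roughly, I'm picturing something like this on the configuration side
> (the class and property names below are just placeholders until I file
> the issues, so nothing here is final):
>
>   import org.apache.tika.parser.ParseContext;
>
>   ParseContext context = new ParseContext();
>
>   // Placeholder class: names the groups whose values are really ODL text
>   // that the MinODL parser should expand (useful for NetCDF and for HDF).
>   ODLGroupConfig odlGroups = new ODLGroupConfig();
>   odlGroups.addGroup("CoreMetadata.0");
>   odlGroups.addGroup("StructMetadata.0");
>   context.set(ODLGroupConfig.class, odlGroups);
>
>   // Placeholder class: picks which groups to select out, by path for HDF
>   // and by plain name for NetCDF.
>   GroupSelector selector = new GroupSelector();
>   selector.selectHdfPath("/Group1/SubGroup1/Property");
>   selector.selectNetcdfName("SomeGroup");
>   context.set(GroupSelector.class, selector);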
>
> I think the combination of those will help the HDF and NetCDF parsers
> become more robust and configurable. Also, GDAL is high on my priority
> list. I've already built the Java bindings, but am working through some
> trickery with GDAL since it doesn't like the fact that Tika isn't file
> based, and when we use TikaInputStream, it creates a file with an
> arbitrary extension (which ticks off GDAL, as it's looking for something
> specific). I have a work-around in the works, though...
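>
> Very roughly (and just a sketch, not the final code), the shape of it is
> to spool the stream to disk and hand GDAL a copy whose name carries an
> extension its driver sniffing will accept; the ".hdf" below is only an
> example, the real thing would pick the extension from the detected media
> type:
>
>   import java.io.File;
>   import java.io.InputStream;
>   import org.apache.commons.io.FileUtils;
>   import org.apache.tika.io.TikaInputStream;
>   import org.gdal.gdal.Dataset;
>   import org.gdal.gdal.gdal;
>
>   public class GdalShim {
>       public static Dataset open(InputStream stream) throws Exception {
>           // Spool the stream to a real file; the temp file's extension is
>           // arbitrary, which is exactly what GDAL objects to.
>           TikaInputStream tis = TikaInputStream.get(stream);
>           File spooled = tis.getFile();
>           // Copy it to a name GDAL will recognize, then open that instead.
>           File renamed = new File(spooled.getParentFile(), spooled.getName() + ".hdf");
>           FileUtils.copyFile(spooled, renamed);
>           gdal.AllRegister();
>           return gdal.Open(renamed.getAbsolutePath());
>       }
>   }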
>
> Cheers,
> Chris
>
>
> On May 26, 2011, at 4:20 AM, Steve Aulenbach wrote:
>
> > Hi Chris,
> >
> > I think your plan to improve the netCDF and HDF parsing is a great one.
> > The richness of a full ncdump of netCDF metadata and a full ncdump of
> > HDF-EOS metadata would be an excellent addition to the 1.0 release of
> > Tika. I have discussed Tika with several science data users, and they
> > usually ask about netCDF and HDF-EOS metadata capabilities. A GDAL
> > parser is also a great idea.
> >
> > Thanks,
> > Steve
> >
> > On Fri, May 20, 2011 at 12:22 PM, Mattmann, Chris A (388J) <
> > [email protected]> wrote:
> >
> >> Hey Jukka et al.,
> >>
> >>> It's been a few months since 0.9 and our Tika in Action book is almost
> >>> ready for print, so I think it's a good time to start planning for the
> >>> 1.0 release.
> >>
> >> Looking forward to not writing anything for a while :-) I doubt that'll
> >> happen, knowing how things go, but I'm also really happy with where the
> >> book is (and banging on those last revisions! :-) ).
> >>
> >>>
> >>> There are a few odds and ends that I'd still like to sort out in the
> >>> trunk, but overall I think we're pretty much ready for the switch
> >>> from 0.x to 1.x.
> >>
> >> +1.
> >>
> >>>
> >>> One major issue to be decided is whether we want to follow up with the
> >>> earlier intention of dropping deprecated functionality (like the
> >>> three-argument parse() method) before the 1.0 release.
> >>
> >> +1, I'd be fine with this. I'm a fan of following through on things
> >> that we say we're going to do, if for no other reason than that we
> >> said we were going to do it.
> >>
> >> +1 to dropping the 3 arg parse method.
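> >>
> >> (For reference, and from memory, the two forms on the Parser interface
> >> are:
> >>
> >>   // deprecated three-argument form, slated to go away:
> >>   void parse(InputStream stream, ContentHandler handler, Metadata metadata)
> >>       throws IOException, SAXException, TikaException;
> >>
> >>   // the form that stays, with an explicit ParseContext:
> >>   void parse(InputStream stream, ContentHandler handler, Metadata metadata,
> >>       ParseContext context) throws IOException, SAXException, TikaException;
> >>
> >> so callers just need to start passing a ParseContext explicitly.)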
> >>
> >>> I think we
> >>> should do that and also make some other backwards-incompatible
> >>> cleanups while we're at it. That way we'll have less old baggage to
> >>> carry as we evolve through the 1.x release cycle.
> >>
> >> +1, my biggest thing to work on is improving the NetCDF and HDF
> >> parsing, adding an ODL parser (I'll create an issue for this), adding
> >> some spatial parsers (working on the GDAL one right now), and maybe
> >> some documentation on how to use the science data file formats. I
> >> should have time over the next month or so to complete these.
> >>
> >>>
> >>> Another thing to think about is whether we want to do a formal Apache
> >>> press release about Tika reaching 1.0 status.
> >>
> >> +1. I'd be happy to work with Jukka, as Nick suggested, to draft this,
> >> and then from there to work with Sally to make it happen.
> >>
> >> Thanks!
> >>
> >> Cheers,
> >> Chris
> >>
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Chris Mattmann, Ph.D.
> >> Senior Computer Scientist
> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> Office: 171-266B, Mailstop: 171-246
> >> Email: [email protected]
> >> WWW: http://sunset.usc.edu/~mattmann/
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Adjunct Assistant Professor, Computer Science Department
> >> University of Southern California, Los Angeles, CA 90089 USA
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>
> >>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>