[ https://issues.apache.org/jira/browse/TIKA-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991234#comment-13991234 ]
Chris A. Mattmann commented on TIKA-1265:
-----------------------------------------
Tested on the example NetCDF file; the acceptance test passed:
{noformat}
[chipotle:tika/tika-app/target] mattmann% java -jar tika-app-1.6-SNAPSHOT.jar /Users/mattmann/tmp/tika/tika-parsers/src/test/resources/test-documents/sresa1b_ncar_ccsm3_0_run1_200001.nc
<?xml version="1.0" encoding="UTF-8"?><html
xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Conventions" content="CF-1.0"/>
<meta name="acknowledgment" content=" Any use of CCSM data should acknowledge
the contribution of the CCSM project and CCSM sponsor agencies with the
following citation: 'This research uses data provided by the
Community Climate System Model project (www.ccsm.ucar.edu), supported by
the Directorate for Geosciences of the National Science Foundation
and the Office of Biological and Environmental Research of the U.S.
Department of Energy.' In addition, the words 'Community Climate System
Model' and 'CCSM' should be included as metadata for webpages
referencing work using CCSM data or as keywords provided to journal or
book publishers of your manuscripts. Users of CCSM data accept the
responsibility of emailing citations of publications of research using
CCSM data to [email protected]. Any redistribution of CCSM data must
include this data acknowledgement statement."/>
<meta name="Content-Length" content="2767916"/>
<meta name="experiment_id" content="720 ppm stabilization experiment
(SRESA1B)"/>
<meta name="table_id" content="Table A1"/>
<meta name="cmd_ln" content="bds -x 256 -y 128 -m 23 -o
/data/zender/data/dst_T85.nc"/>
<meta name="contact" content="[email protected]"/>
<meta name="creation_date" content=""/>
<meta name="history" content="Tue Oct 25 15:08:51 2005: ncks -O -x -v va -m
sresa1b_ncar_ccsm3_0_run1_200001.nc sresa1b_ncar_ccsm3_0_run1_200001.nc Tue
Oct 25 15:07:21 2005: ncks -d time,0 sresa1b_ncar_ccsm3_0_run1_200001_201912.nc
sresa1b_ncar_ccsm3_0_run1_200001.nc Tue Oct 25 13:29:43 2005: ncks -d
time,0,239 sresa1b_ncar_ccsm3_0_run1_200001_209912.nc
/var/www/html/tmp/sresa1b_ncar_ccsm3_0_run1_200001_201912.nc Thu Oct 20
10:47:50 2005: ncks -A -v va
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/sresa1b_ncar_ccsm3_0_run1_va_200001_209912.nc
/data/brownmc/sresa1b/atm/mo/tas/ncar_ccsm3_0/run1/sresa1b_ncar_ccsm3_0_run1_200001_209912.nc Wed
Oct 19 14:55:04 2005: ncks -F -d time,01,1200
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/sresa1b_ncar_ccsm3_0_run1_va_200001_209912.nc
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/sresa1b_ncar_ccsm3_0_run1_va_200001_209912.nc Wed
Oct 19 14:53:28 2005: ncrcat
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/foo_05_1200.nc
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/foo_1192_1196.nc
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/sresa1b_ncar_ccsm3_0_run1_va_200001_209912.nc Wed
Oct 19 14:50:38 2005: ncks -F -d time,05,1200
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/va_A1.SRESA1B_1.CCSM.atmm.2000-01_cat_2099-12.nc
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/foo_05_1200.nc Wed Oct
19 14:49:45 2005: ncrcat
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/va_A1.SRESA1B_1.CCSM.atmm.2000-01_cat_2079-12.nc
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/va_A1.SRESA1B_1.CCSM.atmm.2080-01_cat_2099-12.nc
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/va_A1.SRESA1B_1.CCSM.atmm.2000-01_cat_2099-12.nc Created
from CCSM3 case b30.040a by [email protected] on Wed Nov 17 14:12:57
EST 2004 For all data, added IPCC requested metadata"/>
<meta name="references" content="Collins, W.D., et al., 2005: The
Community Climate System Model, Version 3 Journal of Climate
Main website: http://www.ccsm.ucar.edu"/>
<meta name="source" content="CCSM3.0, version beta19 (2004): atmosphere:
CAM3.0, T85L26; ocean : POP1.4.3 (modified), gx1v3 sea ice :
CSIM5.0, T85; land : CLM3.0, gx1v3"/>
<meta name="model_name_english" content="NCAR CCSM"/>
<meta name="project_id" content="IPCC Fourth Assessment"/>
<meta name="prg_ID" content="Source file unknown Version unknown Date unknown"/>
<meta name="realization" content="1"/>
<meta name="comment" content="This simulation was initiated from year 2000 of
CCSM3 model run b30.030a and executed on hardware
cheetah.ccs.ornl.gov. The input external forcings are ozone forcing :
A1B.ozone.128x64_L18_1991-2100_c040528.nc aerosol optics :
AerosolOptics_c040105.nc aerosol MMR :
AerosolMass_V_128x256_clim_c031022.nc carbon scaling :
carbonscaling_A1B_1990-2100_c040609.nc solar forcing : Fixed at 1366.5 W
m-2 GHGs : ghg_ipcc_A1B_1870-2100_c040521.nc GHG loss rates
: noaamisc.r8.nc volcanic forcing : none DMS emissions :
DMS_emissions_128x256_clim_c040122.nc oxidants :
oxid_128x256_L26_clim_c040112.nc SOx emissions :
SOx_emissions_A1B_128x256_L2_1990-2100_c040608.nc Physical constants used
for derived data: Lv (latent heat of evaporation): 2.501e6 J kg-1 Lf
(latent heat of fusion ): 3.337e5 J kg-1 r[h2o] (density of water
): 1000 kg m-3 g2kg (grams to kilograms ): 1000 g kg-1
Integrations were performed by NCAR and CRIEPI with support and facilities
provided by NSF, DOE, MEXT and ESC/JAMSTEC."/>
<meta name="CVS_Id" content="$Id$"/>
<meta name="Content-Type" content="application/x-netcdf"/>
<meta name="resourceName" content="sresa1b_ncar_ccsm3_0_run1_200001.nc"/>
<meta name="dc:title" content="model output prepared for IPCC AR4"/>
<meta name="institution" content="NCAR (National Center for Atmospheric
Research, Boulder, CO, USA)"/>
<title>model output prepared for IPCC AR4</title>
</head>
<body>dimensions:
lat = 128;
lon = 256;
bnds = 2;
plev = 17;
time = 1;
variables:
float area(lat=128, lon=256);
:long_name = "Surface area";
:units = "meter2";
float lat(lat=128);
:long_name = "latitude";
:units = "degrees_north";
:axis = "Y";
:standard_name = "latitude";
:bounds = "lat_bnds";
double lat_bnds(lat=128, bnds=2);
float lon(lon=256);
:long_name = "longitude";
:units = "degrees_east";
:axis = "X";
:standard_name = "longitude";
:bounds = "lon_bnds";
double lon_bnds(lon=256, bnds=2);
int msk_rgn(lat=128, lon=256);
:long_name = "Mask region";
:units = "bool";
double plev(plev=17);
:long_name = "pressure";
:units = "Pa";
:standard_name = "air_pressure";
:positive = "down";
:axis = "Z";
float pr(time=1, lat=128, lon=256);
:comment = "Created using NCL code CCSM_atmm_2cf.ncl on\n machine
eagle163s";
:missing_value = 1.0E20f;
:_FillValue = 1.0E20f;
:cell_methods = "time: mean (interval: 1 month)";
:history = "(PRECC+PRECL)*r[h2o]";
:original_units = "m-1 s-1";
:original_name = "PRECC, PRECL";
:standard_name = "precipitation_flux";
:units = "kg m-2 s-1";
:long_name = "precipitation_flux";
:cell_method = "time: mean";
float tas(time=1, lat=128, lon=256);
:comment = "Created using NCL code CCSM_atmm_2cf.ncl on\n machine
eagle163s";
:missing_value = 1.0E20f;
:_FillValue = 1.0E20f;
:cell_methods = "time: mean (interval: 1 month)";
:history = "Added height coordinate";
:coordinates = "height";
:original_units = "K";
:original_name = "TREFHT";
:standard_name = "air_temperature";
:units = "K";
:long_name = "air_temperature";
:cell_method = "time: mean";
double time(time=1);
:calendar = "noleap";
:standard_name = "time";
:axis = "T";
:units = "days since 0000-1-1";
:bounds = "time_bnds";
:long_name = "time";
double time_bnds(time=1, bnds=2);
float ua(time=1, plev=17, lat=128, lon=256);
:comment = "Created using NCL code CCSM_atmm_2cf.ncl on\n machine
eagle163s";
:missing_value = 1.0E20f;
:cell_methods = "time: mean (interval: 1 month)";
:long_name = "eastward_wind";
:history = "Interpolated U with NCL \'vinth2p_ecmwf\'";
:units = "m s-1";
:original_units = "m s-1";
:original_name = "U";
:standard_name = "eastward_wind";
:_FillValue = 1.0E20f;
</body></html>
[chipotle:tika/tika-app/target] mattmann%
{noformat}
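For anyone who wants to consume the new text output downstream, here is a rough sketch of how the "dimensions:" and "variables:" sections shown above could be turned into simple data structures. It is not part of the patch. The function name parse_netcdf_text and the regular expressions are assumptions for illustration; they target only the line layout visible in the output above and skip the ":attribute" lines.

```python
import re

# Hypothetical helper, not part of Tika: the patterns below match only the
# "name = size;" and "type name(dim=size, ...);" lines seen in the output above.
DIM_RE = re.compile(r"^\s*(\w+)\s*=\s*(\d+);\s*$")
VAR_RE = re.compile(r"^\s*(\w+)\s+(\w+)\(([^)]*)\);\s*$")


def parse_netcdf_text(text):
    """Return ({dimension: size}, [variable dicts]) from the extracted text."""
    dimensions, variables = {}, []
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        # Section headers; endswith() also tolerates a leading "<body>" tag.
        if stripped.endswith("dimensions:"):
            section = "dimensions"
            continue
        if stripped.endswith("variables:"):
            section = "variables"
            continue
        if section == "dimensions":
            m = DIM_RE.match(line)
            if m:
                dimensions[m.group(1)] = int(m.group(2))
        elif section == "variables":
            # Attribute lines start with ":" and therefore never match VAR_RE.
            m = VAR_RE.match(line)
            if m:
                dtype, name, dims = m.groups()
                dim_names = [d.split("=")[0].strip()
                             for d in dims.split(",") if d.strip()]
                variables.append({"name": name, "type": dtype, "dims": dim_names})
    return dimensions, variables


# A few lines copied from the output above.
SAMPLE = """<body>dimensions:
lat = 128;
lon = 256;
bnds = 2;
variables:
float area(lat=128, lon=256);
:long_name = "Surface area";
:units = "meter2";
double lat_bnds(lat=128, bnds=2);
"""

if __name__ == "__main__":
    dims, variables = parse_netcdf_text(SAMPLE)
    print(dims)
    for var in variables:
        print(var["type"], var["name"], var["dims"])
```

Attributes are dropped here to keep the sketch short; collecting them per variable would mean tracking the most recently matched variable line.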
> [patch] Text output for NetCDF
> ------------------------------
>
> Key: TIKA-1265
> URL: https://issues.apache.org/jira/browse/TIKA-1265
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Ann Burgess
> Assignee: Chris A. Mattmann
> Labels: patch
> Attachments: NetCDFParserPatch.patch
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> Currently Tika extracts -metadata information from NetCDF files. We are
> working on a patch that will enable -text extraction, thus providing the
> 'Dimension' and 'Variable' information.