[ https://issues.apache.org/jira/browse/TIKA-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991234#comment-13991234 ]

Chris A. Mattmann commented on TIKA-1265:
-----------------------------------------

Tested on an example NetCDF file; the acceptance test passed:

{noformat}
[chipotle:tika/tika-app/target] mattmann% java -jar tika-app-1.6-SNAPSHOT.jar /Users/mattmann/tmp/tika/tika-parsers/src/test/resources/test-documents/sresa1b_ncar_ccsm3_0_run1_200001.nc
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Conventions" content="CF-1.0"/>
<meta name="acknowledgment" content=" Any use of CCSM data should acknowledge 
the contribution&#10; of the CCSM project and CCSM sponsor agencies with the 
&#10; following citation:&#10; 'This research uses data provided by the 
Community Climate&#10; System Model project (www.ccsm.ucar.edu), supported by 
the&#10; Directorate for Geosciences of the National Science Foundation&#10; 
and the Office of Biological and Environmental Research of&#10; the U.S. 
Department of Energy.'&#10;In addition, the words 'Community Climate System 
Model' and&#10; 'CCSM' should be included as metadata for webpages 
referencing&#10; work using CCSM data or as keywords provided to journal or 
book&#10;publishers of your manuscripts.&#10;Users of CCSM data accept the 
responsibility of emailing&#10; citations of publications of research using 
CCSM data to&#10; [email protected].&#10;Any redistribution of CCSM data must 
include this data&#10; acknowledgement statement."/>
<meta name="Content-Length" content="2767916"/>
<meta name="experiment_id" content="720 ppm stabilization experiment 
(SRESA1B)"/>
<meta name="table_id" content="Table A1"/>
<meta name="cmd_ln" content="bds -x 256 -y 128 -m 23 -o 
/data/zender/data/dst_T85.nc"/>
<meta name="contact" content="[email protected]"/>
<meta name="creation_date" content=""/>
<meta name="history" content="Tue Oct 25 15:08:51 2005: ncks -O -x -v va -m 
sresa1b_ncar_ccsm3_0_run1_200001.nc sresa1b_ncar_ccsm3_0_run1_200001.nc&#10;Tue 
Oct 25 15:07:21 2005: ncks -d time,0 sresa1b_ncar_ccsm3_0_run1_200001_201912.nc 
sresa1b_ncar_ccsm3_0_run1_200001.nc&#10;Tue Oct 25 13:29:43 2005: ncks -d 
time,0,239 sresa1b_ncar_ccsm3_0_run1_200001_209912.nc 
/var/www/html/tmp/sresa1b_ncar_ccsm3_0_run1_200001_201912.nc&#10;Thu Oct 20 
10:47:50 2005: ncks -A -v va 
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/sresa1b_ncar_ccsm3_0_run1_va_200001_209912.nc
 
/data/brownmc/sresa1b/atm/mo/tas/ncar_ccsm3_0/run1/sresa1b_ncar_ccsm3_0_run1_200001_209912.nc&#10;Wed
 Oct 19 14:55:04 2005: ncks -F -d time,01,1200 
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/sresa1b_ncar_ccsm3_0_run1_va_200001_209912.nc
 
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/sresa1b_ncar_ccsm3_0_run1_va_200001_209912.nc&#10;Wed
 Oct 19 14:53:28 2005: ncrcat 
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/foo_05_1200.nc 
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/foo_1192_1196.nc 
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/sresa1b_ncar_ccsm3_0_run1_va_200001_209912.nc&#10;Wed
 Oct 19 14:50:38 2005: ncks -F -d time,05,1200 
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/va_A1.SRESA1B_1.CCSM.atmm.2000-01_cat_2099-12.nc
 /data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/foo_05_1200.nc&#10;Wed Oct 
19 14:49:45 2005: ncrcat 
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/va_A1.SRESA1B_1.CCSM.atmm.2000-01_cat_2079-12.nc
 
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/va_A1.SRESA1B_1.CCSM.atmm.2080-01_cat_2099-12.nc
 
/data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/va_A1.SRESA1B_1.CCSM.atmm.2000-01_cat_2099-12.nc&#10;Created
 from CCSM3 case b30.040a&#10; by [email protected]&#10; on Wed Nov 17 14:12:57 
EST 2004&#10; &#10; For all data, added IPCC requested metadata"/>
<meta name="references" content="Collins, W.D., et al., 2005:&#10; The 
Community Climate System Model, Version 3&#10; Journal of Climate&#10; &#10; 
Main website: http://www.ccsm.ucar.edu"/>
<meta name="source" content="CCSM3.0, version beta19 (2004): &#10;atmosphere: 
CAM3.0, T85L26;&#10;ocean     : POP1.4.3 (modified), gx1v3&#10;sea ice   : 
CSIM5.0, T85;&#10;land      : CLM3.0, gx1v3"/>
<meta name="model_name_english" content="NCAR CCSM"/>
<meta name="project_id" content="IPCC Fourth Assessment"/>
<meta name="prg_ID" content="Source file unknown Version unknown Date unknown"/>
<meta name="realization" content="1"/>
<meta name="comment" content="This simulation was initiated from year 2000 of 
&#10; CCSM3 model run b30.030a and executed on &#10; hardware 
cheetah.ccs.ornl.gov. The input external forcings are&#10;ozone forcing    : 
A1B.ozone.128x64_L18_1991-2100_c040528.nc&#10;aerosol optics   : 
AerosolOptics_c040105.nc&#10;aerosol MMR      : 
AerosolMass_V_128x256_clim_c031022.nc&#10;carbon scaling   : 
carbonscaling_A1B_1990-2100_c040609.nc&#10;solar forcing    : Fixed at 1366.5 W 
m-2&#10;GHGs             : ghg_ipcc_A1B_1870-2100_c040521.nc&#10;GHG loss rates 
  : noaamisc.r8.nc&#10;volcanic forcing : none&#10;DMS emissions    : 
DMS_emissions_128x256_clim_c040122.nc&#10;oxidants         : 
oxid_128x256_L26_clim_c040112.nc&#10;SOx emissions    : 
SOx_emissions_A1B_128x256_L2_1990-2100_c040608.nc&#10; Physical constants used 
for derived data:&#10; Lv (latent heat of evaporation): 2.501e6 J kg-1&#10; Lf 
(latent heat of fusion     ): 3.337e5 J kg-1&#10; r[h2o] (density of water      
): 1000 kg m-3&#10; g2kg   (grams to kilograms    ): 1000 g kg-1&#10; &#10; 
Integrations were performed by NCAR and CRIEPI with support&#10; and facilities 
provided by NSF, DOE, MEXT and ESC/JAMSTEC."/>
<meta name="CVS_Id" content="$Id$"/>
<meta name="Content-Type" content="application/x-netcdf"/>
<meta name="resourceName" content="sresa1b_ncar_ccsm3_0_run1_200001.nc"/>
<meta name="dc:title" content="model output prepared for IPCC AR4"/>
<meta name="institution" content="NCAR (National Center for Atmospheric 
&#10;Research, Boulder, CO, USA)"/>
<title>model output prepared for IPCC AR4</title>
</head>
<body>dimensions:
lat = 128;
lon = 256;
bnds = 2;
plev = 17;
time = 1;

variables:
float area(lat=128, lon=256);
        :long_name = "Surface area";
        :units = "meter2";

float lat(lat=128);
        :long_name = "latitude";
        :units = "degrees_north";
        :axis = "Y";
        :standard_name = "latitude";
        :bounds = "lat_bnds";

double lat_bnds(lat=128, bnds=2);

float lon(lon=256);
        :long_name = "longitude";
        :units = "degrees_east";
        :axis = "X";
        :standard_name = "longitude";
        :bounds = "lon_bnds";

double lon_bnds(lon=256, bnds=2);

int msk_rgn(lat=128, lon=256);
        :long_name = "Mask region";
        :units = "bool";

double plev(plev=17);
        :long_name = "pressure";
        :units = "Pa";
        :standard_name = "air_pressure";
        :positive = "down";
        :axis = "Z";

float pr(time=1, lat=128, lon=256);
        :comment = "Created using NCL code CCSM_atmm_2cf.ncl on\n machine 
eagle163s";
        :missing_value = 1.0E20f;
        :_FillValue = 1.0E20f;
        :cell_methods = "time: mean (interval: 1 month)";
        :history = "(PRECC+PRECL)*r[h2o]";
        :original_units = "m-1 s-1";
        :original_name = "PRECC, PRECL";
        :standard_name = "precipitation_flux";
        :units = "kg m-2 s-1";
        :long_name = "precipitation_flux";
        :cell_method = "time: mean";

float tas(time=1, lat=128, lon=256);
        :comment = "Created using NCL code CCSM_atmm_2cf.ncl on\n machine 
eagle163s";
        :missing_value = 1.0E20f;
        :_FillValue = 1.0E20f;
        :cell_methods = "time: mean (interval: 1 month)";
        :history = "Added height coordinate";
        :coordinates = "height";
        :original_units = "K";
        :original_name = "TREFHT";
        :standard_name = "air_temperature";
        :units = "K";
        :long_name = "air_temperature";
        :cell_method = "time: mean";

double time(time=1);
        :calendar = "noleap";
        :standard_name = "time";
        :axis = "T";
        :units = "days since 0000-1-1";
        :bounds = "time_bnds";
        :long_name = "time";

double time_bnds(time=1, bnds=2);

float ua(time=1, plev=17, lat=128, lon=256);
        :comment = "Created using NCL code CCSM_atmm_2cf.ncl on\n machine 
eagle163s";
        :missing_value = 1.0E20f;
        :cell_methods = "time: mean (interval: 1 month)";
        :long_name = "eastward_wind";
        :history = "Interpolated U with NCL \'vinth2p_ecmwf\'";
        :units = "m s-1";
        :original_units = "m s-1";
        :original_name = "U";
        :standard_name = "eastward_wind";
        :_FillValue = 1.0E20f;
</body></html>
[chipotle:tika/tika-app/target] mattmann% 
{noformat}
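
For anyone who wants to consume the new text output downstream, here is a minimal sketch of reading the "dimensions:" block shown in the body above into a name-to-size map. It is not part of the patch or of Tika itself: the class name DimensionBlockParser and the method parseDimensions are hypothetical, and the sketch assumes the body text keeps the "name = size;" line layout seen in this run.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Tika): reads the "dimensions:" block
// from the extracted body text into an ordered name -> size map.
public class DimensionBlockParser {
    // Matches lines such as "lat = 128;" (layout assumed from the output above).
    private static final Pattern DIM =
            Pattern.compile("^\\s*(\\w+)\\s*=\\s*(\\d+)\\s*;\\s*$");

    static Map<String, Integer> parseDimensions(String text) {
        Map<String, Integer> dims = new LinkedHashMap<>();
        boolean inBlock = false;
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            // The block header may be fused with the <body> tag, as in the output above.
            if (trimmed.endsWith("dimensions:")) {
                inBlock = true;
                continue;
            }
            // The dimension block ends where the variable listing starts.
            if (trimmed.startsWith("variables:")) {
                break;
            }
            if (!inBlock) {
                continue;
            }
            Matcher m = DIM.matcher(line);
            if (m.matches()) {
                dims.put(m.group(1), Integer.parseInt(m.group(2)));
            }
        }
        return dims;
    }

    public static void main(String[] args) {
        String sample = "<body>dimensions:\nlat = 128;\nlon = 256;\nbnds = 2;\n"
                + "plev = 17;\ntime = 1;\n\nvariables:\nfloat area(lat=128, lon=256);\n";
        // prints {lat=128, lon=256, bnds=2, plev=17, time=1}
        System.out.println(parseDimensions(sample));
    }
}
```

The sketch stops at "variables:" on purpose; the variable listing has a richer layout (types, shapes, attributes) and would need its own handling.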


> [patch] Text output for NetCDF
> ------------------------------
>
>                 Key: TIKA-1265
>                 URL: https://issues.apache.org/jira/browse/TIKA-1265
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Ann Burgess
>            Assignee: Chris A. Mattmann
>              Labels: patch
>         Attachments: NetCDFParserPatch.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently Tika extracts -metadata information from NetCDF files. We are 
> working on a patch that will enable -text extraction, thus providing the 
> 'Dimension' and 'Variable' information.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)
