Susan created TIKA-2684:
---------------------------
Summary: Tika does not extract *.fits header text, just file-level metadata
Key: TIKA-2684
URL: https://issues.apache.org/jira/browse/TIKA-2684
Project: Tika
Issue Type: Improvement
Components: metadata, mime, parser
Affects Versions: 1.18
Reporter: Susan
Tika only pulls file-level metadata from *.fits (Flexible Image Transport System) files:
Content-Length: 699840
Content-Type: application/fits
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.gdal.GDALParser
X-TIKA:digest:MD5: d93e8f4654902c45c7f3e4f4bf5f63e2
X-TIKA:digest:SHA256: da7c0f1b6643850856cba100e9b3e8db76b80e91583eb088635c416a2b4161b3
resourceName: WFPC2u5780205r_c0fx.fits
It does not return the text of the header, shown below as extracted with astropy.py:
SIMPLE = T / file does conform to FITS standard
BITPIX = -32 / number of bits per data pixel
NAXIS = 3 / number of data axes
NAXIS1 = 200 / length of data axis 1
NAXIS2 = 200 / length of data axis 2
NAXIS3 = 4 / length of data axis 3
EXTEND = T / FITS dataset may contain extensions
COMMENT FITS (Flexible Image Transport System) format is defined in 'Astronomy
COMMENT and Astrophysics', volume 376, page 359; bibcode: 2001A&A...376..359H
BSCALE = 1.0E0 / REAL = TAPE*BSCALE + BZERO
BZERO = 0.0E0 /
OPSIZE = 2112 / PSIZE of original image
ORIGIN = 'STScI-STSDAS' / Fitsio version 21-Feb-1996
FITSDATE= '2004-01-09' / Date FITS file was created
FILENAME= 'u5780205r_cvt.c0h' / Original filename
ALLG-MAX= 3.777701E3 / Data max in all groups
ALLG-MIN= -7.319537E1 / Data min in all groups
ODATTYPE= 'FLOATING' / Original datatype: Single precision real
SDASMGNU= 4 / Number of groups in original image
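For reference, a FITS primary header like the one above is a sequence of fixed-width 80-character ASCII "cards" stored in 2880-byte blocks and terminated by an END card. Each card has the keyword in columns 1-8 and, if it carries a value, the indicator "= " in columns 9-10, followed by the value and an optional "/ comment". That makes the header text recoverable without a full imaging library. The following is a minimal stdlib-only Python sketch of reading those cards into keyword/value/comment triples and flattening them into multi-valued metadata. It is not Tika's code or API (Tika is written in Java), and the names `read_fits_header`, `parse_card`, `to_metadata` and `make_header` are hypothetical. It assumes a plain primary header and does not handle CONTINUE long-string cards or HIERARCH keywords.

```python
import io
from typing import BinaryIO, Dict, List, Optional, Tuple

BLOCK_SIZE = 2880  # FITS files are organized in 2880-byte blocks
CARD_SIZE = 80     # each header card is an 80-character ASCII record

# (keyword, value or None for commentary cards, comment / commentary text)
Card = Tuple[str, Optional[str], str]


def _split_string_value(field: str) -> Tuple[str, str]:
    """Parse a quoted FITS string; return (value, text after the closing quote)."""
    chars: List[str] = []
    i = 1  # skip the opening quote
    while i < len(field):
        if field[i] == "'":
            if field[i + 1:i + 2] == "'":  # '' encodes a literal single quote
                chars.append("'")
                i += 2
                continue
            return "".join(chars).rstrip(), field[i + 1:]
        chars.append(field[i])
        i += 1
    return "".join(chars).rstrip(), ""  # unterminated string: keep what we have


def parse_card(card: str) -> Card:
    """Split one 80-character card into keyword, value and comment."""
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # No value indicator: COMMENT/HISTORY-style cards carry free text only.
        return keyword, None, card[8:].strip()
    field = card[10:].lstrip()
    if field.startswith("'"):
        value, tail = _split_string_value(field)
        return keyword, value, tail.partition("/")[2].strip()
    value, _, comment = field.partition("/")
    return keyword, value.strip(), comment.strip()


def read_fits_header(stream: BinaryIO) -> List[Card]:
    """Read the cards of the primary header, stopping at the END card."""
    cards: List[Card] = []
    while True:
        block = stream.read(BLOCK_SIZE)
        if len(block) < BLOCK_SIZE:
            raise ValueError("truncated FITS header: no END card found")
        text = block.decode("ascii")  # header cards are restricted to ASCII
        for offset in range(0, BLOCK_SIZE, CARD_SIZE):
            card = text[offset:offset + CARD_SIZE]
            if card[:8].rstrip() == "END":
                return cards
            if card.strip():  # skip all-blank cards
                cards.append(parse_card(card))


def to_metadata(cards: List[Card]) -> Dict[str, List[str]]:
    """Flatten cards into a multi-valued map keyed by FITS keyword."""
    meta: Dict[str, List[str]] = {}
    for keyword, value, comment in cards:
        if not keyword:
            continue  # blank-keyword commentary has no name to key on
        if value is None:
            if comment:
                meta.setdefault(keyword, []).append(comment)
        else:
            meta.setdefault(keyword, []).append(value)
    return meta


def make_header(*cards: str) -> bytes:
    """Build an in-memory header for demonstration, padded to whole blocks."""
    if any(len(c) > CARD_SIZE for c in cards):
        raise ValueError("header cards are limited to 80 characters")
    raw = "".join(c.ljust(CARD_SIZE) for c in cards + ("END",))
    raw = raw.ljust(-(-len(raw) // BLOCK_SIZE) * BLOCK_SIZE)  # ceil to blocks
    return raw.encode("ascii")


if __name__ == "__main__":
    demo = make_header(
        "SIMPLE  =                    T / file does conform to FITS standard",
        "NAXIS1  =                  200 / length of data axis 1",
        "ORIGIN  = 'STScI-STSDAS'       / Fitsio version 21-Feb-1996",
    )
    for key, values in to_metadata(read_fits_header(io.BytesIO(demo))).items():
        print(key, values)
```

Keeping the result multi-valued mirrors the fact that keywords such as COMMENT and HISTORY can repeat, and Tika's own metadata model already allows several values per key. How such keys should be named or prefixed in Tika output is left open here.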
This capability was mentioned in TIKA-874. I'm looking at netCDF files/headers as a model for this behaviour.
Thank you!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)