Susan created TIKA-2684:
---------------------------

             Summary: Tika does not extract *.fits header text, just file-level metadata
                 Key: TIKA-2684
                 URL: https://issues.apache.org/jira/browse/TIKA-2684
             Project: Tika
          Issue Type: Improvement
          Components: metadata, mime, parser
    Affects Versions: 1.18
            Reporter: Susan


Tika only pulls file-level metadata for *.fits (Flexible Image Transport System) 
files:

Content-Length: 699840
Content-Type: application/fits
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.gdal.GDALParser
X-TIKA:digest:MD5: d93e8f4654902c45c7f3e4f4bf5f63e2
X-TIKA:digest:SHA256: da7c0f1b6643850856cba100e9b3e8db76b80e91583eb088635c416a2b4161b3
resourceName: WFPC2u5780205r_c0fx.fits

Rather than text from the header (extracted with astropy.py):

SIMPLE  =                    T / file does conform to FITS standard             
BITPIX  =                  -32 / number of bits per data pixel                  
NAXIS   =                    3 / number of data axes                            
NAXIS1  =                  200 / length of data axis 1                          
NAXIS2  =                  200 / length of data axis 2                          
NAXIS3  =                    4 / length of data axis 3                          
EXTEND  =                    T / FITS dataset may contain extensions            
COMMENT   FITS (Flexible Image Transport System) format is defined in 'Astronomy
COMMENT   and Astrophysics', volume 376, page 359; bibcode: 2001A&A...376..359H
BSCALE  =                1.0E0 / REAL = TAPE*BSCALE + BZERO
BZERO   =                0.0E0 /
OPSIZE  =                 2112 / PSIZE of original image
ORIGIN  = 'STScI-STSDAS'       / Fitsio version 21-Feb-1996
FITSDATE= '2004-01-09'         / Date FITS file was created
FILENAME= 'u5780205r_cvt.c0h'  / Original filename
ALLG-MAX=           3.777701E3 / Data max in all groups
ALLG-MIN=          -7.319537E1 / Data min in all groups
ODATTYPE= 'FLOATING'           / Original datatype: Single precision real
SDASMGNU=                    4 / Number of groups in original image
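
For reference, the header text above is stored at the start of the file as 
80-character ASCII cards, padded out to 2880-byte blocks and terminated by an 
END card. Here is a minimal sketch of pulling those cards out using only the 
Python standard library. The function names (read_primary_header_cards, 
demo_header) are hypothetical and made up for illustration; this is not how 
Tika or astropy implement it, and it only reads the primary header, not any 
extensions:

```python
import io

CARD_LEN = 80      # each header card is 80 ASCII characters
BLOCK_LEN = 2880   # headers are padded to a multiple of 2880 bytes


def read_primary_header_cards(stream):
    """Return the primary header cards as strings, excluding the END card.

    Hypothetical helper for illustration; `stream` is any binary file-like
    object positioned at the start of a FITS file.
    """
    cards = []
    while True:
        block = stream.read(BLOCK_LEN)
        if len(block) < BLOCK_LEN:
            raise ValueError("truncated FITS header: no END card found")
        for i in range(0, BLOCK_LEN, CARD_LEN):
            card = block[i:i + CARD_LEN].decode("ascii")
            if card[:8].rstrip() == "END":
                return cards
            if card.strip():  # skip blank padding cards
                cards.append(card.rstrip())


def demo_header():
    """Build a tiny in-memory header block so the sketch runs without a file."""
    lines = [
        "SIMPLE  =                    T / file does conform to FITS standard",
        "BITPIX  =                  -32 / number of bits per data pixel",
        "NAXIS   =                    3 / number of data axes",
        "END",
    ]
    raw = "".join(line.ljust(CARD_LEN) for line in lines)
    return raw.ljust(BLOCK_LEN).encode("ascii")


if __name__ == "__main__":
    for card in read_primary_header_cards(io.BytesIO(demo_header())):
        print(card)
```

With a real file, the same function would be called on open(path, "rb"); the 
returned cards could then be emitted as extracted text alongside the 
file-level metadata.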

 

This capability was mentioned in TIKA-874. I'm looking at netCDF 
files/headers as a model for this behaviour. 

Thank you!


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
