[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

Giuseppe Totaro (JIRA) Sun, 25 Jan 2015 10:15:05 -0800

    [ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291207#comment-14291207 ]


Giuseppe Totaro commented on TIKA-1423:
---------------------------------------

Hello [~vinegh], I noticed that your parser instantiates a {{File}} object from 
the {{RESOURCE_NAME_KEY}} metadata value instead of using the {{InputStream}} 
object passed to the {{parse}} method:
{code:title=gribParser.java|borderStyle=solid}
…
        // Get grib2 file name from metadata
        File gribFile = new File(metadata.get(Metadata.RESOURCE_NAME_KEY));

        try {
            NetcdfFile ncFile = NetcdfDataset.openFile(gribFile.getAbsolutePath(),
…
{code}
This means that any caller that does not set the {{RESOURCE_NAME_KEY}} 
property before invoking the parser, as in
{code}
metadata.add(Metadata.RESOURCE_NAME_KEY, filename);
{code}
will fail, because {{metadata.get}} returns {{null}} and the {{File}} 
constructor throws a {{NullPointerException}} for a null pathname.
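To make this failure mode concrete, here is a minimal, self-contained sketch. It uses stand-ins rather than Tika classes: a plain {{Map}} replaces Tika's {{Metadata}}, the {{"resourceName"}} key string stands in for {{Metadata.RESOURCE_NAME_KEY}}, and the {{ResourceNameDemo}} class and its methods are hypothetical names used only for illustration.

{code:java}
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a plain Map plays the role of Tika's Metadata object,
// and "resourceName" stands in for Metadata.RESOURCE_NAME_KEY.
class ResourceNameDemo {
    static final String RESOURCE_NAME_KEY = "resourceName";

    // Mirrors the parser's approach: build a File directly from the resource name.
    static File fileFromMetadata(Map<String, String> metadata) {
        // Map.get returns null when the key is absent, and
        // new File((String) null) throws NullPointerException.
        return new File(metadata.get(RESOURCE_NAME_KEY));
    }

    // Returns true if building the File fails with a NullPointerException.
    static boolean failsWithoutResourceName(Map<String, String> metadata) {
        try {
            fileFromMetadata(metadata);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        System.out.println(failsWithoutResourceName(metadata)); // true
        metadata.put(RESOURCE_NAME_KEY, "gdas1.forecmwf.2014062612.grib2");
        System.out.println(failsWithoutResourceName(metadata)); // false
    }
}
{code}

Even when the key is set, it may hold only a bare file name ({{TikaInputStream.get(File, Metadata)}}, for instance, stores {{file.getName()}}). In that case the resulting {{File}} resolves against the current working directory rather than the file's real location.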
Instead of relying on {{RESOURCE_NAME_KEY}}, we can obtain the file from the 
stream using the {{TikaInputStream}} class, as {{NetCDFParser.java}} does:
{code}
        // No longer needed:
        // File gribFile = new File(metadata.get(Metadata.RESOURCE_NAME_KEY));
        TikaInputStream tis = TikaInputStream.get(stream, new TemporaryResources());

        try {
            // getFile() returns the file backing the stream, spooling the
            // stream to a temporary file first if there is none
            NetcdfFile ncFile = NetcdfDataset.openFile(tis.getFile().getAbsolutePath(), null);
{code}
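{{TikaInputStream.getFile()}} returns the file backing the stream when there is one. Otherwise it copies the stream to a temporary file tracked by the {{TemporaryResources}} instance. The JDK-only sketch below illustrates that spooling idea. {{StreamSpooler}} and {{spoolToTempFile}} are hypothetical names, not Tika API, and unlike Tika this sketch leaves deleting the temporary file to the caller.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper illustrating the idea behind TikaInputStream.getFile():
// when only an InputStream is available, copy it to a temporary file so that
// path-based libraries (such as NetcdfDataset.openFile) can read it.
// This is NOT Tika's implementation; the names are illustrative only.
class StreamSpooler {

    // Copies the stream into a new temporary file and returns that file.
    // The caller is responsible for deleting it afterwards.
    static File spoolToTempFile(InputStream stream, String suffix) throws IOException {
        Path tmp = Files.createTempFile("spooled-", suffix);
        // REPLACE_EXISTING is required because createTempFile already created the file.
        Files.copy(stream, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp.toFile();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "GRIB...7777".getBytes(StandardCharsets.US_ASCII);
        File f = spoolToTempFile(new ByteArrayInputStream(data), ".grib2");
        try {
            // A path-based API could now be handed f.getAbsolutePath().
            System.out.println(f.length()); // 11
        } finally {
            Files.deleteIfExists(f.toPath());
        }
    }
}
{code}

Spooling costs a full copy of the input. That is why reusing an existing backing file, when the stream already comes from one, is preferable.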
I tested it on my MacBook and it works. I also tried the 
[netcdf-tools|http://netcdftools.sourceforge.net/] library for retrieving the 
set of global attributes, but it does not work well and seems outdated.
Thank you for your great work,
Giuseppe

> Build a parser to extract data from GRIB formats
> ------------------------------------------------
>
>                 Key: TIKA-1423
>                 URL: https://issues.apache.org/jira/browse/TIKA-1423
>             Project: Tika
>          Issue Type: New Feature
>          Components: metadata, mime, parser
>    Affects Versions: 1.6
>            Reporter: Vineet Ghatge
>            Assignee: Vineet Ghatge
>            Priority: Critical
>              Labels: features, newbie
>             Fix For: 1.8
>
>         Attachments: GRIBParsertest.java, GribParser.java, 
> NLDAS_FORA0125_H.A20130112.1200.002.grb, TIKA-1423.palsulich.120614.patch, 
> TIKA-1423.patch, fileName.html, gdas1.forecmwf.2014062612.grib2
>
>
> The Arctic dataset contains files in a format called GRIB (General 
> Regularly-distributed Information in Binary form, 
> http://en.wikipedia.org/wiki/GRIB). GRIB is a well-known, concise data format 
> used in meteorology to store historical and forecast weather data. There are 
> two editions of the format in common use, GRIB 1 and GRIB 2. The focus will be 
> on GRIB 2, which is the most prevalent. Each GRIB record intended for either 
> transmission or storage contains a single parameter with values located at an 
> array of grid points, or represented as a set of spectral coefficients, for a 
> single level (or layer), encoded as a continuous bit stream. Logical divisions 
> of the record are designated as "sections", each of which provides control 
> information and/or data. A GRIB record consists of six sections, two of which 
> are optional:
>  
> (0) Indicator Section
> (1) Product Definition Section (PDS)
> (2) Grid Description Section (GDS), optional
> (3) Bit Map Section (BMS), optional
> (4) Binary Data Section (BDS)
> (5) End Section: '7777' (ASCII characters)
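The Indicator Section and the '7777' terminator are the natural first checks for a GRIB detector or parser. The six-section list above matches GRIB edition 1. GRIB 2 keeps the 'GRIB' marker, the edition number in octet 8, and the '7777' End Section, but uses a 16-octet Indicator Section with an 8-octet message length. The following minimal sketch handles both editions. It assumes the whole message is already in memory, and the {{GribIndicator}} class is hypothetical (not Tika or netCDF-Java code).

{code:java}
import java.nio.charset.StandardCharsets;

// Sketch: read the Indicator Section (Section 0) and check the End Section ("7777")
// of a single GRIB message held in memory. Octet layout (1-based, big-endian):
//   Edition 1: 1-4 "GRIB", 5-7 total message length, 8 edition number (= 1)
//   Edition 2: 1-4 "GRIB", 5-6 reserved, 7 discipline, 8 edition number (= 2),
//              9-16 total message length
// The class name and API are illustrative only.
class GribIndicator {
    final int edition;
    final long totalLength;

    private GribIndicator(int edition, long totalLength) {
        this.edition = edition;
        this.totalLength = totalLength;
    }

    static GribIndicator read(byte[] msg) {
        if (msg.length < 8
                || !"GRIB".equals(new String(msg, 0, 4, StandardCharsets.US_ASCII))) {
            throw new IllegalArgumentException("missing 'GRIB' marker");
        }
        int edition = msg[7] & 0xFF; // octet 8 holds the edition in both editions
        long length = 0;
        int minLength; // Section 0 plus the 4-octet End Section
        if (edition == 1) {
            // 3-octet unsigned length in octets 5-7
            for (int i = 4; i < 7; i++) {
                length = (length << 8) | (msg[i] & 0xFF);
            }
            minLength = 8 + 4;
        } else if (edition == 2) {
            if (msg.length < 16) {
                throw new IllegalArgumentException("truncated Section 0");
            }
            // 8-octet unsigned length in octets 9-16
            for (int i = 8; i < 16; i++) {
                length = (length << 8) | (msg[i] & 0xFF);
            }
            minLength = 16 + 4;
        } else {
            throw new IllegalArgumentException("unsupported GRIB edition " + edition);
        }
        if (length < minLength || length > msg.length) {
            throw new IllegalArgumentException("implausible message length " + length);
        }
        // The message must close with the End Section: the ASCII characters "7777".
        int end = (int) length;
        if (!"7777".equals(new String(msg, end - 4, 4, StandardCharsets.US_ASCII))) {
            throw new IllegalArgumentException("missing '7777' End Section");
        }
        return new GribIndicator(edition, length);
    }

    public static void main(String[] args) {
        // Synthetic 20-octet edition-2 "message" (not a real GRIB file):
        // Section 0 immediately followed by the End Section.
        byte[] msg = new byte[20];
        System.arraycopy("GRIB".getBytes(StandardCharsets.US_ASCII), 0, msg, 0, 4);
        msg[7] = 2;   // edition number
        msg[15] = 20; // total length (lowest octet of octets 9-16)
        System.arraycopy("7777".getBytes(StandardCharsets.US_ASCII), 0, msg, 16, 4);
        GribIndicator g = GribIndicator.read(msg);
        System.out.println(g.edition + " " + g.totalLength); // 2 20
    }
}
{code}

A real GRIB file usually holds many concatenated messages. A parser would therefore repeat this check at each offset, advancing by {{totalLength}}.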



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
