[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236079#comment-14236079 ]
Tyler Palsulich commented on TIKA-1423:
---------------------------------------

Thanks [~lewismc]! Please see https://github.com/tpalsulich/tika/tree/TIKA-1423 for an updated Parser. I would create a pull request, but the hemantku repo is several commits behind trunk. I fixed a few issues from the RB (style updates, and the parser no longer uses a Metadata key to pass the name of the file). I also updated the pom dependency configuration. Everything compiles and all tests pass. :)

[~vinegh] and [~chrismattmann], what do you think? In particular, is it OK to add the following in order to grab the latest version of the netcdf dependency?

{code}
<repositories>
  <repository>
    <id>unidata-releases</id>
    <name>UNIDATA Releases</name>
    <url>https://artifacts.unidata.ucar.edu/content/repositories/unidata-releases/</url>
  </repository>
</repositories>
{code}

> Build a parser to extract data from GRIB formats
> ------------------------------------------------
>
>                 Key: TIKA-1423
>                 URL: https://issues.apache.org/jira/browse/TIKA-1423
>             Project: Tika
>          Issue Type: New Feature
>          Components: metadata, mime, parser
>    Affects Versions: 1.6
>            Reporter: Vineet Ghatge
>            Assignee: Vineet Ghatge
>            Priority: Critical
>              Labels: features, newbie
>             Fix For: 1.8
>
>         Attachments: GRIBParsertest.java, GribParser.java,
> NLDAS_FORA0125_H.A20130112.1200.002.grb, fileName.html,
> gdas1.forecmwf.2014062612.grib2
>
>
> Arctic dataset contains a MIME format called GRIB - General
> Regularly-distributed Information in Binary form
> http://en.wikipedia.org/wiki/GRIB . GRIB is a well-known, concise data
> format used in meteorology to store historical and forecast weather data.
> There are 2 different types of the format: GRIB 0 and GRIB 2.
> The focus will be on GRIB 2, which is the most prevalent.
> Each GRIB record intended for either transmission or storage contains a
> single parameter with values located at an array of grid points, or
> represented as a set of spectral coefficients, for a single level (or
> layer), encoded as a continuous bit stream. Logical divisions of the record
> are designated as "sections", each of which provides control information
> and/or data. A GRIB record consists of six sections, two of which are
> optional:
>
> (0) Indicator Section
> (1) Product Definition Section (PDS)
> (2) Grid Description Section (GDS) - optional
> (3) Bit Map Section (BMS) - optional
> (4) Binary Data Section (BDS)
> (5) '7777' (ASCII Characters)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
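
For readers unfamiliar with the record layout quoted in the issue description, here is a minimal Java sketch of the framing it describes: a record opens with the Indicator Section and closes with the ASCII characters '7777'. This is only an illustration, not the attached GribParser.java and not the netcdf library's API. The class and method names (`GribFraming`, `isFramedGribMessage`, `edition`) are hypothetical. Two details are assumptions taken from the GRIB edition 1 and 2 layouts rather than from this thread: that the Indicator Section begins with the ASCII characters "GRIB", and that its eighth octet holds the edition number.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Tika or netcdf.
public class GribFraming {
    // Assumption: the Indicator Section starts with the ASCII bytes "GRIB".
    private static final byte[] START = "GRIB".getBytes(StandardCharsets.US_ASCII);
    // From the issue description: the final section is the ASCII string "7777".
    private static final byte[] END = "7777".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the buffer starts with "GRIB" and ends with "7777". */
    public static boolean isFramedGribMessage(byte[] message) {
        if (message == null || message.length < START.length + END.length) {
            return false;
        }
        byte[] head = Arrays.copyOfRange(message, 0, START.length);
        byte[] tail = Arrays.copyOfRange(message, message.length - END.length, message.length);
        return Arrays.equals(head, START) && Arrays.equals(tail, END);
    }

    /**
     * Returns the value of the eighth octet (index 7), assumed here to be the
     * edition number, or -1 if the buffer is too short to contain it.
     */
    public static int edition(byte[] message) {
        if (message == null || message.length < 8) {
            return -1;
        }
        return message[7] & 0xFF; // read the octet as an unsigned value
    }

    public static void main(String[] args) {
        // Synthetic 12-byte buffer, not a real GRIB record:
        // "GRIB", three filler bytes, edition octet 2, then "7777".
        byte[] fake = {'G', 'R', 'I', 'B', 0, 0, 0, 2, '7', '7', '7', '7'};
        System.out.println("framed:  " + isFramedGribMessage(fake));
        System.out.println("edition: " + edition(fake));
    }
}
```

A check like this only validates the outer framing of a single in-memory record. Decoding the sections in between is the part that would be delegated to the netcdf dependency discussed above.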