[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke sh updated TIKA-1561:
--------------------------
Description:
Quoted from http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf:
"The Directory Interchange Format (DIF) is metadata format used to create
directory entries that describe scientific data sets. A DIF holds a collection
of fields, which detail specific information about the data."
A .dif file is a well-formed XML document that describes a scientific data set.
The XSD schema it conforms to is referenced inside the .dif file itself,
e.g. http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/dif_v9.8.4.xsd
This ticket is opened because a Tika parser for DIF files is under
consideration for development, and support for identifying the file type is
needed.
Although a DIF file is an XML file that the XML parser can already parse
properly, some of its fields may need specific processing before they are
extracted and injected into the system for analysis.
It was therefore decided to use the type 'text/dif+xml', which extends
application/xml, so that special processing can be applied to this particular
kind of XML file.
<mime-type type="text/dif+xml">
  <root-XML localName="DIF"/>
  <root-XML localName="DIF"
            namespaceURI="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"/>
  <glob pattern="*.dif"/>
  <sub-class-of type="application/xml"/>
</mime-type>
Expected MIME type: text/dif+xml
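To illustrate what the two root-XML entries above are meant to express, here is
a minimal sketch in plain Java (JDK StAX only). It is not Tika's detector. The
class name DifRootDetector and the method detect are hypothetical, and the
fallback values returned for non-DIF or unparsable input are assumptions made
for the example. It also assumes that a root-XML entry without a namespaceURI
matches only a root element that has no namespace, which would explain why both
entries are listed. The glob rule (*.dif) is not covered here.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: mimics the proposed root-XML rules, not Tika's own code.
public class DifRootDetector {
    // Namespace taken from the proposed mime-type definition.
    static final String DIF_NS = "http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/";

    public static String detect(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs; only the root element is of interest.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            try {
                // Skip comments and processing instructions until the root element.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        QName root = reader.getName();
                        String ns = root.getNamespaceURI();
                        boolean nameMatches = "DIF".equals(root.getLocalPart());
                        // Rule 1: no namespace. Rule 2: the GCMD DIF namespace.
                        boolean nsMatches =
                                ns == null || ns.isEmpty() || DIF_NS.equals(ns);
                        return (nameMatches && nsMatches)
                                ? "text/dif+xml"
                                : "application/xml"; // assumed fallback for other XML
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Assumed fallback when the input is not parsable as XML.
            return "application/octet-stream";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        System.out.println(detect("<DIF/>"));
        System.out.println(detect("<DIF xmlns=\"" + DIF_NS + "\"/>"));
        System.out.println(detect("<html/>"));
    }
}
```

With real Tika, the detection would instead come from the mime-type definition
itself once it is registered; the sketch only shows the matching logic.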
DIF format guide:
http://gcmd.nasa.gov/add/difguide/
Example DIF files:
1) https://www.aoncadis.org/dataset/id/005f3222-7548-11e2-851e-00c0f03d5b7c.dif
2) https://www.aoncadis.org/dataset/id/0091cf0c-7ad3-11e2-851e-00c0f03d5b7c.dif
3) https://www.aoncadis.org/dataset/id/02a6301c-3ab3-11e4-8ee7-00c0f03d5b7c.dif
An example DIF file has also been attached.
was:
Quoted from http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf:
"The Directory Interchange Format (DIF) is metadata format used to create
directory entries that describe scientific data sets. A DIF holds a collection
of fields, which detail specific information about the data."
A .dif file is a well-formed XML document that describes a scientific data set.
The XSD schema it conforms to is referenced inside the .dif file itself,
e.g. http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/dif_v9.8.4.xsd
This ticket is opened because a Tika parser for DIF files is under
consideration for development, and support for identifying the file type is
needed.
Although a DIF file is an XML file that the XML parser can already parse
properly, some of its fields may need specific processing before they are
extracted and injected into the system for analysis.
It was therefore decided to use the type 'text/dif+xml', which extends
application/xml, so that special processing can be applied to this particular
kind of XML file.
<mime-type type="text/dif+xml">
  <root-XML localName="DIF"/>
  <root-XML localName="DIF"
            namespaceURI="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"/>
  <glob pattern="*.dif"/>
  <sub-class-of type="application/xml"/>
</mime-type>
Expected MIME type: text/dif+xml
DIF format guide:
http://gcmd.nasa.gov/add/difguide/
> GCMD Directory Interchange Format (.dif) identification
> -------------------------------------------------------
>
> Key: TIKA-1561
> URL: https://issues.apache.org/jira/browse/TIKA-1561
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Affects Versions: 1.7
> Reporter: Luke sh
> Priority: Trivial
> Attachments:
> carbon_isotopic_values_of_alkanes_extracted_from_paleosols.dif
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)