[ https://issues.apache.org/jira/browse/TIKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332210#comment-14332210 ]
Aakarsh Medleri Hire Math commented on TIKA-1532:
-------------------------------------------------
Hi Nick,
Sorry for the delayed response.
It seems there is no unique MIME type associated with GCMD .dif files. We
crawled around 8000 files from the ACADIS website (https://www.aoncadis.org),
and all of them had their content type set to text/plain. However, the data
itself is represented in XML. Does that mean Tika should detect it as
application/xml or text/xml?
Here is one such example: https://www.aoncadis.org/dataset/Zamora2010.dif
You can find the rest of the crawled links at:
https://raw.githubusercontent.com/shekarprashant/TikaDirectedResearch/master/Acadis%20Complete%20Crawl%20Raw%20Results.csv
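Since the server-reported text/plain is not reliable here, one option is to
ignore it and look at the XML root element instead. Below is a minimal sketch
of that idea, not a Tika implementation. It assumes the root element of a GCMD
DIF record is named DIF. The class name DifSniffer and the type name
application/dif+xml are hypothetical and only used for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: guesses a type from the XML root element,
// ignoring whatever Content-Type the server reported.
class DifSniffer {
    // Assumption: this type name is illustrative, not a registered MIME type.
    static final String DIF_TYPE = "application/dif+xml";

    static String sniff(byte[] content) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Crawled content is untrusted, so disable DTDs and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try (InputStream in = new ByteArrayInputStream(content)) {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // Assumption: a DIF record has <DIF> as its root element.
                    return "DIF".equals(reader.getLocalName())
                            ? DIF_TYPE
                            : "application/xml";
                }
            }
        } catch (XMLStreamException | IOException e) {
            // Not well-formed XML: fall through to the plain-text default.
        }
        return "text/plain";
    }
}
```

With this approach, a file served as text/plain would still be recognized as
DIF when its root element matches, other XML would fall back to
application/xml, and anything that does not parse as XML would stay text/plain.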
Looking forward to your input.
Thanks,
Aakarsh
> DIF Parser
> ----------
>
> Key: TIKA-1532
> URL: https://issues.apache.org/jira/browse/TIKA-1532
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Aakarsh Medleri Hire Math
>
> MIME Type detection & content parser for .dif format
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)