[https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505092#comment-14505092]

Tim Allison commented on TIKA-1513:
-----------------------------------

Completely agree.  

Only 2,386 files.

This table lists the file extensions of the files identified as application/octet-stream:

||File Extension||Count||
|dbase3|1664|
|wp|362|
|unk|285|
|gls|60|
|ileaf|4|
|sys|3|
|chp|2|
|lnk|2|
|mac|2|
|squeak|1|
|bin|1|

I would very much appreciate hearing what you find, and yes, we can certainly
decrease the priority. I had my priorities backwards. Sorry.

Obviously, if you find false positives, we'll fall back to detection by file
suffix. I, too, was less than enthusiastic about a single-byte MIME identifier.
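For illustration only, here is a rough sketch of how a single-byte match could be tightened before falling back to the suffix. It assumes the standard dBase header layout (byte 0 is a version byte, bytes 1 to 3 are the last-update date as YY MM DD), an assumed subset of version values, a ".dbf" suffix, and the MIME type "application/x-dbf"; the class and method names are hypothetical and this is not Tika's detector API.

```java
import java.util.Locale;
import java.util.Set;

public class DbfSniffer {
    // Assumed subset of dBase/FoxPro version values that appear in header byte 0.
    private static final Set<Integer> VERSION_BYTES =
            Set.of(0x02, 0x03, 0x30, 0x31, 0x83, 0x8B, 0xF5);

    static final String DBF = "application/x-dbf";        // assumed MIME type
    static final String OCTET = "application/octet-stream";

    // Checks the version byte plus the month and day of the header's
    // last-update date, so one matching byte alone is not enough.
    static boolean looksLikeDbf(byte[] header) {
        if (header == null || header.length < 4) {
            return false;
        }
        int version = header[0] & 0xFF;
        int month = header[2] & 0xFF;
        int day = header[3] & 0xFF;
        return VERSION_BYTES.contains(version)
                && month >= 1 && month <= 12
                && day >= 1 && day <= 31;
    }

    // Magic check first; if it fails, fall back to the file suffix.
    static String detect(byte[] header, String fileName) {
        if (looksLikeDbf(header)) {
            return DBF;
        }
        if (fileName != null && fileName.toLowerCase(Locale.ROOT).endsWith(".dbf")) {
            return DBF;
        }
        return OCTET;
    }

    public static void main(String[] args) {
        byte[] dbfHeader = {0x03, 0x72, 0x05, 0x0F};   // version 0x03, date 114-05-15
        byte[] badDate = {0x03, 0x00, 0x00, 0x00};     // month 0 fails the check
        System.out.println(detect(dbfHeader, "data.bin")); // application/x-dbf
        System.out.println(detect(badDate, "data.bin"));   // application/octet-stream
        System.out.println(detect(badDate, "DATA.DBF"));   // application/x-dbf
    }
}
```

The date check still admits random data now and then, and some writers may leave the date zeroed, which is why the suffix remains a fallback rather than being dropped.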

Thank you!

> Add mime detection and parsing for dbf files
> --------------------------------------------
>
>                 Key: TIKA-1513
>                 URL: https://issues.apache.org/jira/browse/TIKA-1513
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 1.9
>
>
> I just came across an Apache licensed dbf parser that is available on 
> [maven|https://repo1.maven.org/maven2/org/jamel/dbf/dbf-reader/0.1.0/dbf-reader-0.1.0.pom].
> Let's add dbf parsing to Tika.
> Any other recommendations for alternate parsers?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
