[ 
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287980#comment-14287980
 ] 

Tim Allison commented on TIKA-1513:
-----------------------------------

[~iryndin], on codepage detection in dbf: in one of the specs I read, it looks 
like there is a byte in the header that may or may not be set that specifies 
the codepage for the table.  Are you, by chance, parsing that?
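
For reference, a minimal sketch of what reading that byte could look like. It 
assumes the commonly documented dBASE/FoxPro layout, where the "language 
driver ID" sits at offset 29 of the 32-byte table header and 0x00 means "not 
set". The class name DbfCodepage and the small ID-to-charset table are 
illustrative only (a handful of values from the published FoxPro code page 
table), not anything taken from dbf-reader or Tika.

```java
import java.nio.charset.Charset;
import java.util.Optional;

// Hypothetical helper, not part of dbf-reader or Tika.
final class DbfCodepage {
    // Assumption: language driver ID lives at byte 29 of the 32-byte header.
    static final int LANGUAGE_DRIVER_OFFSET = 29;
    static final int MIN_HEADER_LENGTH = 32;

    // Partial mapping; IDs not listed here are treated as unknown.
    static Optional<String> charsetNameFor(int languageDriverId) {
        switch (languageDriverId) {
            case 0x01: return Optional.of("IBM437");        // US MS-DOS
            case 0x02: return Optional.of("IBM850");        // International MS-DOS
            case 0x03: return Optional.of("windows-1252");  // Windows ANSI
            case 0xC8: return Optional.of("windows-1250");  // Eastern European Windows
            case 0xC9: return Optional.of("windows-1251");  // Russian Windows
            default:   return Optional.empty();             // 0x00 (not set) or unmapped
        }
    }

    // Returns the charset named by the header byte, or empty if the byte is
    // unset, unmapped, or names a charset this JVM does not support.
    static Optional<Charset> detect(byte[] header) {
        if (header == null || header.length < MIN_HEADER_LENGTH) {
            return Optional.empty();
        }
        int id = header[LANGUAGE_DRIVER_OFFSET] & 0xFF; // read as unsigned
        return charsetNameFor(id)
                .filter(Charset::isSupported)
                .map(Charset::forName);
    }
}
```

An empty result would be the cue to fall back to some other form of 
detection, since many files leave the byte at zero.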

If we wanted to integrate our charset detector, would we call getBytes() on the 
first X DbfRecords, run those through our detector and then reprocess the 
stream with that charset?
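
To make the question concrete, here is a rough sketch of the sample-then- 
reprocess flow using only the JDK. Everything in it is a stand-in: 
TwoPassCharsetSketch, sniff and utf8OrElse are hypothetical names, the sample 
is taken from raw stream bytes rather than from DbfRecord objects, and the 
"detector" is just a strict UTF-8 check with a fallback. A real integration 
would pass in our charset detector in its place.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical sketch, not Tika or dbf-reader code.
final class TwoPassCharsetSketch {

    // Pass 1: read up to sampleSize bytes, hand them to the detector,
    // then rewind so the caller can reprocess the stream from the start.
    static Charset sniff(BufferedInputStream in, int sampleSize,
                         Function<byte[], Charset> detector) throws IOException {
        in.mark(sampleSize);
        byte[] buf = new byte[sampleSize];
        int n = 0;
        while (n < sampleSize) {
            int r = in.read(buf, n, sampleSize - n);
            if (r < 0) {
                break; // stream ended before the sample was full
            }
            n += r;
        }
        in.reset();
        return detector.apply(Arrays.copyOf(buf, n));
    }

    // Stand-in detector: UTF-8 if the sample decodes strictly, else fallback.
    static Function<byte[], Charset> utf8OrElse(Charset fallback) {
        return bytes -> {
            try {
                StandardCharsets.UTF_8.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(bytes));
                return StandardCharsets.UTF_8;
            } catch (CharacterCodingException e) {
                return fallback;
            }
        };
    }
}
```

One caveat with this shape: the sample has to fit within the mark limit, so 
"first X records" translates into a byte budget that must be chosen up front.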

I installed OpenOffice so that I could create test dbf documents, but the 
results have been pretty poor.

> Add mime detection and parsing for dbf files
> --------------------------------------------
>
>                 Key: TIKA-1513
>                 URL: https://issues.apache.org/jira/browse/TIKA-1513
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 1.8
>
>
> I just came across an Apache licensed dbf parser that is available on 
> [maven|https://repo1.maven.org/maven2/org/jamel/dbf/dbf-reader/0.1.0/dbf-reader-0.1.0.pom].
> Let's add dbf parsing to Tika.
> Any other recommendations for alternate parsers?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
