[https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287980#comment-14287980]
Tim Allison commented on TIKA-1513:
-----------------------------------
[~iryndin], on codepage detection in dbf... in one of the specs I read, it looks
like there is a byte in the header that specifies the codepage for the table,
and it may or may not be set. Are you, by chance, parsing that?
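For reference, here is a minimal sketch of reading that byte. It assumes the commonly published dBASE IV / Visual FoxPro layout, where offset 29 of the 32-byte header holds a language driver / code page mark. The mark-to-charset table is partial and assumed, not copied from any particular spec, so it should be checked before anyone relies on it. A zero byte is treated as "not set". `DbfCodepage` and `declaredCharset` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.Optional;

final class DbfCodepage {
    // Assumed offset of the language driver / code page mark in the DBF header.
    static final int CODEPAGE_OFFSET = 29;

    // Partial, illustrative mapping of code page marks to Java charset names (assumption).
    static final Map<Integer, String> MARKS = Map.of(
            0x01, "IBM437",       // US MS-DOS
            0x02, "IBM850",       // International MS-DOS
            0x03, "windows-1252", // Windows ANSI
            0x64, "IBM852",       // Eastern European MS-DOS
            0x65, "IBM866",       // Russian MS-DOS
            0xC8, "windows-1250", // Eastern European Windows
            0xC9, "windows-1251"  // Russian Windows
    );

    /** Returns the charset declared in the header, or empty if the mark is unset (0), unknown, or unsupported by this JVM. */
    static Optional<Charset> declaredCharset(byte[] header) {
        if (header.length <= CODEPAGE_OFFSET) {
            return Optional.empty();
        }
        int mark = header[CODEPAGE_OFFSET] & 0xFF; // read as unsigned
        String name = MARKS.get(mark);
        if (name == null || !Charset.isSupported(name)) {
            return Optional.empty();
        }
        return Optional.of(Charset.forName(name));
    }
}
```

When this returns empty, falling back to content-based detection seems like the natural next step.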
If we wanted to integrate our charset detector, would we call getBytes() on the
first X DbfRecords, run those through our detector and then reprocess the
stream with that charset?
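A minimal sketch of that sample-then-decode flow, under stated assumptions. Records are modeled as raw byte arrays, standing in for whatever getBytes() on a DbfRecord returns. `Guesser`, `utf8OrElse`, and `decodeAll` are hypothetical names. The toy guesser only distinguishes cleanly decoding UTF-8 from a fallback charset. In Tika, the detection step would more likely go through something like `org.apache.tika.parser.txt.CharsetDetector` (`setText(byte[])` followed by `detect()`).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SampledCharsetDecoding {
    /** Hypothetical pluggable detector; a real one would wrap Tika's charset detection. */
    interface Guesser {
        Charset guess(byte[] sample);
    }

    /** Toy stand-in: UTF-8 if the sample decodes strictly, otherwise the given fallback. */
    static Guesser utf8OrElse(Charset fallback) {
        return sample -> {
            CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(sample));
                return StandardCharsets.UTF_8;
            } catch (CharacterCodingException e) {
                return fallback;
            }
        };
    }

    /** Concatenates the first sampleSize records, guesses a charset from them, then decodes every record with it. */
    static List<String> decodeAll(List<byte[]> records, int sampleSize, Guesser guesser) {
        ByteArrayOutputStream sample = new ByteArrayOutputStream();
        for (int i = 0; i < Math.min(sampleSize, records.size()); i++) {
            byte[] r = records.get(i);
            sample.write(r, 0, r.length);
        }
        Charset charset = guesser.guess(sample.toByteArray());
        List<String> decoded = new ArrayList<>();
        for (byte[] r : records) {
            decoded.add(new String(r, charset));
        }
        return decoded;
    }
}
```

Because the records are buffered in memory here, "reprocessing the stream" is just a second pass over the list. A streaming parser would instead need to hold on to the sampled records or re-read the input.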
I installed OpenOffice so that I could create test dbf documents, but the
results have been pretty poor.
> Add mime detection and parsing for dbf files
> --------------------------------------------
>
> Key: TIKA-1513
> URL: https://issues.apache.org/jira/browse/TIKA-1513
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
> Fix For: 1.8
>
>
> I just came across an Apache licensed dbf parser that is available on
> [maven|https://repo1.maven.org/maven2/org/jamel/dbf/dbf-reader/0.1.0/dbf-reader-0.1.0.pom].
> Let's add dbf parsing to Tika.
> Any other recommendations for alternate parsers?