[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239062#comment-15239062
]
Tim Allison edited comment on TIKA-1513 at 4/13/16 10:52 AM:
-------------------------------------------------------------
[~gagravarr], would you mind taking a look at the detector? Is there a way
that we can convert this to a mime definition? Or should we add a DBFDetector?
[~nicholasc], it looks great to me. I agree that we'll probably want to relax
some of the length checks (just make sure they're > 0 or something
reasonable)...we wouldn't want this to fail on truncated DBF files, and as
you've pointed out, there can be extra bytes at the end of the file. If there's
any way to avoid adding the dependency, that'd be great...although I very much
appreciate the concern for overflow!
In your experience, do we need to validate the fieldentry or can we stop
sooner? If we do, then I suspect there's no way to convert to a mime
definition, but I suspect much of the earlier stuff could easily be translated.
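For illustration only, here is a minimal sketch of what relaxed header checks might look like. It is not the detector attached to this issue and not Tika API: the class name `DbfHeaderSketch`, the method names, and the list of accepted version bytes are assumptions. It relies only on the commonly documented DBF layout (version byte at offset 0, last-update date at offsets 1-3, little-endian 16-bit header length at offset 8 and record length at offset 10), and it deliberately never compares those lengths against the file size, so truncated files and files with trailing bytes still pass.

```java
/** Hypothetical helper for illustration; not part of Tika or of the patch under review. */
final class DbfHeaderSketch {

    // Assumed set of common dBASE/FoxPro version bytes; a real detector may accept more.
    private static final int[] KNOWN_VERSIONS = {0x03, 0x30, 0x83, 0x8B, 0xF5};

    private DbfHeaderSketch() {}

    /** Reads an unsigned little-endian 16-bit value; the result fits in an int, so it cannot overflow. */
    static int u16le(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    /** Returns true if the first 32 bytes are plausible for a DBF file header. */
    static boolean looksLikeDbf(byte[] prefix) {
        if (prefix == null || prefix.length < 32) {
            return false;
        }
        int version = prefix[0] & 0xFF;
        boolean knownVersion = false;
        for (int v : KNOWN_VERSIONS) {
            if (v == version) {
                knownVersion = true;
            }
        }
        if (!knownVersion) {
            return false;
        }
        // Last-update date: offset 1 is the year (any value allowed here), then month and day.
        int month = prefix[2] & 0xFF;
        int day = prefix[3] & 0xFF;
        if (month < 1 || month > 12 || day < 1 || day > 31) {
            return false;
        }
        // Relaxed checks: only require sane minimums, never a match with the actual file length.
        int headerLength = u16le(prefix, 8);   // at least 32 header bytes plus the terminator
        int recordLength = u16le(prefix, 10);  // at least the one-byte deletion flag
        return headerLength >= 33 && recordLength > 0;
    }
}
```

Checks like these read fixed offsets only, which is the part that seems most likely to carry over to a mime definition; validating each field entry would need a loop over 32-byte descriptors and probably would not.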
Oh, and please make sure to add an Apache license header...unless Nick B can
easily translate this to a mime definition. :)
Thank you!
> Add mime detection and parsing for dbf files
> --------------------------------------------
>
> Key: TIKA-1513
> URL: https://issues.apache.org/jira/browse/TIKA-1513
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
> Fix For: 1.13
>
>
> I just came across an Apache licensed dbf parser that is available on
> [maven|https://repo1.maven.org/maven2/org/jamel/dbf/dbf-reader/0.1.0/dbf-reader-0.1.0.pom].
> Let's add dbf parsing to Tika.
> Any other recommendations for alternate parsers?