[ https://issues.apache.org/jira/browse/TIKA-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559873#comment-16559873 ]
Hudson commented on TIKA-2641:
------------------------------
SUCCESS: Integrated in Jenkins build tika-branch-1x #63 (See
[https://builds.apache.org/job/tika-branch-1x/63/])
Stub a unit test for TIKA-2641 (tallison:
[https://github.com/apache/tika/commit/aaa78a3d665d8c120e8eadbc26f3d86958042c05])
* (add) tika-parsers/src/test/java/org/apache/tika/parser/TabularFormatsTest.java
> Unit test for consistency between tabular/columnar formats
> ----------------------------------------------------------
>
> Key: TIKA-2641
> URL: https://issues.apache.org/jira/browse/TIKA-2641
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 2.0, 1.18
> Reporter: Nick Burch
> Priority: Minor
>
> We now have a number of parsers which deal with file formats which are either
> wholly or optionally "table-based" formats with consistency in the data types
> held in a given column. This includes multi-table formats like sqlite,
> single-table formats like sas7bdat, and anything-goes-table formats like csv
> or xlsx.
> We should firstly try to create a simple-ish, small but rich file for each of
> these formats, similar to what we do for archive formats with the
> {{test-documents}} archives. Then, we should add unit tests that verify
> that, as much as formats permit, you get basically the same XHTML out for the
> "same" input. Oh, and fix up any obvious inconsistencies...
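
The consistency check described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the test that was committed: it does not call Tika at all, the names TabularConsistencySketch, extractRows and sameTable are hypothetical, and the XHTML strings are hand-written stand-ins for what two parsers might emit for the "same" table. It uses only the JDK's DOM parser and compares cell text row by row, ignoring attributes and whitespace around the cell text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Tika: compares the tabular content of two XHTML strings.
class TabularConsistencySketch {

    // Returns every <tr> as a list of trimmed cell texts (<th> and <td> are treated alike).
    static List<List<String>> extractRows(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        List<List<String>> rows = new ArrayList<>();
        NodeList trs = doc.getElementsByTagName("tr");
        for (int i = 0; i < trs.getLength(); i++) {
            List<String> cells = new ArrayList<>();
            NodeList children = trs.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                if (children.item(j) instanceof Element) {
                    Element cell = (Element) children.item(j);
                    String name = cell.getTagName();
                    if (name.equals("td") || name.equals("th")) {
                        cells.add(cell.getTextContent().trim());
                    }
                }
            }
            rows.add(cells);
        }
        return rows;
    }

    // Two outputs count as consistent when their rows and cell texts match exactly.
    static boolean sameTable(String xhtmlA, String xhtmlB) throws Exception {
        return extractRows(xhtmlA).equals(extractRows(xhtmlB));
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample outputs; real parser output would come from the test documents.
        String fromCsv = "<html><body><table>"
                + "<tr><th>id</th><th>name</th></tr>"
                + "<tr><td>1</td><td>a</td></tr>"
                + "</table></body></html>";
        String fromXlsx = "<html><body><table class=\"sheet\">"
                + "<tr><th> id </th><th>name</th></tr>\n"
                + "<tr><td>1</td><td> a</td></tr>"
                + "</table></body></html>";
        System.out.println(sameTable(fromCsv, fromXlsx));
    }
}
```

In a real test the two strings would presumably be the XHTML produced by parsing the per-format sample files, and the comparison would need to allow for whatever differences the formats genuinely force (for example multiple tables in sqlite).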