[ 
https://issues.apache.org/jira/browse/TIKA-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480769#comment-16480769
 ] 

Nick Burch commented on TIKA-2641:
----------------------------------

All 3 Excel formats are now tested and show basically the same output as 
SAS7BDAT.

The namespace error with ODS remains, so that test is disabled.

DB formats still need test files, and then testing.

> Unit test for consistency between tabular/columnar formats
> ----------------------------------------------------------
>
>                 Key: TIKA-2641
>                 URL: https://issues.apache.org/jira/browse/TIKA-2641
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 2.0, 1.18
>            Reporter: Nick Burch
>            Priority: Minor
>
> We now have a number of parsers which deal with file formats which are either 
> wholly or optionally "table-based" formats with consistency in the data types 
> held in a given column. This includes multi-table formats like sqlite, 
> single-table formats like sas7bdat, and anything-goes-table formats like csv 
> or xlsx.
> We should firstly try to create a simple-ish, small but rich file for each of 
> these formats, similar to what we do for archive formats with the 
> {{test-documents}} archives. Then, we should add unit tests that verify 
> that, as much as formats permit, you get basically the same XHTML out for the 
> "same" input. Oh, and fix up any obvious inconsistencies...
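
The consistency check described above could be sketched roughly as follows. This is only an illustration of the idea, not Tika's actual test code: it is written in Python for brevity, the helper names (`table_rows`, `same_table_content`) are hypothetical, and the two XHTML strings are made-up stand-ins for what two different parsers might emit for the "same" table. It compares only the cell text of each row, so differences in wrapper elements or whitespace are ignored.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix so namespaced and plain tags compare equal."""
    return tag.rsplit("}", 1)[-1]


def table_rows(xhtml):
    """Return every table row in the document as a list of trimmed cell strings."""
    root = ET.fromstring(xhtml)
    rows = []
    for elem in root.iter():
        if local_name(elem.tag) == "tr":
            cells = [
                "".join(cell.itertext()).strip()
                for cell in elem
                if local_name(cell.tag) in ("td", "th")
            ]
            rows.append(cells)
    return rows


def same_table_content(xhtml_a, xhtml_b):
    """True when both documents hold the same rows of cell text, in the same order."""
    return table_rows(xhtml_a) == table_rows(xhtml_b)


# Hypothetical sample output, invented for this sketch (not real parser output).
XHTML_FROM_CSV = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><table>'
    "<tr><th>id</th><th>name</th></tr>"
    "<tr><td>1</td><td>alpha</td></tr>"
    "</table></body></html>"
)
XHTML_FROM_SAS = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><h1>data</h1><table>'
    "<thead><tr><th> id </th><th>name</th></tr></thead>"
    "<tbody><tr><td>1</td><td>alpha</td></tr></tbody>"
    "</table></body></html>"
)

if __name__ == "__main__":
    print(same_table_content(XHTML_FROM_CSV, XHTML_FROM_SAS))
```

Comparing normalised rows rather than raw XHTML strings is one way to allow for the "as much as formats permit" caveat, since formats that add a title or wrap rows in extra elements would still match on their cell content.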



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
