-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32291/#review77453
-----------------------------------------------------------


Apologies, I haven't figured out how to reference specific code snippets.  
First time reviewing.  Mea culpa.

For several of these points, I've either violated or have code that is 
currently in violation of them.

Might want to wrap the table headers in thead and the table in tbody, as we do 
in XSSFExcelExtractorDecorator and jdbc.AbstractDBParser.

Do we need the xhtml.newline()?  We don't do that with other tables.  Should we 
add that to other tables?

For the tests, unless you are doing something special with the handler or 
context, shorter to use TikaTest's getXML and then run assertContains on the 
xml. 

Should probably add Metadata.CONTENT_LENGTH as a metadata item throughout.


In ISATabStudyParser and elsewhere, do we want to suck the entire file into 
memory?  Better to use the iterator (on the unchecked! theory) that the 
iterator does just in time parsing?
List<CSVRecord> records = csvParser.getRecords();

There's still a pretty big TODO in ISATabInvestigationParser btwn 
startDocument() and endDocument()...I might be looking at an old copy, or this 
is the intended behavior.  If it is the intended behavior, note that in the 
javadocs!

In ISATabInvestigationParser, why not use the CSVParser so that you don't have 
to do this?
metadata.add(values[0], values[i].replaceAll("(^\")|(\"$)",""));

+1 to Chris's recommendation to create short dummy test files that are clean 
with respect to ASL 2.0.

- Tim Allison


On March 23, 2015, 5:04 p.m., Giuseppe Totaro wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32291/
> -----------------------------------------------------------
> 
> (Updated March 23, 2015, 5:04 p.m.)
> 
> 
> Review request for tika and Chris Mattmann.
> 
> 
> Bugs: TIKA-1580
>     https://issues.apache.org/jira/browse/TIKA-1580
> 
> 
> Repository: tika
> 
> 
> Description
> -------
> 
> ISATab parsers. This preliminary solution provides three parsers, one for 
> each ISA-Tab filetype (Investigation, Study, Assay).
> 
> 
> Diffs
> -----
> 
>   trunk/tika-bundle/pom.xml 1668683 
>   trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml 
> 1668683 
>   trunk/tika-parsers/pom.xml 1668683 
>   
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabAssayParser.java
>  PRE-CREATION 
>   
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabInvestigationParser.java
>  PRE-CREATION 
>   
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabStudyParser.java
>  PRE-CREATION 
>   
> trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
>  1668683 
>   
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabAssayParserTest.java
>  PRE-CREATION 
>   
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabInvestigationParserTest.java
>  PRE-CREATION 
>   
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabStudyParserTest.java
>  PRE-CREATION 
>   
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_bii-s-2_metabolite
>  profiling_NMR spectroscopy.txt PRE-CREATION 
>   
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_metabolome.txt
>  PRE-CREATION 
>   
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_microarray.txt
>  PRE-CREATION 
>   
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_proteome.txt
>  PRE-CREATION 
>   
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_transcriptome.txt
>  PRE-CREATION 
>   
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/i_investigation.txt
>  PRE-CREATION 
>   
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-1.txt
>  PRE-CREATION 
>   
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-2.txt
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/32291/diff/
> 
> 
> Testing
> -------
> 
> Tested on sample ISA-Tab files downloaded from 
> http://www.isa-tools.org/format/examples/.
> 
> 
> Thanks,
> 
> Giuseppe Totaro
> 
>

Reply via email to