-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32291/#review77453
-----------------------------------------------------------
Apologies, I haven't figured out how to reference specific code snippets.
First time reviewing. Mea culpa.
For several of these points, I've either violated or have code that is
currently in violation of them.
Might want to wrap the table headers in thead and the table in tbody, as we do
in XSSFExcelExtractorDecorator and jdbc.AbstractDBParser.
Do we need the xhtml.newline()? We don't do that with other tables. Should we
add that to other tables?
For the tests, unless you are doing something special with the handler or
context, shorter to use TikaTest's getXML and then run assertContains on the
xml.
Should probably add Metadata.CONTENT_LENGTH as a metadata item throughout.
In ISATabStudyParser and elsewhere, do we want to suck the entire file into
memory? Better to use the iterator (on the unchecked! theory) that the
iterator does just in time parsing?
List<CSVRecord> records = csvParser.getRecords();
There's still a pretty big TODO in ISATabInvestigationParser btwn
startDocument() and endDocument()...I might be looking at an old copy, or this
is the intended behavior. If it is the intended behavior, note that in the
javadocs!
In ISATabInvestigationParser, why not use the CSVParser so that you don't have
to do this?
metadata.add(values[0], values[i].replaceAll("(^\")|(\"$)",""));
+1 to Chris's recommendation to create short dummy test files that are clean
with respect to ASL 2.0.
- Tim Allison
On March 23, 2015, 5:04 p.m., Giuseppe Totaro wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32291/
> -----------------------------------------------------------
>
> (Updated March 23, 2015, 5:04 p.m.)
>
>
> Review request for tika and Chris Mattmann.
>
>
> Bugs: TIKA-1580
> https://issues.apache.org/jira/browse/TIKA-1580
>
>
> Repository: tika
>
>
> Description
> -------
>
> ISATab parsers. This preliminary solution provides three parsers, one for
> each ISA-Tab filetype (Investigation, Study, Assay).
>
>
> Diffs
> -----
>
> trunk/tika-bundle/pom.xml 1668683
> trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
> 1668683
> trunk/tika-parsers/pom.xml 1668683
>
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabAssayParser.java
> PRE-CREATION
>
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabInvestigationParser.java
> PRE-CREATION
>
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabStudyParser.java
> PRE-CREATION
>
> trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
> 1668683
>
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabAssayParserTest.java
> PRE-CREATION
>
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabInvestigationParserTest.java
> PRE-CREATION
>
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISATabStudyParserTest.java
> PRE-CREATION
>
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_bii-s-2_metabolite
> profiling_NMR spectroscopy.txt PRE-CREATION
>
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_metabolome.txt
> PRE-CREATION
>
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_microarray.txt
> PRE-CREATION
>
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_proteome.txt
> PRE-CREATION
>
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_transcriptome.txt
> PRE-CREATION
>
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/i_investigation.txt
> PRE-CREATION
>
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-1.txt
> PRE-CREATION
>
> trunk/tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-2.txt
> PRE-CREATION
>
> Diff: https://reviews.apache.org/r/32291/diff/
>
>
> Testing
> -------
>
> Tested on sample ISA-Tab files downloaded from
> http://www.isa-tools.org/format/examples/.
>
>
> Thanks,
>
> Giuseppe Totaro
>
>