[ 
https://issues.apache.org/jira/browse/TIKA-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390246#comment-16390246
 ] 

Hudson commented on TIKA-2590:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1453 (See 
[https://builds.apache.org/job/Tika-trunk/1453/])
TIKA-2590: restore the client's ability to choose what Excel file (g.alekseev: 
[https://github.com/apache/tika/commit/c56c7c41a6c51e4cd4dac78b693bd883f1329264])
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.java
TIKA-2590 update Changes.txt (tallison: 
[https://github.com/apache/tika/commit/947334cbf40bc6efef1cb488749213724bedb171])
* (edit) CHANGES.txt


> ExcelExtractor: cannot choose listening to the selected records only
> --------------------------------------------------------------------
>
>                 Key: TIKA-2590
>                 URL: https://issues.apache.org/jira/browse/TIKA-2590
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>            Reporter: Grigoriy Alekseev
>            Priority: Critical
>             Fix For: 1.18, 2.0.0
>
>
> The listenForAllRecords argument is always reset to 'true', so the 
> 'else' branch is never reached. This may cause incorrect text extraction when 
> records of certain unsupported types (e.g. SharedFormula) are present in a 
> file.
> {code:java}
>         public void processFile(DirectoryNode root, boolean listenForAllRecords)
>                 throws IOException, SAXException, TikaException {
>             // Set up listener and register the records we want to process
>             HSSFRequest hssfRequest = new HSSFRequest();
>             listenForAllRecords = true;
>             if (listenForAllRecords) {
>                 hssfRequest.addListenerForAllRecords(formatListener);
>             } else {
>                 hssfRequest.addListener(formatListener, BOFRecord.sid);
>                 hssfRequest.addListener(formatListener, EOFRecord.sid);
>                 hssfRequest.addListener(formatListener, DateWindow1904Record.sid);
>                 hssfRequest.addListener(formatListener, CountryRecord.sid);
>                 hssfRequest.addListener(formatListener, BoundSheetRecord.sid);
>                 hssfRequest.addListener(formatListener, SSTRecord.sid);
>                 hssfRequest.addListener(formatListener, FormulaRecord.sid);
>                 hssfRequest.addListener(formatListener, LabelRecord.sid);
>                 hssfRequest.addListener(formatListener, LabelSSTRecord.sid);
>                 hssfRequest.addListener(formatListener, NumberRecord.sid);
>                 hssfRequest.addListener(formatListener, RKRecord.sid);
>                 hssfRequest.addListener(formatListener, StringRecord.sid);
>                 hssfRequest.addListener(formatListener, HyperlinkRecord.sid);
>                 hssfRequest.addListener(formatListener, TextObjectRecord.sid);
>                 hssfRequest.addListener(formatListener, SeriesTextRecord.sid);
>                 hssfRequest.addListener(formatListener, FormatRecord.sid);
>                 hssfRequest.addListener(formatListener, ExtendedFormatRecord.sid);
>                 hssfRequest.addListener(formatListener, DrawingGroupRecord.sid);
>                 if (extractor.officeParserConfig.getIncludeHeadersAndFooters()) {
>                     hssfRequest.addListener(formatListener, HeaderRecord.sid);
>                     hssfRequest.addListener(formatListener, FooterRecord.sid);
>                 }
>             }
> {code}
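
The pattern described above can be reproduced without POI. The following is a minimal sketch, not the actual Tika change: FakeRequest, the two sid constants, and the method names buggy and fixed are hypothetical stand-ins introduced only for illustration. It shows that overwriting the parameter makes the 'else' branch unreachable, and that dropping the assignment lets the caller's choice take effect again.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerSelectionSketch {

    // Hypothetical stand-in for a request object: it only records what was registered.
    static class FakeRequest {
        boolean allRecords = false;
        final List<Short> sids = new ArrayList<>();

        void addListenerForAllRecords() { allRecords = true; }
        void addListener(short sid) { sids.add(sid); }
    }

    // Placeholder record type ids (illustrative values, not taken from POI).
    static final short BOF_SID = 1;
    static final short EOF_SID = 2;

    // Pattern from the report: the argument is overwritten, so 'else' is dead code.
    static FakeRequest buggy(boolean listenForAllRecords) {
        FakeRequest request = new FakeRequest();
        listenForAllRecords = true; // caller's choice is discarded here
        if (listenForAllRecords) {
            request.addListenerForAllRecords();
        } else {
            request.addListener(BOF_SID);
            request.addListener(EOF_SID);
        }
        return request;
    }

    // Assumed shape of a fix: honour the argument as passed by the caller.
    static FakeRequest fixed(boolean listenForAllRecords) {
        FakeRequest request = new FakeRequest();
        if (listenForAllRecords) {
            request.addListenerForAllRecords();
        } else {
            request.addListener(BOF_SID);
            request.addListener(EOF_SID);
        }
        return request;
    }

    public static void main(String[] args) {
        System.out.println(buggy(false).allRecords); // true
        System.out.println(fixed(false).allRecords); // false
        System.out.println(fixed(false).sids);       // [1, 2]
    }
}
```

With the assignment removed, passing false registers only the selected record types, which is the behaviour the issue title asks for.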



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
