[ https://issues.apache.org/jira/browse/TIKA-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390246#comment-16390246 ]
Hudson commented on TIKA-2590:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1453 (See [https://builds.apache.org/job/Tika-trunk/1453/])
TIKA-2590: restore the client's ability to choose what Excel file (g.alekseev: [https://github.com/apache/tika/commit/c56c7c41a6c51e4cd4dac78b693bd883f1329264])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.java
TIKA-2590 update Changes.txt (tallison: [https://github.com/apache/tika/commit/947334cbf40bc6efef1cb488749213724bedb171])
* (edit) CHANGES.txt


> ExcelExtractor: cannot choose listening to the selected records only
> --------------------------------------------------------------------
>
>                 Key: TIKA-2590
>                 URL: https://issues.apache.org/jira/browse/TIKA-2590
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>            Reporter: Grigoriy Alekseev
>            Priority: Critical
>             Fix For: 1.18, 2.0.0
>
>
> The listenForAllRecords argument is always reset to 'true', so the
> 'else' branch is never reached. This can cause incorrect text extraction
> when records of certain unsupported types (e.g. SharedFormula) are
> present in a file.
> {code:java}
>     public void processFile(DirectoryNode root, boolean listenForAllRecords)
>             throws IOException, SAXException, TikaException {
>         // Set up listener and register the records we want to process
>         HSSFRequest hssfRequest = new HSSFRequest();
>         // BUG: overwrites the caller's argument, so the else branch is unreachable
>         listenForAllRecords = true;
>         if (listenForAllRecords) {
>             hssfRequest.addListenerForAllRecords(formatListener);
>         } else {
>             hssfRequest.addListener(formatListener, BOFRecord.sid);
>             hssfRequest.addListener(formatListener, EOFRecord.sid);
>             hssfRequest.addListener(formatListener, DateWindow1904Record.sid);
>             hssfRequest.addListener(formatListener, CountryRecord.sid);
>             hssfRequest.addListener(formatListener, BoundSheetRecord.sid);
>             hssfRequest.addListener(formatListener, SSTRecord.sid);
>             hssfRequest.addListener(formatListener, FormulaRecord.sid);
>             hssfRequest.addListener(formatListener, LabelRecord.sid);
>             hssfRequest.addListener(formatListener, LabelSSTRecord.sid);
>             hssfRequest.addListener(formatListener, NumberRecord.sid);
>             hssfRequest.addListener(formatListener, RKRecord.sid);
>             hssfRequest.addListener(formatListener, StringRecord.sid);
>             hssfRequest.addListener(formatListener, HyperlinkRecord.sid);
>             hssfRequest.addListener(formatListener, TextObjectRecord.sid);
>             hssfRequest.addListener(formatListener, SeriesTextRecord.sid);
>             hssfRequest.addListener(formatListener, FormatRecord.sid);
>             hssfRequest.addListener(formatListener, ExtendedFormatRecord.sid);
>             hssfRequest.addListener(formatListener, DrawingGroupRecord.sid);
>             if (extractor.officeParserConfig.getIncludeHeadersAndFooters()) {
>                 hssfRequest.addListener(formatListener, HeaderRecord.sid);
>                 hssfRequest.addListener(formatListener, FooterRecord.sid);
>             }
>         }
>         // ... remainder of processFile() not shown
> {code}
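The failure mode in the report (a parameter reassigned before it is read) can be illustrated without POI. The sketch below is a hypothetical, dependency-free model, not the code from commit c56c7c4: `RecordRequest` is an assumed stand-in for POI's `HSSFRequest`, `ListenerSetup.configure` is an illustrative name, and record types are plain strings rather than `Record.sid` constants. It honors the caller's `listenForAllRecords` choice, and declares the parameter `final` so that a stray `listenForAllRecords = true;` would be rejected at compile time.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for POI's HSSFRequest: it only remembers which record
// types a listener was registered for. Record types are plain strings here,
// not the real Record.sid constants.
class RecordRequest {
    private boolean allRecords = false;
    private final Set<String> recordTypes = new LinkedHashSet<>();

    void addListenerForAllRecords() { allRecords = true; }

    void addListener(String recordType) { recordTypes.add(recordType); }

    boolean listensToAll() { return allRecords; }

    // True if a record of this type would be delivered to the listener.
    boolean listensTo(String recordType) {
        return allRecords || recordTypes.contains(recordType);
    }

    int registeredTypeCount() { return recordTypes.size(); }
}

class ListenerSetup {
    // The record types registered in the 'else' branch of the quoted snippet.
    static final String[] SELECTED_TYPES = {
        "BOF", "EOF", "DateWindow1904", "Country", "BoundSheet", "SST",
        "Formula", "Label", "LabelSST", "Number", "RK", "String",
        "Hyperlink", "TextObject", "SeriesText", "Format",
        "ExtendedFormat", "DrawingGroup"
    };

    // Honors the caller's choice. Because the parameter is final, a stray
    // "listenForAllRecords = true;" would not compile.
    static RecordRequest configure(final boolean listenForAllRecords,
                                   final boolean includeHeadersAndFooters) {
        RecordRequest request = new RecordRequest();
        if (listenForAllRecords) {
            request.addListenerForAllRecords();
        } else {
            for (String type : SELECTED_TYPES) {
                request.addListener(type);
            }
            if (includeHeadersAndFooters) {
                request.addListener("Header");
                request.addListener("Footer");
            }
        }
        return request;
    }

    public static void main(String[] args) {
        RecordRequest selected = configure(false, false);
        System.out.println("all records: " + selected.listensToAll());
        System.out.println("SharedFormula delivered: " + selected.listensTo("SharedFormula"));
    }
}
```

Running `main` prints `all records: false` and `SharedFormula delivered: false`: in selective mode a record type that was never registered is not handed to the listener. Marking parameters `final` is one way to catch this class of mistake; a static-analysis rule that flags parameter reassignment (for example Checkstyle's `ParameterAssignment` check) achieves the same without touching signatures.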