[ 
https://issues.apache.org/jira/browse/TIKA-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390107#comment-16390107
 ] 

ASF GitHub Bot commented on TIKA-2590:
--------------------------------------

tballison closed pull request #225: TIKA-2590: restore the client's ability to 
choose what Excel file rec…
URL: https://github.com/apache/tika/pull/225

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.java b/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.java
index 9146b8c7b..4ea8068de 100644
--- a/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.java
+++ b/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.java
@@ -284,7 +284,6 @@ public void processFile(DirectoryNode root, boolean listenForAllRecords)
 
             // Set up listener and register the records we want to process
             HSSFRequest hssfRequest = new HSSFRequest();
-            listenForAllRecords = true;
             if (listenForAllRecords) {
                 hssfRequest.addListenerForAllRecords(formatListener);
             } else {

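The diff removes a single parameter reassignment. The following minimal, self-contained Java sketch isolates the control flow before and after that change. It is not Tika code: the class `ListenerModeDemo` and its methods are hypothetical, and they return a label instead of registering POI listeners. With the reassignment in place, a caller's `false` is discarded and the selective branch can never run.

```java
// Hypothetical stand-in for the registration step in ExcelExtractor.processFile;
// it is NOT the Tika or POI API, it only reproduces the branching logic.
final class ListenerModeDemo {

    // Pre-fix logic: the parameter is overwritten, so the caller's choice is lost.
    static String registerBuggy(boolean listenForAllRecords) {
        listenForAllRecords = true; // the line removed by PR #225
        return listenForAllRecords ? "ALL" : "SELECTED";
    }

    // Post-fix logic: the caller's argument decides which branch runs.
    static String registerFixed(boolean listenForAllRecords) {
        return listenForAllRecords ? "ALL" : "SELECTED";
    }

    public static void main(String[] args) {
        System.out.println(registerBuggy(false)); // prints "ALL": the else branch is unreachable
        System.out.println(registerFixed(false)); // prints "SELECTED"
    }
}
```

Removing the assignment, rather than adding a new flag, is the smallest change that restores the existing `processFile(DirectoryNode, boolean)` contract.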

 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ExcelExtractor: cannot choose listening to the selected records only
> --------------------------------------------------------------------
>
>                 Key: TIKA-2590
>                 URL: https://issues.apache.org/jira/browse/TIKA-2590
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>            Reporter: Grigoriy Alekseev
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> The listenForAllRecords argument is always reset to 'true', so the 'else'
> branch is never reached. This may cause incorrect text extraction when a
> file contains records of certain unsupported types (e.g. SharedFormula).
> {code:java}
>         public void processFile(DirectoryNode root, boolean listenForAllRecords)
>                 throws IOException, SAXException, TikaException {
>             // Set up listener and register the records we want to process
>             HSSFRequest hssfRequest = new HSSFRequest();
>             listenForAllRecords = true; // overwrites the caller's argument, so the else branch is unreachable
>             if (listenForAllRecords) {
>                 hssfRequest.addListenerForAllRecords(formatListener);
>             } else {
>                 hssfRequest.addListener(formatListener, BOFRecord.sid);
>                 hssfRequest.addListener(formatListener, EOFRecord.sid);
>                 hssfRequest.addListener(formatListener, DateWindow1904Record.sid);
>                 hssfRequest.addListener(formatListener, CountryRecord.sid);
>                 hssfRequest.addListener(formatListener, BoundSheetRecord.sid);
>                 hssfRequest.addListener(formatListener, SSTRecord.sid);
>                 hssfRequest.addListener(formatListener, FormulaRecord.sid);
>                 hssfRequest.addListener(formatListener, LabelRecord.sid);
>                 hssfRequest.addListener(formatListener, LabelSSTRecord.sid);
>                 hssfRequest.addListener(formatListener, NumberRecord.sid);
>                 hssfRequest.addListener(formatListener, RKRecord.sid);
>                 hssfRequest.addListener(formatListener, StringRecord.sid);
>                 hssfRequest.addListener(formatListener, HyperlinkRecord.sid);
>                 hssfRequest.addListener(formatListener, TextObjectRecord.sid);
>                 hssfRequest.addListener(formatListener, SeriesTextRecord.sid);
>                 hssfRequest.addListener(formatListener, FormatRecord.sid);
>                 hssfRequest.addListener(formatListener, ExtendedFormatRecord.sid);
>                 hssfRequest.addListener(formatListener, DrawingGroupRecord.sid);
>                 if (extractor.officeParserConfig.getIncludeHeadersAndFooters()) {
>                     hssfRequest.addListener(formatListener, HeaderRecord.sid);
>                     hssfRequest.addListener(formatListener, FooterRecord.sid);
>                 }
>             }
> {code}
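The selective branch registers the listener once per record type (`sid`), so a record whose type is not in that list (SharedFormula, for example, does not appear above) is never delivered to `formatListener`. The sketch below is a simplified, hypothetical model of that routing in plain Java. `SidRouterDemo` and its methods only imitate the shape of POI's `HSSFRequest.addListener` / `addListenerForAllRecords`; it is not the POI implementation, and records are reduced to bare sids. The constants are BIFF8 record identifiers that, to my knowledge, match the corresponding POI `*.sid` values. Only a subset of the sids registered in the report is used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical, simplified model of sid-based record routing.
// NOT org.apache.poi.hssf.eventusermodel.HSSFRequest; records are reduced to their sid.
final class SidRouterDemo {
    // BIFF8 record identifiers (assumed to match POI's BOFRecord.sid, EOFRecord.sid, ...).
    static final int BOF = 0x0809;
    static final int EOF = 0x000A;
    static final int NUMBER = 0x0203;
    static final int LABEL_SST = 0x00FD;
    static final int SHARED_FORMULA = 0x04BC;

    private final List<IntConsumer> allRecordListeners = new ArrayList<>();
    private final Map<Integer, List<IntConsumer>> listenersBySid = new HashMap<>();

    void addListenerForAllRecords(IntConsumer listener) {
        allRecordListeners.add(listener);
    }

    void addListener(IntConsumer listener, int sid) {
        listenersBySid.computeIfAbsent(sid, k -> new ArrayList<>()).add(listener);
    }

    // Deliver one record to the catch-all listeners and to those registered for its sid.
    void processRecord(int sid) {
        for (IntConsumer l : allRecordListeners) {
            l.accept(sid);
        }
        for (IntConsumer l : listenersBySid.getOrDefault(sid, Collections.emptyList())) {
            l.accept(sid);
        }
    }

    // Route a fixed record stream through a router set up in either mode
    // and return the sids the listener actually received, in order.
    static List<Integer> seen(boolean listenForAllRecords) {
        SidRouterDemo router = new SidRouterDemo();
        List<Integer> received = new ArrayList<>();
        IntConsumer listener = sid -> received.add(sid);
        if (listenForAllRecords) {
            router.addListenerForAllRecords(listener);
        } else {
            for (int sid : new int[] {BOF, EOF, NUMBER, LABEL_SST}) {
                router.addListener(listener, sid);
            }
        }
        for (int sid : new int[] {BOF, LABEL_SST, SHARED_FORMULA, NUMBER, EOF}) {
            router.processRecord(sid);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(seen(true).contains(SHARED_FORMULA));  // prints "true"
        System.out.println(seen(false).contains(SHARED_FORMULA)); // prints "false"
    }
}
```

In the "all records" mode the listener must cope with every record type in the stream, including ones it does not handle. In the selective mode the unregistered SharedFormula record is filtered out before it reaches the listener, which is the choice the bug took away from callers.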



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
