Parikshit Phukan created TIKA-3163:
--------------------------------------
Summary: Java null pointer exception thrown while parsing an xlsx
file to string even though the xlsx file is working fine in the wps
Key: TIKA-3163
URL: https://issues.apache.org/jira/browse/TIKA-3163
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.24.1
Reporter: Parikshit Phukan
Attachments: CVLKRA-KYC_Download_File_Structure_V3.1.xlsx
I am using Tika to extract text and feed it to my Lucene indexer. Tika
throws a NullPointerException for one particular xlsx file. Other xlsx files
parse fine; the exception occurs only with this file. I am attaching the xlsx
file so you can reproduce the problem. Kindly help me out.
Code :-
Tika tika = new Tika(); // org.apache.tika.Tika
String path = "D:\\CVLKRA-KYC_Download_File_Structure_V3.1.xlsx";
File file = new File(path);
System.out.print(tika.parseToString(file));
Error :-
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@54a67a45
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.tika.Tika.parseToString(Tika.java:527)
    at org.apache.tika.Tika.parseToString(Tika.java:642)
    at poc.please.TikaPoc.main(TikaPoc.java:42)
Caused by: java.lang.NullPointerException
    at org.apache.poi.xssf.usermodel.XSSFTableStyle.<init>(XSSFTableStyle.java:64)
    at org.apache.poi.xssf.model.StylesTable.readFrom(StylesTable.java:245)
    at org.apache.poi.xssf.model.StylesTable.<init>(StylesTable.java:138)
    at org.apache.poi.xssf.eventusermodel.XSSFReader.getStylesTable(XSSFReader.java:127)
    at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:143)
    at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:136)
    at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.getXHTML(XSSFExcelExtractorDecorator.java:126)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:210)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:113)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 5 more
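
Possible stopgap on the indexing side :-
Since the trace shows the NullPointerException reaching the caller wrapped in a TikaException, an indexing loop could catch the failure per file and carry on with the remaining files. The following is only a minimal sketch of that idea, not a fix for the underlying problem. The names SafeExtract, TextExtractor and extractAll are hypothetical and are not part of Tika; the extractor is abstracted behind an interface so the sketch runs on the plain JDK, and the assumed Tika wiring is shown only in a comment.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SafeExtract {

    /** Hypothetical abstraction over the text extractor (not a Tika type). */
    interface TextExtractor {
        String extract(Path file) throws Exception;
    }

    /** Extracted text per file, plus the files that failed and why. */
    static final class Result {
        final Map<Path, String> texts = new LinkedHashMap<>();
        final Map<Path, Exception> failures = new LinkedHashMap<>();
    }

    /** Runs the extractor on every file; one failing file does not stop the rest. */
    static Result extractAll(List<Path> files, TextExtractor extractor) {
        Result result = new Result();
        for (Path file : files) {
            try {
                result.texts.put(file, extractor.extract(file));
            } catch (Exception e) {
                // With Tika this would be the TikaException seen in the trace above.
                result.failures.put(file, e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed real wiring (requires Tika on the classpath):
        //   Tika tika = new Tika();
        //   TextExtractor extractor = f -> tika.parseToString(f.toFile());
        // Stub extractor that simulates one file failing:
        TextExtractor stub = f -> {
            if (f.getFileName().toString().startsWith("bad")) {
                throw new IllegalStateException("simulated parser failure");
            }
            return "text of " + f.getFileName();
        };
        Result r = extractAll(
                Arrays.asList(Paths.get("ok.xlsx"), Paths.get("bad.xlsx")), stub);
        System.out.println("extracted: " + r.texts.keySet());
        System.out.println("failed: " + r.failures.keySet());
    }
}
```

With this shape the indexer would receive text only for the files in texts, and the entries in failures could be logged or retried once the parser issue is resolved.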