BufferedInputStream.getInIfOpen() - null inputStream
-----------------------------------------------------
Key: TIKA-607
URL: https://issues.apache.org/jira/browse/TIKA-607
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 0.9
Environment: java version "1.6.0_16", linux 64bit
Reporter: Joseph Vychtrle
Hi, I tried Tika with four different documents, and reading the InputStream
always fails with the exceptions shown in the log below. Reading the content
of a plain-text file, my.cnf, also failed.
{code:title=TikaTest.java|borderStyle=solid}
package cz.instance.transl.tests;
import java.io.File;
import java.io.InputStream;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.Detector;
import org.apache.tika.detect.TypeDetector;
import org.apache.tika.language.LanguageIdentifier;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.utils.ParseUtils;
import org.testng.annotations.Test;
import org.xml.sax.ContentHandler;
public class TikaTest {
@Test
public void testPDFParser() throws Exception {
String resourceLocation =
"file/Designandrealizationofanintranetportal.pdf";
InputStream input =
this.getClass().getClassLoader().getResourceAsStream(resourceLocation);
ContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
PDFParser parser = new PDFParser();
parser.parse(input, textHandler, metadata, new ParseContext());
input.close();
System.out.println("Title: " + metadata.get("title"));
System.out.println("Author: " + metadata.get("Author"));
System.out.println("format: " + metadata.get("source"));
System.out.println("content: " + textHandler.toString());
}
@Test
public void testAutoDetectParser() throws Exception {
InputStream input =
this.getClass().getResourceAsStream("file/jedna.odt");
ContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
Parser parser = new AutoDetectParser();
parser.parse(input, textHandler, metadata, new ParseContext());
input.close();
System.out.println("Title: " + metadata.get("title"));
System.out.println("Author: " + metadata.get("Author"));
}
@Test
public void testTikaParserUtils() throws Exception {
String resourceLocation = "my.cnf";
String content = ParseUtils.getStringContent(new
File(resourceLocation), new TikaConfig());
System.out.println(content);
}
@Test
public void testTypeDetector() throws Exception {
String resourceLocation =
"file/Pozadavky_pro_predkladani_diplomovych_praci.doc";
InputStream input =
this.getClass().getClassLoader().getResourceAsStream(resourceLocation);
Detector detector = new TypeDetector();
MediaType media = detector.detect(input, new Metadata());
System.out.println("Extact Type: " + media.getType());
System.out.println("Sub Type: " + media.getBaseType());
}
@Test
public void testLanguageIdentifier() throws Exception {
String resourceLocation = "file/moje.pdf";
InputStream input =
this.getClass().getClassLoader().getResourceAsStream(resourceLocation);
ContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
Parser parser = new AutoDetectParser();
parser.parse(input, textHandler, metadata, new ParseContext());
input.close();
LanguageIdentifier languageIdentifier = new
LanguageIdentifier(textHandler.toString());
System.out.println("found language :" +
languageIdentifier.getLanguage() + " certainity : "
+ languageIdentifier.isReasonablyCertain());
}
}
{code}
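The tests above pass the result of getResourceAsStream straight to the parser without checking it. getResourceAsStream returns null when the resource is not found, and a BufferedInputStream wrapped around a null stream reports "Stream closed" from getInIfOpen() on the first read. That would be consistent with the issue title and with the traces in the log, although the report itself does not confirm that the streams were null. A minimal sketch using only java.io; the class ResourceCheck, the helper openResource and the resource name file/does-not-exist.pdf are hypothetical and not part of Tika:

```java
import java.io.BufferedInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

public class ResourceCheck {

    // Hypothetical helper (not a Tika API): fail fast with a clear message
    // instead of handing a null stream to a parser.
    static InputStream openResource(ClassLoader loader, String name)
            throws FileNotFoundException {
        InputStream in = loader.getResourceAsStream(name);
        if (in == null) {
            throw new FileNotFoundException("Resource not found on classpath: " + name);
        }
        return in;
    }

    // Reads one byte through a BufferedInputStream that wraps a null stream
    // and returns the message of the resulting IOException.
    static String readThroughNullStream() {
        try {
            new BufferedInputStream(null).read();
            return "no exception";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Same message as in the failing tests.
        System.out.println(readThroughNullStream());

        try {
            // Assumed to be absent from the classpath.
            openResource(ClassLoader.getSystemClassLoader(), "file/does-not-exist.pdf");
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like openResource in each test, a missing resource would surface as a FileNotFoundException naming the path rather than as "Stream closed" deep inside the parser.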
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running TestSuite
[TestRunner] Running the tests in 'DomainObjectFactoryTests' with parallel
mode:false
[RunInfo] Adding method selector:
org.testng.internal.XmlMethodSelector@46e45076 priority: 10
[TestClass] Creating TestClass for [ClassImpl cz.instance.transl.tests.TikaTest]
[TestClass] Adding method cz.instance.transl.tests.TikaTest.testTypeDetector()
on TestClass class cz.instance.transl.tests.TikaTest
[TestClass] Adding method
cz.instance.transl.tests.TikaTest.testLanguageIdentifier() on TestClass class
cz.instance.transl.tests.TikaTest
[TestClass] Adding method
cz.instance.transl.tests.TikaTest.testAutoDetectParser() on TestClass class
cz.instance.transl.tests.TikaTest
[TestClass] Adding method
cz.instance.transl.tests.TikaTest.testTikaParserUtils() on TestClass class
cz.instance.transl.tests.TikaTest
[TestClass] Adding method cz.instance.transl.tests.TikaTest.testPDFParser() on
TestClass class cz.instance.transl.tests.TikaTest
[XmlMethodSelector] Including method cz.instance.transl.tests.testTypeDetector()
[XmlMethodSelector] Including method
cz.instance.transl.tests.testLanguageIdentifier()
[XmlMethodSelector] Including method
cz.instance.transl.tests.testAutoDetectParser()
[XmlMethodSelector] Including method
cz.instance.transl.tests.testTikaParserUtils()
[XmlMethodSelector] Including method cz.instance.transl.tests.testPDFParser()
[SuiteRunner] Created 1 TestRunners
[TestRunner] Running test DomainObjectFactoryTests on 1 classes, included
groups:[] excluded groups:[]
[TestClass]
======
TESTCLASS: cz.instance.transl.tests.TikaTest
[TestClass] Test :
cz.instance.transl.tests.TikaTest.testTypeDetector()
[TestClass] Test :
cz.instance.transl.tests.TikaTest.testLanguageIdentifier()
[TestClass] Test :
cz.instance.transl.tests.TikaTest.testAutoDetectParser()
[TestClass] Test :
cz.instance.transl.tests.TikaTest.testTikaParserUtils()
[TestClass] Test :
cz.instance.transl.tests.TikaTest.testPDFParser()
[TestClass]
======
[TestRunner] Found 5 applicable methods
[TestRunner] WILL BE RUN IN RANDOM ORDER:
[TestRunner] cz.instance.transl.tests.TikaTest.testAutoDetectParser()
[TestRunner] on instances
[TestRunner] cz.instance.transl.tests.TikaTest@1d3c468a
[TestRunner] cz.instance.transl.tests.TikaTest.testPDFParser()
[TestRunner] on instances
[TestRunner] cz.instance.transl.tests.TikaTest@1d3c468a
[TestRunner] cz.instance.transl.tests.TikaTest.testTikaParserUtils()
[TestRunner] on instances
[TestRunner] cz.instance.transl.tests.TikaTest@1d3c468a
[TestRunner] cz.instance.transl.tests.TikaTest.testTypeDetector()
[TestRunner] on instances
[TestRunner] cz.instance.transl.tests.TikaTest@1d3c468a
[TestRunner] cz.instance.transl.tests.TikaTest.testLanguageIdentifier()
[TestRunner] on instances
[TestRunner] cz.instance.transl.tests.TikaTest@1d3c468a
[TestRunner] ===
[Invoker 374961130] Invoking
cz.instance.transl.tests.TikaTest.testAutoDetectParser
[Invoker 374961130] Invoking cz.instance.transl.tests.TikaTest.testPDFParser
[Invoker 374961130] Invoking
cz.instance.transl.tests.TikaTest.testTikaParserUtils
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage
[Invoker 374961130] Invoking cz.instance.transl.tests.TikaTest.testTypeDetector
Extact Type: application
Sub Type: application/octet-stream
[Invoker 374961130] Invoking
cz.instance.transl.tests.TikaTest.testLanguageIdentifier
*********** INVOKED METHODS
cz.instance.transl.tests.TikaTest.testAutoDetectParser()
490489482
cz.instance.transl.tests.TikaTest.testPDFParser() 490489482
cz.instance.transl.tests.TikaTest.testTikaParserUtils()
490489482
cz.instance.transl.tests.TikaTest.testTypeDetector() 490489482
cz.instance.transl.tests.TikaTest.testLanguageIdentifier()
490489482
***********
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/DomainObjectFactoryTests.html
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/DomainObjectFactoryTests.xml
PASSED: testTypeDetector
FAILED: testAutoDetectParser
java.io.IOException: Stream closed
at java.io.BufferedInputStream.getInIfOpen(BufferedInputStream.java:134)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.FilterInputStream.read(FilterInputStream.java:90)
at org.apache.tika.mime.MimeTypes.readMagicHeader(MimeTypes.java:303)
at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:548)
at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:60)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:126)
at cz.instance.transl.tests.TikaTest.testAutoDetectParser(TikaTest.java:44)
at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:73)
at org.apache.maven.surefire.testng.TestNGXmlTestSuite.execute(TestNGXmlTestSuite.java:95)
at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:101)
at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:101)
at $Proxy0.invoke(Unknown Source)
at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:139)
at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:82)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:81)
... Removed 24 stack frames
FAILED: testPDFParser
java.io.IOException: Stream closed
at java.io.BufferedInputStream.getInIfOpen(BufferedInputStream.java:134)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.FilterInputStream.read(FilterInputStream.java:66)
at java.io.PushbackInputStream.read(PushbackInputStream.java:122)
at org.apache.pdfbox.io.PushBackInputStream.read(PushBackInputStream.java:84)
at org.apache.pdfbox.io.PushBackInputStream.peek(PushBackInputStream.java:62)
at org.apache.pdfbox.io.PushBackInputStream.isEOF(PushBackInputStream.java:150)
at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1248)
at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:283)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:155)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:881)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:846)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:74)
at cz.instance.transl.tests.TikaTest.testPDFParser(TikaTest.java:30)
at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:73)
at org.apache.maven.surefire.testng.TestNGXmlTestSuite.execute(TestNGXmlTestSuite.java:95)
at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:101)
at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:101)
at $Proxy0.invoke(Unknown Source)
at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:139)
at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:82)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:81)
... Removed 24 stack frames
FAILED: testTikaParserUtils
java.lang.NullPointerException
at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:112)
at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:171)
at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:189)
at cz.instance.transl.tests.TikaTest.testTikaParserUtils(TikaTest.java:54)
at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:73)
at org.apache.maven.surefire.testng.TestNGXmlTestSuite.execute(TestNGXmlTestSuite.java:95)
at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:101)
at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:101)
at $Proxy0.invoke(Unknown Source)
at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:139)
at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:82)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:81)
... Removed 24 stack frames
FAILED: testLanguageIdentifier
java.io.IOException: Stream closed
at java.io.BufferedInputStream.getInIfOpen(BufferedInputStream.java:134)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.FilterInputStream.read(FilterInputStream.java:90)
at org.apache.tika.mime.MimeTypes.readMagicHeader(MimeTypes.java:303)
at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:548)
at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:60)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:126)
at cz.instance.transl.tests.TikaTest.testLanguageIdentifier(TikaTest.java:75)
at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:73)
at org.apache.maven.surefire.testng.TestNGXmlTestSuite.execute(TestNGXmlTestSuite.java:95)
at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:101)
at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:101)
at $Proxy0.invoke(Unknown Source)
at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:139)
at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:82)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:81)
... Removed 24 stack frames
===============================================
DomainObjectFactoryTests
Tests run: 5, Failures: 4, Skips: 0
===============================================
===============================================
domain
Total tests run: 5, Failures: 4, Skips: 0
===============================================
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/toc.html
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/DomainObjectFactoryTests.properties
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/index.html
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/main.html
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/groups.html
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/methods.html
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/methods-alphabetical.html
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/classes.html
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/reporter-output.html
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/methods-not-run.html
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/testng.xml.html
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/index.html
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/testng-failed.xml
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/domain/testng-failed.xml
Creating
/opt/liferay/liferay-new/portal/plugins-trunk/portlets/brokerage/target/surefire-reports/testng-results.xml
Tests run: 5, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 1.724 sec <<<
FAILURE!
Results :
Failed tests:
testAutoDetectParser(cz.instance.transl.tests.TikaTest)
testPDFParser(cz.instance.transl.tests.TikaTest)
testTikaParserUtils(cz.instance.transl.tests.TikaTest)
testLanguageIdentifier(cz.instance.transl.tests.TikaTest)
Tests run: 5, Failures: 4, Errors: 0, Skipped: 0
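Of the failed tests, testAutoDetectParser is the only one that calls Class.getResourceAsStream rather than ClassLoader.getResourceAsStream. The two resolve names differently: per the Class.getResource Javadoc, a name without a leading "/" is looked up relative to the package of the class, while the ClassLoader variant treats the name as relative to the classpath root. Whether this explains the failure depends on where file/jedna.odt actually sits, which the report does not say. A small sketch of the documented rule; the class ResourceNames and the method resolve are hypothetical illustrations, not JDK or Tika APIs:

```java
public class ResourceNames {

    // Mirrors the name resolution documented for Class.getResource:
    // a leading '/' makes the name absolute, otherwise the package of the
    // class (with '.' replaced by '/') is prepended.
    static String resolve(String className, String name) {
        if (name.startsWith("/")) {
            return name.substring(1);
        }
        int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) {
            return name; // class in the default package
        }
        return className.substring(0, lastDot).replace('.', '/') + "/" + name;
    }

    public static void main(String[] args) {
        String testClass = "cz.instance.transl.tests.TikaTest";
        // prints "cz/instance/transl/tests/file/jedna.odt"
        System.out.println(resolve(testClass, "file/jedna.odt"));
        // prints "file/jedna.odt"
        System.out.println(resolve(testClass, "/file/jedna.odt"));
    }
}
```

So this.getClass().getResourceAsStream("file/jedna.odt") looks for cz/instance/transl/tests/file/jedna.odt on the classpath, whereas the getClassLoader() form used in the other tests looks for file/jedna.odt at the classpath root.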