DbInputStream does not support mark()/reset() when exhausted.
-------------------------------------------------------------
Key: JCR-2576
URL: https://issues.apache.org/jira/browse/JCR-2576
Project: Jackrabbit Content Repository
Issue Type: Bug
Components: jackrabbit-core
Affects Versions: 2.0.0
Reporter: Julian Sedding
The DbDataStore implementation uses a DbInputStream to read binary properties
from the database. When a new binary property is created, Jackrabbit attempts
to index it. During indexing, Tika's CharsetDetector marks the input stream,
reads up to the first 8000 bytes and then resets the stream.
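If both of the following conditions hold, text extraction fails with the stack
trace shown at the end of the issue:
* the property is larger than the minRecordLength configured for the
  DataStore, so the value is stored in the database and read back through
  DbInputStream, and
* the property is smaller than 8000 bytes, so CharsetDetector reaches the end
  of the stream before it calls reset().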
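The two conditions can be reproduced without a database. The sketch below is a simplified, hypothetical model, not Jackrabbit's DbInputStream or Tika's CharsetDetector: `AutoClosingStream` discards its underlying stream (and with it the mark) on EOF and then throws EOFException from reset(), as in the trace; `sniff` mimics the mark, read up to 8000 bytes, reset sequence described above. The minRecordLength condition is not modelled, because it only decides whether DbInputStream is used for the value at all.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical model of the failure, not Jackrabbit's DbInputStream.
public class AutoClosingStream extends InputStream {
    private InputStream in; // set to null once EOF has been reached

    AutoClosingStream(InputStream in) { this.in = in; }

    @Override
    public int read() throws IOException {
        if (in == null) return -1;
        int b = in.read();
        if (b < 0) autoClose();
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (in == null) return -1;
        int n = in.read(buf, off, len);
        if (n < 0) autoClose();
        return n;
    }

    // Auto-close at EOF: the underlying stream, and its mark, are discarded.
    private void autoClose() throws IOException {
        in.close();
        in = null;
    }

    @Override
    public boolean markSupported() { return true; }

    @Override
    public void mark(int readLimit) { if (in != null) in.mark(readLimit); }

    @Override
    public void reset() throws IOException {
        if (in == null) throw new EOFException(); // the failure seen in the trace
        in.reset();
    }

    // Mimics the sniffing step: mark, read up to limit bytes (or EOF), reset.
    static void sniff(InputStream s, int limit) throws IOException {
        s.mark(limit);
        byte[] buf = new byte[limit];
        int total = 0;
        while (total < limit) {
            int n = s.read(buf, total, limit - total);
            if (n < 0) break; // EOF: the content is shorter than limit
            total += n;
        }
        s.reset();
    }

    static boolean sniffSucceeds(int size) {
        InputStream s = new AutoClosingStream(new ByteArrayInputStream(new byte[size]));
        try {
            sniff(s, 8000);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(sniffSucceeds(5000));  // false: EOF reached before reset()
        System.out.println(sniffSucceeds(20000)); // true: EOF not reached
    }
}
```

Content of 8000 bytes or more fills the sniff buffer without a read returning -1, so the stream is still open and the mark still valid when reset() is called.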
The DbInputStream needs the following properties:
1. lazy instantiation of the underlying stream,
2. automatic closing of the underlying stream when EOF is reached, and
3. full support for mark()/reset(), even after the underlying stream has been
   auto-closed because of 2.
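One possible way to meet all three requirements is sketched below, assuming the content can be reopened on demand. For DbDataStore that would presumably mean re-reading the record from the database; here it is abstracted as a hypothetical `Opener` callback, and `ReopeningInputStream` is an illustrative name, not the fix committed to Jackrabbit. Instead of relying on the underlying stream's own mark, the sketch records how many bytes had been consumed when mark() was called and, on reset(), reopens the source and skips forward to that position.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only; Opener is a hypothetical way to reopen the content.
public class ReopeningInputStream extends InputStream {
    // Hypothetical callback that opens a fresh stream over the same content.
    public interface Opener { InputStream open() throws IOException; }

    private final Opener opener;
    private InputStream in;    // null until first use, and again after auto-close
    private boolean eof;       // EOF seen; underlying stream already released
    private boolean closed;    // close() called by the user
    private long pos;          // bytes consumed so far
    private long markPos = -1; // position recorded by mark(), -1 if none
    int opens;                 // how often the source was opened (for the demo)

    public ReopeningInputStream(Opener opener) { this.opener = opener; }

    // Property 1: open the underlying stream lazily, on first use.
    private InputStream stream() throws IOException {
        if (in == null) { in = opener.open(); opens++; }
        return in;
    }

    // Property 2: release the underlying stream as soon as EOF is seen.
    private void release() throws IOException {
        if (in != null) { in.close(); in = null; }
    }

    @Override
    public int read() throws IOException {
        if (closed || eof) return -1;
        int b = stream().read();
        if (b < 0) { eof = true; release(); } else { pos++; }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (closed || eof) return -1;
        if (len == 0) return 0;
        int n = stream().read(buf, off, len);
        if (n < 0) { eof = true; release(); } else { pos += n; }
        return n;
    }

    @Override
    public boolean markSupported() { return true; }

    // readLimit is ignored: nothing is buffered, so the mark never expires.
    @Override
    public void mark(int readLimit) { markPos = pos; }

    // Property 3: reopen and skip to the mark, so reset() works after auto-close.
    @Override
    public void reset() throws IOException {
        if (closed) throw new IOException("stream closed");
        if (markPos < 0) throw new IOException("mark() was not called");
        release();
        eof = false;
        InputStream s = stream();
        long remaining = markPos;
        while (remaining > 0) {
            long skipped = s.skip(remaining);
            if (skipped <= 0) { // skip() may return 0 before EOF; try one read()
                if (s.read() < 0) throw new EOFException("content shorter than mark");
                skipped = 1;
            }
            remaining -= skipped;
        }
        pos = markPos;
    }

    @Override
    public void close() throws IOException { closed = true; release(); }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[5000]; // shorter than the 8000-byte sniff
        data[0] = 42;
        ReopeningInputStream s = new ReopeningInputStream(() -> new ByteArrayInputStream(data));
        s.mark(8000);
        byte[] buf = new byte[8000];
        int total = 0;
        while (total < buf.length) {
            int n = s.read(buf, total, buf.length - total);
            if (n < 0) break;
            total += n;
        }
        s.reset(); // succeeds although EOF auto-closed the source
        System.out.println(total + " " + s.read() + " " + s.opens); // 5000 42 2
    }
}
```

Reopening on every reset() keeps the sketch simple. When the mark sits at the start of the stream, as is typical for a content-sniffing pass, no skipping is needed at all; a variant could delegate to the underlying stream's own mark()/reset() while it is still open and reopen only after an auto-close.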
12.03.2010 15:53:28 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 165)
java.io.EOFException
    at org.apache.jackrabbit.core.data.db.DbInputStream.reset(DbInputStream.java:180)
    at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:156)
    at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:156)
    at org.apache.tika.parser.txt.CharsetDetector.setText(CharsetDetector.java:131)
    at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:77)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:114)
    at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
--
This message is automatically generated by JIRA.