Miklos Pocsaji wrote:
2. I started writing a TextFilter which knows how to extract text from PDF (I implemented the TextFilter interface). It is simple: I only have to return a java.io.Reader from which Jackrabbit extracts the text. The obvious but ugly method would be to extract the text to a String and then return a StringReader, but this would require a lot of memory. I decided to use a PipedReader/PipedWriter pair: a separate thread writes the text to a PipedWriter and I return the PipedReader instance from the doFilter() method. It seems that Jackrabbit won't read through the passed stream immediately.
Indexing of nodes is buffered in Jackrabbit. This may mean that nodes are not added to the index until a query is issued, which would explain why the Reader you return is not read immediately.
As far as I can see you have to make sure that the PipedWriter is not closed until the PipedReader is closed.
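A minimal sketch of the piped setup, using only java.io, in case it helps to test the behaviour outside Jackrabbit. The class PipedTextExample and the methods openPipedReader() and readAll() are hypothetical names for illustration and are not part of Jackrabbit; the String array stands in for the text coming out of the PDF extractor. The assumption here is that the producer thread closes its PipedWriter only after it has written everything, so the consumer sees a normal end of stream even if it starts reading late.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;

public class PipedTextExample {

    // Hypothetical helper: starts a producer thread that writes the chunks
    // into a pipe and returns the read end to the caller.
    public static Reader openPipedReader(final String[] chunks) throws IOException {
        final PipedWriter writer = new PipedWriter();
        final PipedReader reader = new PipedReader(writer);

        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (String chunk : chunks) {
                        // Blocks while the pipe buffer is full, i.e. until
                        // the consumer has read some of the earlier text.
                        writer.write(chunk);
                    }
                } catch (IOException e) {
                    // The reader was closed early or the pipe broke: stop writing.
                } finally {
                    try {
                        // Closing after the last write marks end of stream.
                        writer.close();
                    } catch (IOException ignored) {
                        // Nothing more to do.
                    }
                }
            }
        }, "text-extractor");
        producer.setDaemon(true);
        producer.start();
        return reader;
    }

    // Stand-in for the consumer: drains the reader, then closes it.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Reader reader = openPipedReader(new String[] {"Hello ", "piped ", "world"});
        Thread.sleep(200); // simulate a consumer that reads later
        System.out.println(readAll(reader));
    }
}
```

If the producer thread ends without closing the writer, a later read() on the PipedReader fails with an IOException once the buffered text is used up, so the close in the finally block matters.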
regards marcel
