This is the link to the message posted in the archive; I found it very useful. You also need to know how to add it to Jackrabbit as a TextFilterService so that Jackrabbit can find it at startup and then apply it to documents of the right MIME type. Don't forget that your node needs to include the type nt:resource.
Simon

-----Original Message-----
From: Miklos Pocsaji [mailto:[EMAIL PROTECTED]]
Sent: 16 June 2005 11:06
To: Simon Gash
Subject: Re: Two problems

Can you post it again, please?

Thank you.
Miklos Pocsaji.

On 6/16/05, Simon Gash <[EMAIL PROTECTED]> wrote:
> Not sure if you are interested but I'm using a PDF text filter class
> provided by the Apache Slide team. It's available from the archives or
> I can post it again if you want.
>
> Simon
>
> -----Original Message-----
> From: Miklos Pocsaji [mailto:[EMAIL PROTECTED]]
> Sent: 16 June 2005 10:24
> To: [email protected]
> Subject: Two problems
>
> Hi!
>
> I started working with Jackrabbit a month ago and I ran into two problems:
>
> 1. I saw a post here saying that somebody is working on the
> time-consuming startup. Is there an improvement? Even with only a few
> hundred megabytes of stored data, startup (repository creation) is
> really slow.
>
> 2. I started writing a TextFilter which knows how to extract text from
> PDF (I implemented the TextFilter interface). It is simple, I only
> have to return a java.io.Reader from which Jackrabbit extracts text.
> The obvious and ugly method would be to extract the text to a string
> and then return a StringReader, but this would require a lot of memory.
> I decided to use PipedReader/PipedWriter: a separate thread writes the
> text to a PipedWriter and I return the PipedReader instance from the
> doFilter() method. It seems that Jackrabbit won't read through the
> passed stream immediately. I see my writing thread stop, then after
> performing a search, it throws an exception that the other end of the
> pipe is closed...
> I do not know if my approach is correct, so if somebody could, please
> tell me whether this could work somehow. I'm thinking about
> examining the source itself, but if somebody could help me I could
> save a lot of time.
>
> Thank you in advance,
> Miklos Pocsaji.
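
For reference, the PipedReader/PipedWriter approach Miklos describes can be sketched with plain java.io as below. This is only an illustration, not Jackrabbit code or the Slide filter: the class name PipedTextDemo, the methods openReader and readAll, and the list of page strings standing in for a PDF extractor are all made up for the example. A pipe has a small internal buffer, so the writing thread blocks once the buffer is full and only continues while something is reading from the other end. If the consumer reads late, a stalled writer thread like the one described above would be expected, although that may not be the whole story in Jackrabbit.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.util.Arrays;
import java.util.List;

class PipedTextDemo {

    // Returns a Reader immediately; a background thread feeds it.
    // "pages" is a hypothetical stand-in for text coming out of a PDF extractor.
    static Reader openReader(final List<String> pages) throws IOException {
        PipedReader reader = new PipedReader();
        final PipedWriter writer = new PipedWriter(reader);
        Thread producer = new Thread(() -> {
            // try-with-resources closes the writer, which signals end of
            // stream to the reader instead of leaving a broken pipe behind.
            try (PipedWriter w = writer) {
                for (String page : pages) {
                    w.write(page); // blocks while the pipe buffer is full
                }
            } catch (IOException e) {
                // The reading side went away before all text was consumed;
                // nothing more to do in this sketch.
            }
        }, "text-extractor");
        producer.setDaemon(true); // do not keep the JVM alive for a stalled writer
        producer.start();
        return reader;
    }

    // Drains the reader to a String; only used here to show the consumer side.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try (Reader r = in) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader text = openReader(Arrays.asList("page one\n", "page two\n"));
        System.out.print(readAll(text));
    }
}
```

The point of the design is that only one pipe buffer of text is held in memory at a time, which is the saving over building a full String and wrapping it in a StringReader. The cost is that the returned Reader has to be consumed while the producer thread is still running.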
