Hi Uwe, I was suggesting writing a custom tokenizer. In the worst case it
would emit one character per token. That might not be a very pretty
solution, but it should do the job.
What do you think?
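
To make the worst case concrete, here is a minimal sketch of what "a
character per token" would produce. It is plain Java and does not use any
Lucene API; the class and method names (CharPerTokenSketch, tokenize,
encode) are hypothetical, and java.util.Base64 assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows what "one character per
// token" yields for a Base64-encoded payload.
class CharPerTokenSketch {

    // A token here is just one character plus its position in the encoded text.
    static final class Token {
        final char term;
        final int offset;

        Token(char term, int offset) {
            this.term = term;
            this.offset = offset;
        }
    }

    // Encode raw bytes as Base64 text (assumes Java 8+ for java.util.Base64).
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Emit one token per character of the encoded text.
    static List<Token> tokenize(String encoded) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < encoded.length(); i++) {
            tokens.add(new Token(encoded.charAt(i), i));
        }
        return tokens;
    }
}
```

Note that the token count equals the encoded length, which is about 4/3 of
the raw size, and the term vocabulary is limited to the 64 Base64 symbols
plus the padding character.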

Thanks
Shashi


On Fri, Jan 30, 2009 at 12:57 PM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi Shashi,
>
> What is the sense of this? The Base64-encoded documents cannot be tokenized
> and searched. To do this, they must be indexed as plain text. If you want
> to store the original binary values as document data in the index, you
> could additionally store them as byte[] in raw binary form.
> You must differentiate between *indexed* and *stored* fields.
>
> But as Paul said, just *index* the text parts from the binary file using a
> parser and also *store* the offset value to get a pointer to the original
> data.
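
A rough sketch of that idea, under some assumptions: the "parser" here is
only a scan for runs of printable ASCII (similar to the Unix strings
tool), and the whole input is held in a byte array, which a 20-30 gigabyte
file would not allow, so a real version would read in chunks. The names
TextRunExtractorSketch, TextRun and extract are hypothetical. Each run's
text is what would be *indexed*, and its offset is what would be *stored*.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: pulls printable-ASCII runs out of
// binary data and remembers the byte offset where each run starts.
class TextRunExtractorSketch {

    static final class TextRun {
        final long offset;   // byte offset of the run in the original data
        final String text;   // the text that would be indexed

        TextRun(long offset, String text) {
            this.offset = offset;
            this.text = text;
        }
    }

    // Return every run of at least minLength printable ASCII bytes.
    static List<TextRun> extract(byte[] data, int minLength) {
        List<TextRun> runs = new ArrayList<>();
        int start = -1; // start of the current run, or -1 when not inside one
        // Iterate one step past the end so a run touching the end is flushed.
        for (int i = 0; i <= data.length; i++) {
            boolean printable = i < data.length && data[i] >= 0x20 && data[i] <= 0x7e;
            if (printable) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                if (i - start >= minLength) {
                    String text = new String(data, start, i - start, StandardCharsets.US_ASCII);
                    runs.add(new TextRun(start, text));
                }
                start = -1;
            }
        }
        return runs;
    }
}
```

The stored offset is then enough to seek back into the original file when
a hit is found, without keeping the binary data in the index.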
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Shashi Kant [mailto:shashi_k...@yahoo.com]
> > Sent: Friday, January 30, 2009 3:32 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: indexing binary files?
> >
> > Hi Paul, have you tried persisting the binaries in Base64 format and then
> > indexing them?
> > As you are aware, Base64 is a robust representation used in email
> > attachments for example.
> >
> >
> > Thanks
> > Shashi
> >
> >
> >
> > ----- Original Message ----
> > From: Paul Feuer <paul...@gmail.com>
> > To: java-user@lucene.apache.org
> > Sent: Thursday, January 29, 2009 10:43:36 PM
> > Subject: indexing binary files?
> >
> > Hi -
> >
> > I've looked at the FAQ and the Javadocs, and searched a little on
> > Google, but haven't been able to figure out if Lucene can index binary
> > files.
> >
> > Our binary files can get up into the 20-30 gigabyte range.
> >
> > If it is possible, anyone have any pointers to what interfaces I should
> > look at?
> >
> > Thanks,
> >
> > ./paul
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
>
>
>
