Hi Uwe,

I was suggesting writing a custom tokenizer. In the worst case it would emit one character per token. That might not be a very pretty solution, but it should do the job. What do you think?
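To make the worst case concrete, here is a minimal sketch of a one-character-per-token reader over Base64 text. It is plain Java and deliberately does not use Lucene's API: the class name `Base64CharTokens` and its `Token` type are hypothetical, made up for illustration, and a real implementation would presumably extend Lucene's `Tokenizer` instead.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Base64;

/** Hypothetical helper (not a Lucene class): emits one token per Base64 character. */
final class Base64CharTokens {

    /** A single-character token and its character offset in the encoded stream. */
    static final class Token {
        final String text;
        final long offset;

        Token(String text, long offset) {
            this.text = text;
            this.offset = offset;
        }
    }

    private final Reader input;
    private long offset = 0; // counts every character read, including skipped whitespace

    Base64CharTokens(Reader input) {
        this.input = input;
    }

    /** Returns the next token, or null at end of input. Whitespace (e.g. line breaks) is skipped. */
    Token next() {
        try {
            int c;
            while ((c = input.read()) != -1) {
                long at = offset++;
                if (!Character.isWhitespace(c)) {
                    return new Token(String.valueOf((char) c), at);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample data; real input would be streamed, not held in memory.
        byte[] binary = {0x00, (byte) 0xFF, 0x10, 0x20};
        String encoded = Base64.getEncoder().encodeToString(binary);

        Base64CharTokens tokens = new Base64CharTokens(new StringReader(encoded));
        for (Token t = tokens.next(); t != null; t = tokens.next()) {
            System.out.println(t.offset + ": " + t.text);
        }
    }
}
```

One caveat with this approach: Base64 maps every 3 input bytes to 4 characters, so the same byte sequence encodes to different characters depending on where it starts relative to a 3-byte boundary. A search for a byte pattern would have to account for all three alignments.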
Thanks
Shashi

On Fri, Jan 30, 2009 at 12:57 PM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi Shashi,
>
> What is the sense of this? The base64 encoded documents cannot be tokenized
> and searched. To do this, they must be indexed as plain text. If you want to
> store the original binary values as document data in the index, you could
> also store them additionally as byte[] in the raw binary form in the index.
> You must differentiate between *indexed* and *stored* fields.
>
> But as Paul said, just *index* the text parts from the binary file using a
> parser and also *store* the offset value to get a pointer to the original
> data.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Shashi Kant [mailto:shashi_k...@yahoo.com]
> > Sent: Friday, January 30, 2009 3:32 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: indexing binary files?
> >
> > Hi Paul, have you tried persisting the binaries in Base64 format and then
> > indexing them?
> > As you are aware, Base64 is a robust representation used in email
> > attachments for example.
> >
> > Thanks
> > Shashi
> >
> > ----- Original Message ----
> > From: Paul Feuer <paul...@gmail.com>
> > To: java-user@lucene.apache.org
> > Sent: Thursday, January 29, 2009 10:43:36 PM
> > Subject: indexing binary files?
> >
> > Hi -
> >
> > I've looked on the FAQ, the Java Docs, and searched a little in
> > google, but haven't been able to figure out if Lucene can index binary
> > files.
> >
> > Our binary files can get up into the 20-30 gigabyte range.
> >
> > If it is possible, anyone have any pointers to what interfaces I should
> > look at?
> >
> > Thanks,
> >
> > ./paul
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org