Well, it seems that this may be a solution for me too.
But I'm afraid that someone one day will change this string. And then my app
will not work anymore...  

> -----Original Message-----
> From: Adrian Smith [mailto:[EMAIL PROTECTED] 
> Sent: Freitag, 15. Februar 2008 13:02
> To: java-user@lucene.apache.org
> Subject: Re: Design questions
> 
> Hi,
> 
> I have a similar sitaution. I also considered using $. But 
> for the sake of
> not running into (potential) problems with Tokenisers, I just 
> defined a
> string in a config file which for sure is never going to 
> occur in a document
> and will never be searched for, e.g.
> 
> dfgjkjrkruigduhfkdgjrugr
> 
> Cheers, Adrian
> --
> Java Software Developer
> http://www.databasesandlife.com/
> 
> 
> 
> On 15/02/2008, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> >
> >
> > I haven't really been following this thread that closely, but...
> >
> > : Why not just use $$$$$$$$? Check to insure that it makes
> >
> > : it through whatever analyzer you choose though. For instance,
> > : LetterTokenizer will remove it...
> >
> >
> > 1) i'm 99% sure you can do something like this...
> >
> >   Document doc = new Document()
> >   for (int i = 0; i < pages.length; i++) {
> >     doc.add(new Field("text", pages[i], Field.Store.NO,
> > Field.Index.TOKENIZED));
> >     doc.add(new Field("text", "$$", Field.Store.NO,
> > Field.Index.UN_TOKENIZED));
> >   }
> >
> > ...and you'll get your magic token regardless of whether it 
> would normally
> > make it through your analyzer. In fact: you want it to be 
> something your
> > analyzer could never produce, even if it appears in the 
> orriginal text, so
> > you don't get false boundaries (ie: if you use an Analzeer 
> that lowercases
> > everything, then "A" makes a perfectly fine boundary token.
> >
> > 2) if your goal is just to be able to make sure you can 
> query for phrases
> > without crossing page boundaries, it's a lot simpler just to use are
> > really big positionIncimentGap with your analyzer (and add 
> each page as a
> > seperate Field instance).  boundary tokens like these are 
> relaly only
> > neccessary if you want more complex queries (like "find X and Y on
> > the same page but not in the same sentence")
> >
> >
> >
> >
> > -Hoss
> >
> >
> >
> > 
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to