Hits document offset information? Span query or Surround?

2005-09-06 Thread Sean O'Connor
I believe I have heard that Span queries provide some way to access document offset information for their hits somehow. Does anyone know if this is true, and if so, how I would go about it? Alternatively (preferably actually) does the surround code from the SVN development area have a way of
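
For orientation, a minimal sketch of what the Spans API of that era exposes, assuming an already built index at a made-up path with a "contents" field; note that Spans reports token positions, not character offsets, which is the gap the rest of the thread discusses.

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanTermQuery;
    import org.apache.lucene.search.spans.Spans;

    public class SpanPositions {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("/path/to/index"); // hypothetical path
            SpanTermQuery query = new SpanTermQuery(new Term("contents", "lucene"));

            // getSpans() walks every match; start()/end() are token positions,
            // not character offsets into the original text.
            Spans spans = query.getSpans(reader);
            while (spans.next()) {
                System.out.println("doc=" + spans.doc()
                        + " start=" + spans.start()
                        + " end=" + spans.end());
            }
            reader.close();
        }
    }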

Re: Hits document offset information? Span query or Surround?

2005-09-06 Thread markharw00d
I believe I have heard that Span queries provide some way to access document offset information for their hits somehow. See http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2 Faithfully selecting extracts based *exactly* on query criteria will be hard given complex queries eg

Re: Hits document offset information? Span query or Surround?

2005-09-06 Thread Paul Elschot
On Tuesday 06 September 2005 08:21, Sean O'Connor wrote: I believe I have heard that Span queries provide some way to access document offset information for their hits somehow. Does anyone know if this is true, and if so, how I would go about it? Alternatively (preferably actually) does

Re: Hits document offset information? Span query or Surround?

2005-09-06 Thread Paul Elschot
On Tuesday 06 September 2005 08:52, markharw00d wrote: I believe I have heard that Span queries provide some way to access document offset information for their hits somehow. See http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2 Faithfully selecting extracts based *exactly*

Re: Multiple Language Indexing and Searching

2005-09-06 Thread Olivier Jaquemet
As far as your usage is concerned, it seems to be the right approach, and I think the StandardAnalyzer does the job pretty well when it has to deal with whatever language you want. Note, though, that it won't handle stop words for all languages, only the English ones, unless specified at index
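
A sketch of supplying your own stop word list to StandardAnalyzer at index time; the analyzer otherwise defaults to the English list. The index path, field name and stop words below are made-up examples.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class CustomStopWords {
        public static void main(String[] args) throws Exception {
            // French stop words as an example; StandardAnalyzer only knows the
            // English list unless you pass your own.
            String[] frenchStopWords = { "le", "la", "les", "de", "des", "et" };
            StandardAnalyzer analyzer = new StandardAnalyzer(frenchStopWords);

            IndexWriter writer = new IndexWriter("/path/to/index", analyzer, true); // hypothetical path
            Document doc = new Document();
            doc.add(Field.Text("contents", "le moteur de recherche et les documents"));
            writer.addDocument(doc);
            writer.close();
        }
    }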

RE: Highlighter apply to Japanese

2005-09-06 Thread Koji Sekiguchi
Hi Chris, Thank you for your info. With CJKAnalyzer, the diagnosis is as follows:
            posInc  startOfst  endOfst
      [Aa]  1       0          2
      [aa]  1       1          3
      [aB]  1       2          4
      [BC]  1       3          5
      [Cc]  1       4          6
      [cD]  1       5          7

RE: Highlighter apply to Japanese

2005-09-06 Thread mark harwood
Try changing TokenGroup.isDistinct(). Maybe the offset test code should be >= rather than >, i.e. boolean isDistinct(Token token) { return token.startOffset() >= endOffset; } I've just tried the change with the JUnit test and all seems well still with the non-CJK
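
For clarity, the suggested one-line edit shown in context; this assumes the surrounding TokenGroup field names implied by the snippet and is only a sketch of the change, not the full class.

    // Inside the Highlighter's TokenGroup class (sketch of the suggested edit):
    boolean isDistinct(Token token) {
        // Was: token.startOffset() > endOffset
        // With >=, a token that starts exactly where the current group ends
        // is treated as the start of a new group rather than merged into it.
        return token.startOffset() >= endOffset;
    }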

Re: Multiple Language Indexing and Searching

2005-09-06 Thread Gusenbauer Stefan
James Adams wrote: Does anyone know what approach Nutch uses? -Original Message- From: Hacking Bear [mailto:[EMAIL PROTECTED] Sent: 06 September 2005 12:15 To: java-user@lucene.apache.org Subject: Re: Multiple Language Indexing and Searching On 9/6/05, Olivier Jaquemet [EMAIL

Re: Multiple Language Indexing and Searching

2005-09-06 Thread Olivier Jaquemet
Gusenbauer Stefan wrote: I think Nutch uses ngramj for language classification, but I don't know how they store the language information. In our application, for example, I save the language in an extra field in the document, because Lucene supports multiple fields with the same
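
A sketch of the extra-field approach Stefan describes, using the Lucene 1.4-era field factories and the boolean-flag form of BooleanQuery.add; the index path, field names and language codes are illustrative.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class LanguageField {
        public static void main(String[] args) throws Exception {
            // Index time: store the detected language as an untokenized keyword field.
            IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
            Document doc = new Document();
            doc.add(Field.Text("contents", "some body text in English"));
            doc.add(Field.Keyword("lang", "en"));
            writer.addDocument(doc);
            writer.close();

            // Search time: restrict matches to one language with a required clause.
            BooleanQuery query = new BooleanQuery();
            query.add(new TermQuery(new Term("contents", "english")), true, false);
            query.add(new TermQuery(new Term("lang", "en")), true, false);

            IndexSearcher searcher = new IndexSearcher("/path/to/index");
            Hits hits = searcher.search(query);
            System.out.println(hits.length() + " hits");
            searcher.close();
        }
    }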

Switching from FSDirectory to RAMDirectory

2005-09-06 Thread Peter Gelderbloem
Hi, I find that unit tests that modify an existing record in the Lucene index by removing it, modifying it and re-adding it fail if I switch from an FSDirectory to a RAMDirectory. This code gives me a Directory that works: FSDirectory fsDirectory =

Re: how to Find more than one spell check alternative?

2005-09-06 Thread Erik Hatcher
See the contrib/spellchecker area of Lucene's Subversion repository. Erik On Sep 6, 2005, at 10:09 AM, Legolas Woodland wrote: Hi Thank you for reading my post. How can I have more than one spell check suggestion? For example, if someone entered puore it returns: pore pour pure poor poer
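
A sketch of asking for several suggestions at once, assuming the contrib spellchecker classes of that era (SpellChecker, LuceneDictionary) and their suggestSimilar(word, count) call; the paths and field name are illustrative.

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.spell.LuceneDictionary;
    import org.apache.lucene.search.spell.SpellChecker;
    import org.apache.lucene.store.FSDirectory;

    public class MultiSuggest {
        public static void main(String[] args) throws Exception {
            // Build (or open) the spell index from the terms of an existing field.
            FSDirectory spellDir = FSDirectory.getDirectory("/path/to/spellindex", true);
            SpellChecker spell = new SpellChecker(spellDir);

            IndexReader reader = IndexReader.open("/path/to/index");
            spell.indexDictionary(new LuceneDictionary(reader, "contents"));

            // Ask for up to five suggestions instead of one.
            String[] suggestions = spell.suggestSimilar("puore", 5);
            for (int i = 0; i < suggestions.length; i++) {
                System.out.println(suggestions[i]);
            }
            reader.close();
        }
    }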

Optimizing insertion of duplicate documents

2005-09-06 Thread Robichaud, Jean-Philippe
Hi Everyone, I have a special scenario where I frequently want to insert duplicate documents in the index. For example, I know that I want 400 copies of the same document. (I use the docboost for something else, so I can't just add one document and set the docboost to 400). I would like to
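
For context, the straightforward (unoptimized) version of what Jean-Philippe describes is simply re-adding the same Document in a loop; a sketch under made-up names, with the caveat that every call re-analyzes the fields, which is presumably the cost he wants to avoid.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class DuplicateDocs {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);

            // String-valued fields can be re-added repeatedly; a Reader-backed
            // field could only be consumed once, so it would not work here.
            Document doc = new Document();
            doc.add(Field.Text("contents", "the document to duplicate"));

            // Each call tokenizes and inverts the fields again, so 400 copies
            // cost roughly 400 times the analysis work.
            for (int i = 0; i < 400; i++) {
                writer.addDocument(doc);
            }
            writer.optimize();
            writer.close();
        }
    }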

Re: Switching from FSDirectory to RAMDirectory

2005-09-06 Thread Chris Hostetter
: Hi, : I find that unit tests that modify an existing record in the Lucene : index by removing it , modifying it and re-adding it, fails if I switch : from an FSDirectory to a RAMDirectory. Could you please post a full and complete unit test that demonstrates the problem. Based on your
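
Along the lines Chris asks for, a minimal self-contained test sketch of the delete/modify/re-add cycle against a RAMDirectory; the field names and values are made up, and it is only meant to show the shape of such a test, not to reproduce Peter's failure.

    import junit.framework.TestCase;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.RAMDirectory;

    public class RamDirectoryUpdateTest extends TestCase {

        public void testDeleteAndReAdd() throws Exception {
            RAMDirectory dir = new RAMDirectory();

            // Create the index with one record.
            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
            writer.addDocument(makeDoc("1", "original"));
            writer.close();

            // Remove the record by its key.
            IndexReader reader = IndexReader.open(dir);
            assertEquals(1, reader.delete(new Term("id", "1")));
            reader.close();

            // Re-add the modified record (create=false to append).
            writer = new IndexWriter(dir, new StandardAnalyzer(), false);
            writer.addDocument(makeDoc("1", "modified"));
            writer.close();

            IndexSearcher searcher = new IndexSearcher(dir);
            Hits hits = searcher.search(new TermQuery(new Term("id", "1")));
            assertEquals(1, hits.length());
            assertEquals("modified", hits.doc(0).get("contents"));
            searcher.close();
        }

        private Document makeDoc(String id, String contents) {
            Document doc = new Document();
            doc.add(Field.Keyword("id", id));
            doc.add(Field.Text("contents", contents));
            return doc;
        }
    }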

Re: limit return results

2005-09-06 Thread Otis Gospodnetic
Hello (redirecting to java-user@), If you want to have more control over scoring and dealing with hits, use HitCollector. Then you can break out when you accumulate enough results. Note that scores in HitCollector are not normalized, unlike the ones coming from IndexSearcher's search(...)
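
A sketch of the kind of collector Otis describes, capping how many hits are kept; the index path and field name are illustrative, and the early-abort-by-exception trick mentioned in the comment is a common workaround rather than anything built into the API of that era.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.HitCollector;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class LimitedCollector extends HitCollector {
        private final int maxHits;
        private final List docs = new ArrayList();

        public LimitedCollector(int maxHits) {
            this.maxHits = maxHits;
        }

        public void collect(int doc, float score) {
            // Raw scores arrive here unnormalized, unlike the ones exposed by Hits.
            if (docs.size() < maxHits) {
                docs.add(new Integer(doc));
            }
            // To genuinely stop the search once maxHits is reached, a common
            // workaround is to throw an unchecked exception here and catch it
            // around the search(...) call.
        }

        public List getDocs() {
            return docs;
        }

        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("/path/to/index"); // hypothetical path
            LimitedCollector collector = new LimitedCollector(100);
            searcher.search(new TermQuery(new Term("contents", "lucene")), collector);
            System.out.println("kept " + collector.getDocs().size() + " hits");
            searcher.close();
        }
    }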

Re: Hits document offset information? Span query or Surround? - thanks

2005-09-06 Thread Sean O'Connor
Thanks for the input. I am looking at the suggested links now. If I make any progress I will return to see if any of my work would be appropriate to contribute back. Sean Paul Elschot wrote: On Tuesday 06 September 2005 08:52, markharw00d wrote: I believe I have heard that Span queries