I believe I have heard that Span queries provide some way to access
document offset information for their hits somehow. Does anyone know if
this is true, and if so, how I would go about it?
Alternatively (preferably actually) does the surround code from the SVN
development area have a way of
I believe I have heard that Span queries provide some way to access
document offset information for their hits somehow.
See http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
Faithfully selecting extracts based *exactly* on query criteria will be
hard given complex queries eg
On Tuesday 06 September 2005 08:21, Sean O'Connor wrote:
I believe I have heard that Span queries provide some way to access
document offset information for their hits somehow. Does anyone know if
this is true, and if so, how I would go about it?
Alternatively (preferably actually) does
On Tuesday 06 September 2005 08:52, markharw00d wrote:
I believe I have heard that Span queries provide some way to access
document offset information for their hits somehow.
See http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
Faithfully selecting extracts based *exactly*
As far as your usage is concerned, it seems to be the right approach,
and I think the StandardAnalyzer does the job well for
whatever language you want.
Note, though, that it only handles English stop words, not those of other
languages, unless specified at index
Hi Chris,
Thank you for your info.
With CJKAnalyzer, the diagnostics are as follows:
Token  PosInc  StartOfst  EndOfst
[Aa]   1       0          2
[aa]   1       1          3
[aB]   1       2          4
[BC]   1       3          5
[Cc]   1       4          6
[cD]   1       5          7
Try changing TokenGroup.isDistinct().
Maybe the offset test code should be >= rather than >
i.e.
boolean isDistinct(Token token)
{
    return token.startOffset() >= endOffset;
}
I've just tried the change with the Junit test and all
seems well still with the non CJK
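
To make the effect of that comparison concrete, here is a minimal, self-contained sketch. It does not use Lucene: the class TokenGroupSketch, the Tok type and the group() method are hypothetical stand-ins written for illustration, and only the start-offset-versus-group-end test mirrors the isDistinct() idea discussed above. It suggests that the two operators differ only for tokens that touch (one starts exactly where the previous one ends), while overlapping bigrams like those in the CJKAnalyzer listing stay in one group either way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, NOT Lucene's TokenGroup: it only imitates the
// "is this token distinct from the current group?" offset test.
class TokenGroupSketch {

    /** Minimal token: text plus character offsets (end is exclusive). */
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Groups tokens in order. A token opens a new group when it is distinct
     * from the current group; otherwise it is merged into it.
     * inclusive == true uses ">=" for the test, false uses ">".
     */
    static List<List<String>> group(List<Tok> tokens, boolean inclusive) {
        List<List<String>> groups = new ArrayList<List<String>>();
        List<String> current = null;
        int groupEnd = 0;
        for (Tok t : tokens) {
            boolean distinct = inclusive ? t.start >= groupEnd : t.start > groupEnd;
            if (current == null || distinct) {
                current = new ArrayList<String>();
                groups.add(current);
                groupEnd = t.end;
            } else {
                groupEnd = Math.max(groupEnd, t.end);
            }
            current.add(t.text);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Two tokens that touch but do not overlap.
        List<Tok> adjacent = Arrays.asList(new Tok("AB", 0, 2), new Tok("CD", 2, 4));
        System.out.println(group(adjacent, false).size()); // 1: ">" merges them
        System.out.println(group(adjacent, true).size());  // 2: ">=" separates them

        // Overlapping bigrams with the offsets from the listing above.
        List<Tok> bigrams = Arrays.asList(
                new Tok("Aa", 0, 2), new Tok("aa", 1, 3), new Tok("aB", 2, 4),
                new Tok("BC", 3, 5), new Tok("Cc", 4, 6), new Tok("cD", 5, 7));
        System.out.println(group(bigrams, true).size());   // 1: every token overlaps the group
    }
}
```

In this sketch the group end is extended as tokens are merged, which is an assumption about how grouping proceeds and not something confirmed by the messages above.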
James Adams wrote:
Does anyone know what approach Nutch uses?
-Original Message-
From: Hacking Bear [mailto:[EMAIL PROTECTED]
Sent: 06 September 2005 12:15
To: java-user@lucene.apache.org
Subject: Re: Multiple Language Indexing and Searching
On 9/6/05, Olivier Jaquemet [EMAIL
Gusenbauer Stefan wrote:
I think Nutch uses ngramj for language classification, but I don't know
how they store the language information. In our application,
for example, I save the language in an extra field in the document
because Lucene supports multiple fields with the same
Hi,
I find that unit tests that modify an existing record in the Lucene
index by removing it, modifying it and re-adding it fail if I switch
from an FSDirectory to a RAMDirectory.
This code gives me a Directory that works:
FSDirectory fsDirectory =
See the contrib/spellchecker area of Lucene's Subversion repository.
Erik
On Sep 6, 2005, at 10:09 AM, Legolas Woodland wrote:
Hi
Thank you for reading my post.
How can I get more than one spell check suggestion?
For example, if someone entered puore
it would return:
pore
pour
pure
poor
poer
Hi Everyone,
I have a special scenario where I frequently want to insert duplicate
documents in the index. For example, I know that I want 400 copies of the
same document. (I use the docboost for something else, so I can't just add one
document and set the docboost to 400.)
I would like to
: Hi,
: I find that unit tests that modify an existing record in the Lucene
: index by removing it, modifying it and re-adding it fail if I switch
: from an FSDirectory to a RAMDirectory.
Could you please post a full and complete unit test that demonstrates the
problem? Based on your
Hello (redirecting to java-user@),
If you want to have more control over scoring and dealing with hits,
use HitCollector. Then you can break out when you accumulate enough
results. Note that scores in HitCollector are not normalized, unlike
the ones coming from IndexSearcher's search(...)
Thanks for the input. I am looking at the suggested links now. If I make
any progress I will return to see if any of my work would be appropriate
to contribute back.
Sean
Paul Elschot wrote:
On Tuesday 06 September 2005 08:52, markharw00d wrote:
I believe I have heard that Span queries
15 matches