Mark Miller wrote:
The contrib Highlighter doesn't know and highlights them all.
Check out my patch here for position sensitive highlighting:
https://issues.apache.org/jira/browse/LUCENE-794
It seems that the patch does not work with Lucene 2.2 as I get some
compile errors. Is this really
Here are more details about my issue.
I have two tables in database. A row in table 1 can have multiple rows
associated with it in table 2. It is a one to many mapping.
Let's say a row in table 1 is A and it has multiple rows B1, B2 and B3
associated with it in table 2. I need to search on both
Oh yeah...something that you may not have seen is that this has a
dependency on MemoryIndex from contrib. You need that jar as well.
- Mark
Marjan Celikik wrote:
Mark Miller wrote:
The contrib Highlighter doesn't know and highlights them all.
Check out my patch here for position sensitive
It should work no problem with 2.2. What are the compile errors you are
getting?
If you send me a note directly I will send you a jar.
- Mark
Marjan Celikik wrote:
Mark Miller wrote:
The contrib Highlighter doesn't know and highlights them all.
Check out my patch here for position
Thank you all for your answers; I am going to change a few things in my
application and run tests.
One thing: I haven't found another good PDF-to-text converter like PDFBox.
Do you know of a faster one?
Greetings
Thanks for your answers
Ariel
On Jan 9, 2008 11:08 PM, Otis Gospodnetic [EMAIL
The Highlighter works by comparing the TokenStream of the document with
the Tokens in the query. The TokenStream can be rebuilt from the index
if you use TermVectors with TokenSources or you can get it by
reanalyzing the document. Each Token from the TokenStream is checked
against Tokens in
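A minimal sketch of the two ways to obtain that TokenStream with the Lucene 2.x-era API (the field name "body" and the surrounding reader, analyzer, and highlighter objects are illustrative assumptions):

```java
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.index.TermPositionVector;
import org.apache.lucene.search.highlight.TokenSources;

String text = reader.document(docId).get("body");
TokenStream stream;
TermFreqVector tfv = reader.getTermFreqVector(docId, "body");
if (tfv instanceof TermPositionVector) {
    // Rebuild the stream from stored term vectors -- no re-analysis needed.
    stream = TokenSources.getTokenStream((TermPositionVector) tfv);
} else {
    // Fall back to re-analyzing the stored field text.
    stream = analyzer.tokenStream("body", new StringReader(text));
}
String fragment = highlighter.getBestFragment(stream, text);
```

The term-vector path only works if the field was indexed with positions and offsets stored; otherwise re-analysis is the fallback.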
Mark Miller wrote:
Oh yeah...something that you may not have seen is that this has a
dependency on MemoryIndex from contrib. You need that jar as well.
- Mark
Hm, I need the source code. How do I download the files from
https://issues.apache.org/jira/browse/LUCENE-794 (all I see are some
Sachin,
As the merging of the results is the issue, I'll assume that you don't
have clear user requirements for that.
The simplest way out of that is to allow the users to search
the B's first, and once they have determined which B's they'd like
to use, use those B's to limit the results in of
Mark Miller wrote:
The Highlighter works by comparing the TokenStream of the document
with the Tokens in the query. The TokenStream can be rebuilt from the
index if you use TermVectors with TokenSources or you can get it by
reanalyzing the document. Each Token from the TokenStream is checked
In a distributed environment the application has to make heavy use of the
network, and there is no other way to access the documents in a remote
repository than through the NFS file system.
One thing I must clarify: I index the documents in memory using a
RAMDirectory, then
Thanks for the post. So you're using the doc id as the key into the
cache to retrieve the external id. Then what mechanism fetches the
external id's from the searcher and places them in the cache?
-Original Message-
From: Antony Bowesman [mailto:[EMAIL PROTECTED]
Sent: Wednesday,
Marjan Celikik wrote:
Mark Miller wrote:
The Highlighter works by comparing the TokenStream of the document
with the Tokens in the query. The TokenStream can be rebuilt from the
index if you use TermVectors with TokenSources or you can get it by
reanalyzing the document. Each Token from the
This seems really clunky. Especially if your merge step also optimizes.
There's not much point in indexing into RAM then merging explicitly.
Just use an FSDirectory rather than a RAMDirectory. There is *already*
buffering built in to FSDirectory, and your merge factor etc. control
how much RAM is
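That advice sketched in code (Lucene 2.2/2.3-era API; the index path and tuning values are illustrative assumptions, not recommendations):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

Directory dir = FSDirectory.getDirectory("/path/to/index"); // hypothetical path
IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
writer.setMergeFactor(10);       // how many segments accumulate before a merge
writer.setMaxBufferedDocs(1000); // docs buffered in RAM before a flush
// (from 2.3 on: writer.setRAMBufferSizeMB(32) flushes by RAM usage instead)
// ... writer.addDocument(doc) for each document ...
writer.close();
```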
Mark Miller wrote:
That is why the original contrib does not work with PhraseQuery's. It
simply matches Tokens from the query with those in the TokenStream.
LUCENE-794 takes the TokenStream and shoves it into a MemoryIndex.
Then, after converting the query to a SpanQuery approximation,
I don't think you would see much of gain. Shoving the TokenStream into
the MemoryIndex is actually pretty fast and I wouldn't be surprised if
it was much faster than reading from disk. Most of the computational
time is spent in reconstructing the TokenStream, whether you use
term-vectors or
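As a rough sketch of what the patch does internally (MemoryIndex from contrib, Lucene 2.x API; the field name and the analyzer/text/spanQuery objects are illustrative assumptions):

```java
import java.io.StringReader;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.spans.Spans;

MemoryIndex mem = new MemoryIndex();
// Shove the document's TokenStream into a one-document in-memory index.
mem.addField("body", analyzer.tokenStream("body", new StringReader(text)));
IndexSearcher searcher = mem.createSearcher();
// spanQuery is a SpanQuery approximation of the user's original query.
Spans spans = spanQuery.getSpans(searcher.getIndexReader());
while (spans.next()) {
    // spans.start() / spans.end() give the token positions to highlight
}
```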
If possible you should also test the soon-to-be-released version 2.3,
which has a number of speedups to indexing.
Also try the steps here:
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
You should also try an A/B test: A) writing your index to the NFS
directory and then B) to
Ok, I've been thinking about this some more. Is the cache mechanism
pulling from the cache if the external id already exists there and then
hitting the searcher if it's not already in the cache (maybe using a
FieldSelector for just retrieving the external id)?
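A sketch of that pattern (Lucene 2.x FieldSelector API; the field name "externalId" and the plain HashMap cache are illustrative assumptions):

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.MapFieldSelector;
import org.apache.lucene.index.IndexReader;

class ExternalIdCache {
    // Load only the external-id field; skip everything else on disk.
    private static final FieldSelector ID_ONLY =
            new MapFieldSelector(new String[] { "externalId" });
    private final Map<Integer, String> cache = new HashMap<Integer, String>();

    String get(IndexReader reader, int docId) throws IOException {
        String extId = cache.get(docId);
        if (extId == null) {
            extId = reader.document(docId, ID_ONLY).get("externalId");
            cache.put(docId, extId);
        }
        return extId;
    }
}
```

Note that cached Lucene doc ids become stale after deletes and merges, so a cache like this is only safe for the lifetime of one IndexReader.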
-Original Message-
From:
I'm sure this has been asked a few times before, but I searched and searched
and found no answer (apart from using Luke), but I would like to know if
there's a way of retrieving the number of terms in an index.
I tried cycling through a TermEnum, but it doesn't do anything :|
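For reference, one way that loop is usually written against the 2.x API (the index path is a hypothetical placeholder; this counts *distinct* terms, i.e. unique field/text pairs, not total occurrences):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;

IndexReader reader = IndexReader.open("/path/to/index"); // hypothetical path
TermEnum terms = reader.terms();
int distinct = 0;
while (terms.next()) {   // terms() is positioned before the first term
    distinct++;
}
terms.close();
reader.close();
System.out.println("distinct terms: " + distinct);
```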
I am indexing into RAM and then merging explicitly because my application
demands it: I have designed it as a distributed environment, so many threads
or workers on different machines index into RAM and serialize to disk, and
another thread on another machine accesses the segment indexes to merge them
Hi Chris,
by number of terms, do you mean the number of different terms that
compose the index, or the number of total terms, including repetitions?
chris.b wrote:
I'm sure this has been asked a few times before, but I searched and searched
and found no answer (apart from using Luke),
Ariel,
Comments inline.
- Original Message
From: Ariel [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Thursday, January 10, 2008 10:05:28 AM
Subject: Re: Why is lucene so slow indexing in nfs file system ?
In a distributed environment the application should make an exhaustive
Beard, Brian wrote:
Ok, I've been thinking about this some more. Is the cache mechanism
pulling from the cache if the external id already exists there and then
hitting the searcher if it's not already in the cache (maybe using a
FieldSelector for just retrieving the external id)?
I am warming
Thanks for your suggestions.
I'm sorry, I didn't know; could you tell me what you mean by SAN and FC?
Another thing: I have visited the Lucene home page and the 2.3 version has
not been released there; could you tell me where the download link is?
Thanks in advance.
Ariel
On Jan 10, 2008
SAN is Storage Area Network. FC is Fibre Channel.
I can confirm by one customer experience that using SAN does scale
pretty well, and pretty simple. Well, it costs some money.
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site:
2.3 is in the process of being released. Give it another week to 10 days and
it will be out.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Ariel [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Thursday, January 10, 2008 6:26:44 PM