Hi All,
I have to develop a prototype of a search/indexation system with the
following characteristics,
1) High volume of data indexation but only with add and delete
functionality (approximatively 10 PDF) = scalable architecture HDFS
seems good.
2) Specific analysis chain and a given set of
Hi All,
I got a strange problem during the indexer process running on a Red Hat
ES4 Linux machine:
java.io.FileNotFoundException: /u01/export/index/books/_2s.fnm (No such
file or directory)
at java.io.RandomAccessFile.open(Native Method)
at
Hi all,
I have a question about memory/fileio settings and the FSDirectory.
The setMaxBufferedDocs and related parameters help a lot already to fully
exploit my RAM when indexing, but since I'm running a fairly small index of
around 4 docs and I'm optimizing it relatively often, I was
Hi,
I am using lucene 1.4.3. Some of my fields are indexed as Keywords. I
have also subclassed Analyzer in order to add stemming etc. I am not sure
if the input is tokenized when I am searching on keyword fields; I don't
want it to be. Do I need to have a special case in the overridden method
Venu,
I presume you're asking about what Analyzer to use with QueryParser.
QueryParser analyzes all term text, but you can fake it for Keyword
(non-tokenized) fields by using PerFieldAnalyzerWrapper, specifying
the KeywordAnalyzer for the fields you indexed as such.
The KeywordAnalyzer
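Erik's suggestion can be sketched in code. Lucene isn't on the classpath here, so this stdlib-only sketch (all names hypothetical) just mimics the dispatch pattern PerFieldAnalyzerWrapper uses: route each field to its own analyzer, falling back to a default, while keyword fields pass the text through as a single untouched token.

```java
import java.util.*;
import java.util.function.Function;

public class PerFieldSketch {
    // Stand-in "analyzers": a String -> token-list function.
    static final Function<String, List<String>> WHITESPACE =
        text -> Arrays.asList(text.toLowerCase().split("\\s+"));
    // KeywordAnalyzer equivalent: the whole input is one token, unmodified.
    static final Function<String, List<String>> KEYWORD =
        Collections::singletonList;

    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    PerFieldSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    // Mirrors PerFieldAnalyzerWrapper.addAnalyzer(field, analyzer).
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Dispatch: the field's own analyzer if registered, else the default.
    List<String> tokenize(String field, String text) {
        return perField.getOrDefault(field, fallback).apply(text);
    }

    public static void main(String[] args) {
        PerFieldSketch wrapper = new PerFieldSketch(WHITESPACE);
        wrapper.addAnalyzer("id", KEYWORD); // keyword field: never tokenized
        System.out.println(wrapper.tokenize("id", "ABC 123"));   // one token, case kept
        System.out.println(wrapper.tokenize("body", "ABC 123")); // two lowercased tokens
    }
}
```

With the real classes, the same shape applies: wrap your stemming analyzer as the default and register a KeywordAnalyzer for each field indexed as a Keyword, then hand the wrapper to QueryParser.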
You understood me right, Erik. Your solution is working well, thanks.
Venu
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 05, 2006 6:03 PM
To: java-user@lucene.apache.org
Subject: Re: Which Analyzer to use when searching on Keyword fields
Venu,
Hi,
I have a large collection of text documents that I want to search
using lucene. Is there any command line utility that will allow me to
search this static collection of documents?
Writing one is an option but I want to know if anyone has already done this.
Thanks in advance,
Delip
Red Piranha: http://red-piranha.sourceforge.net/
-Original Message-
From: Delip Rao [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 05, 2006 6:53 PM
To: java-user@lucene.apache.org
Subject: searching offline
Hi,
I have a large collection of text documents that I want to search
using
http://regain.sourceforge.net/ ?
- Original Message -
From: Delip Rao [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Wednesday, April 05, 2006 2:23 PM
Subject: searching offline
Hi,
I have a large collection of text documents that I want to search
using lucene. Is there any
On 4/5/06, Artem Vasiliev [EMAIL PROTECTED] wrote:
The int[] array here contains references to String[] and to populate
it still all the field values need to be loaded and compared/sorted
Terms are stored and iterated in sorted order, so no sorting needs to be done.
It's still the case that all
Hi.
Is it correct that in Release 1.9.1 a WRITE_LOCK_TIMEOUT is hardcoded
and there is no way to set it from outside?
I've seen a check-in in the CVS from a few days ago which added
getters/setters for this, but ... there is no release containing
this, right?
So, my question is: Is it
On 4/5/06, Bruno Grilheres [EMAIL PROTECTED] wrote:
1) High volume of data indexation but only with add and delete
functionality (approximatively 10 PDF) = scalable architecture HDFS
seems good.
2) Specific analysis chain and a given set of meta-data indexation.
3) Language Recognition
4) No
On 05.04.2006, at 17:15 Uhr, Bill Janssen wrote:
Or, as I suggested a couple of days ago, a 1.9.2 release could be
offered.
Would be a good idea, because the current nightly builds have a lot
of deprecated methods removed which where available in 1.9.1.
Lot of work just for this ... :-(
I'm using Lucene 1.9.1, and I'm seeing some odd behavior that I hope
someone can help me with.
My application counts on Lucene maintaining the order of the documents
exactly the same as how I insert them. Lucene is supposed to maintain
document order, even across index merges, correct?
My
Hi,
I need to change the lucene sorting to give just a bit more relevance to
the recent documents (but i don't want to sort by date). I'd like to mix
the lucene score with the date of the document.
I'm following the example in Lucene in Action, chapter 6. I'm trying
to extend the
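One way to mix the score with the date (a hedged sketch; the decay shape and the weights here are my assumptions, not from Lucene in Action) is to multiply the raw Lucene score by a bounded recency factor, so fresh documents get a mild edge without a hard sort by date:

```java
public class RecencyBoost {
    // Hypothetical blend: scale the raw score by a factor between 0.8 and 1.0.
    // A document keeps at least 80% of its score however old it is, so
    // relevance still dominates; recency only breaks near-ties.
    static double boosted(double rawScore, long docAgeDays, double halfLifeDays) {
        // Exponential decay: 1.0 for a brand-new doc, 0.5 at the half-life.
        double decay = Math.pow(0.5, docAgeDays / halfLifeDays);
        return rawScore * (0.8 + 0.2 * decay);
    }

    public static void main(String[] args) {
        System.out.println(boosted(1.0, 0, 30));  // fresh doc: full score
        System.out.println(boosted(1.0, 30, 30)); // one half-life old: score * 0.9
    }
}
```

The 80/20 split and the 30-day half-life are tuning knobs; the point is that the recency factor is bounded, so it perturbs the ranking rather than replacing it.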
Thanks for your answer, I was not aware of the SOLR project,
There was a big typo here, I meant less than 10 Go of PDF files per day
during one month = i.e. less than 300 Go of PDF files.
I made some tests with PDF files: 100Mo of native PDF are converted to
3Mo of index in lucene [The text
On 4/5/06, Bruno Grilheres [EMAIL PROTECTED] wrote:
Thanks for your answer, I was not aware of the SOLR project,
There was a big typo here, I meant less than 10 Go of PDF files per day
during one month = i.e. less than 300 Go of PDF files.
Sorry, I'm not sure what the Go abbreviation is... I
: exactly the same as how I insert them. Lucene is supposed to maintain
: document order, even across index merges, correct?
Lucene definitely maintains index order for document additions -- but i
don't know if any similar claim has been made about merging whole indexes.
: this until I'm done
I don't know if there is anyway for a Custom Sort to access the lucene
score -- but another approach that works very well is to use the
FunctionQuery classes from Solr...
http://incubator.apache.org/solr/docs/api/org/apache/solr/search/function/package-summary.html
...you can make a
Daniel, you are very clever! Your solution reminds me of this:
No temptation has overtaken you but such as is common to man; and God is
faithful, who will not allow you to be tempted beyond what you are able, but
with the temptation will provide the way of escape also, so that you will be
able to
On Mittwoch 05 April 2006 13:02, Max Pfingsthorn wrote:
The setMaxBufferedDocs and related parameters help a lot already to
fully exploit my RAM when indexing, but since I'm running a fairly small
index of around 4 docs and I'm optimizing it relatively often, I was
wondering if there is
Chris Hostetter wrote:
: exactly the same as how I insert them. Lucene is supposed to maintain
: document order, even across index merges, correct?
Lucene definitely maintains index order for document additions -- but i
don't know if any similar claim has been made about merging whole indexes.
On 4/5/06, Dan Armbrust [EMAIL PROTECTED] wrote:
I'll continue to try to generate a test case that gets the docs out of
order... but if someone in the know could answer authoritatively whether
I browsed the code for IndexWriter.addIndexes(Dir[]), and it looks
like it should preserve order.
The
On 4/5/06, Dan Armbrust [EMAIL PROTECTED] wrote:
I haven't been able to recreate
the out-of-order problem. However, with my real process, with a ton
more data, I can recreate it every single time I index (it even gets the
same documents out of order, consistently).
If you have enough file
Yonik Seeley wrote:
On 4/5/06, Dan Armbrust [EMAIL PROTECTED] wrote:
I'll continue to try to generate a test case that gets the docs out of
order... but if someone in the know could answer authoritatively whether
I browsed the code for IndexWriter.addIndexes(Dir[]), and it looks
like it
: Well, I set out to write JUnit test case to quickly show this... but
: I'm having a heck of a time doing it. With relatively small numbers of
: documents containing very few fields... I haven't been able to recreate
: the out-of-order problem. However, with my real process, with a ton
: more
Dan Armbrust wrote:
My indexing process works as follows (and some of this is a hold-over from
the time before lucene had a compound file format - so bear with me)
I open up a File based index - using a merge factor of 90, and in my
current test, the compound index format. When I have added
On 4/5/06, Doug Cutting [EMAIL PROTECTED] wrote:
As others have noted, this should work correctly.
One slight oddity I noticed with addIndexes(Dir[]) is that merging
starts at one past the first new segment added (not the first new
segment). It doesn't seem like that should hurt much though.
Out of interest, does indexing time speed up much on 64-bit hardware?
I was able to speed up indexing on 64-bit platform by taking advantage of
the larger address space to parallelize the indexing process. One thread
creates index segments with a set of RAMDirectories and another thread
merges
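The two-thread scheme described above might look roughly like this (a stdlib-only sketch with hypothetical names; the real version would build index segments in RAMDirectories and merge them into an on-disk index with an IndexWriter):

```java
import java.util.concurrent.*;

public class PipelineSketch {
    // One thread produces in-memory "segments", another consumes and merges
    // them, so segment building and merging overlap instead of alternating.
    public static int runPipeline(int segments) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);

        Future<?> producer = pool.submit(() -> {
            for (int i = 0; i < segments; i++) {
                queue.add("segment-" + i); // stand-in for filling a RAMDirectory
            }
            queue.add("DONE"); // sentinel: no more segments coming
        });

        Future<Integer> merger = pool.submit(() -> {
            int merged = 0;
            for (String s = queue.take(); !s.equals("DONE"); s = queue.take()) {
                merged++; // stand-in for addIndexes() into the disk index
            }
            return merged;
        });

        producer.get();
        int merged = merger.get();
        pool.shutdown();
        return merged;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runPipeline(5)); // prints 5
    }
}
```

The larger 64-bit address space matters because the producer can afford big RAMDirectory segments without starving the merger of heap.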
Yonik Seeley wrote:
For your test case, try lowering numbers, such as maxBufferedDocs=2,
mergeFactor=2 or 3
to create more segments more quickly and cause more merges with fewer documents.
Good suggestion. A merge factor of 2 made it happen much more quickly.
Bug is filed:
Doug Cutting wrote:
I assume that your merge factor when calling addIndexes() is less than
90. If it's 90, then what you're doing is the same as Lucene would
automatically do. I think you could save yourself a lot of trouble if
you simply lowered your merge factor substantially and then
On 4/5/06, Dan Armbrust [EMAIL PROTECTED] wrote:
Yonik Seeley wrote:
For your test case, try lowering numbers, such as maxBufferedDocs=2,
mergeFactor=2 or 3
to create more segments more quickly and cause more merges with fewer
documents.
Good suggestion. A merge factor of 2 made it
Ah Ha! I found the problem.
SegmentInfos.read(Directory directory) reads the segment info in reverse order!
I gotta go home now... I'll look into the right fix later (it depends
on what else uses that method...)
FYI, I managed to reproduce it with only 3 documents in each index.
-Yonik
Spoke too soon... the loop counter goes down to zero, but it looks
like the segments are added in order.
for (int i = input.readInt(); i > 0; i--) { // read segmentInfos
SegmentInfo si =
new SegmentInfo(input.readString(), input.readInt(), directory);
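The point is that the countdown counter only controls how many entries are read; each read still advances the stream forward, so the entries come out in file order. A small stdlib round-trip (hypothetical names, not Lucene code) illustrates it:

```java
import java.io.*;

public class CountdownRead {
    // Write a count followed by that many ints, then read them back with a
    // countdown loop in the style of SegmentInfos.read: the counter runs
    // down, but each readInt() still advances the stream forward.
    static int[] roundTrip(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(values.length);
        for (int v : values) out.writeInt(v);

        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        int[] result = new int[values.length];
        int pos = 0;
        for (int i = in.readInt(); i > 0; i--) { // counter counts down...
            result[pos++] = in.readInt();        // ...but reads go forward
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(java.util.Arrays.toString(roundTrip(new int[]{10, 20, 30})));
    }
}
```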
I realized what the real problem was during the drive home.
merged segments are added after all other segments, instead of the
spot the original segments resided.
I'll propose a patch soon...
-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server
OK, the following patch seems to work for me!
You might want to try it out on your larger test Dan.
The first part probably isn't necessary (the base=start instead of
start+1), but the second part is.
-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server
Index:
addIndexes(Dir[]) was the only user of mergeSegments() that passed an
endpoint that wasn't the end of the segment list, and hence the only
caller to mergeSegments() that will see a change of behavior.
Given that, I feel comfortable enough to commit this.
-Yonik
http://incubator.apache.org/solr
Thanks guys as always... lucene (and especially the people behind
it) are top notch.
Less than 6 hours from the time I figured out that the bug was in
Lucene (and not my code, which is usually the case) - and it's already
fixed (I'm going to assume - I'll test it tomorrow when I get to work)
mark harwood wrote:
Isn't that what Query.extractTerms is for? Isn't it
implemented by all primitive Queries?
As of last week, yes. I changed the SpanQueries to
implement this method and then refactored the
Highlighter package's QueryTermExtractor to make use
of this (it radically