On Wednesday 25 January 2006 20:51, Peter Keegan wrote:
The index is non-compound format and optimized. Yes, I did try
MMapDirectory, but the index is too big - 3.5 GB (1.3GB is term vectors)
Peter
You could also give this a try:
http://issues.apache.org/jira/browse/LUCENE-283
Regards,
I am attempting to prune an index by getting each document in turn and
then checking/deleting it:
IndexReader ir = IndexReader.open(path);
for(int i=0; i<ir.numDocs(); i++) {
Document doc = ir.document(i);
if(thisDocShouldBeDeleted(doc)) {
ir.delete(docNum); // - I
On Thursday 26 January 2006 09:15, Chun Wei Ho wrote:
I am attempting to prune an index by getting each document in turn and
then checking/deleting it:
IndexReader ir = IndexReader.open(path);
for(int i=0; i<ir.numDocs(); i++) {
Document doc = ir.document(i);
Speaking of NioFSDirectory, I thought there was one posted a while
ago, is this something that can be used?
http://issues.apache.org/jira/browse/LUCENE-414
ray,
On 11/22/05, Doug Cutting [EMAIL PROTECTED] wrote:
Jay Booth wrote:
I had a similar problem with threading, the problem turned out
Hi,
Thanks for the help, just a few more questions:
On 1/26/06, Paul Elschot [EMAIL PROTECTED] wrote:
On Thursday 26 January 2006 09:15, Chun Wei Ho wrote:
I am attempting to prune an index by getting each document in turn and
then checking/deleting it:
IndexReader ir =
Hello,
I have a problem with data I try to index with Lucene. I browse a
directory and index text from different types of files through parsers.
For text files, the data could be in different languages, so different
encodings. If the data is in Turkish, for example, all special characters and
accents are
For the recent questions about this here are a couple of methods for
encoding/decoding long values that will be sorted into order by a range
query
public static String encodeLong(long num) {
String hex = Long.toHexString(num < 0 ? Long.MAX_VALUE -
(0xffffffffffffffffL ^ num) : num);
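The poster's method is truncated above, so here is a self-contained sketch of the same idea using a slightly different (and common) encoding: flip the sign bit so negative longs sort before positive ones, then zero-pad the hex to a fixed 16 characters so lexicographic string order matches numeric order. This is not the poster's exact code, just one working variant.

```java
public class SortableLongCodec {
    private static final String PAD = "0000000000000000"; // 16 zeros, one per hex digit

    // Encode a long as a fixed-width hex string whose lexicographic order
    // matches the numeric order of the original values.
    public static String encodeLong(long num) {
        // Flipping the sign bit maps [MIN_VALUE..MAX_VALUE] onto
        // [0..2^64-1] in unsigned order.
        String hex = Long.toHexString(num ^ 0x8000000000000000L);
        return PAD.substring(hex.length()) + hex;
    }

    // Invert the encoding: parse the 16 hex digits in two halves
    // (Long.parseLong cannot parse a full unsigned 64-bit value), then
    // flip the sign bit back.
    public static long decodeLong(String s) {
        long high = Long.parseLong(s.substring(0, 8), 16);
        long low = Long.parseLong(s.substring(8), 16);
        return ((high << 32) | low) ^ 0x8000000000000000L;
    }

    public static void main(String[] args) {
        System.out.println(encodeLong(-5L).compareTo(encodeLong(3L)) < 0); // prints "true"
    }
}
```

A range query over these strings then selects the same documents as a numeric range over the original longs, which is the point of the thread.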
Yes, that is correct...you need to rewrite the query. I was actually the main
developer for the 1.5 .NET port, so if you come across any issues, please email
me at my hotmail address which I check more often than this one...
-Joe Langley
-Original Message-
From: Gwyn Carwardine
Hello and thanks for your answer.
I do not find the ISOLatin1AccentFilter class in my Lucene jar, but I found one
on Google, attached to this mail; could you tell me if it is the right one?
I do not see anything in this class which can help me. This program will
replace some accent characters but
Paul,
I tried this but it ran out of memory trying to read the 500Mb .fdt file. I
tried various values for MAX_BBUF, but it still ran out of memory (I'm using
-Xmx1600M, which is the jvm's maximum value (v1.5)) I'll give
NioFSDirectory a try.
Thanks,
Peter
On 1/26/06, Paul Elschot [EMAIL
Ray,
The throughput is worse with NioFSDirectory than with the FSDirectory
(patched and unpatched). The bottleneck still seems to be synchronization,
this time in NioFile.getChannel (7 of the 8 threads were blocked there
during one snapshot). I tried this with 4 and 8 channels.
The throughput
Hmmm, can you run the 64 bit version of Windows (and hence a 64 bit JVM?)
We're running with heap sizes up to 8GB (RH Linux 64 bit, Opterons,
Sun Java 1.5)
-Yonik
On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote:
Paul,
I tried this but it ran out of memory trying to read the 500Mb .fdt file.
On Jan 26, 2006, at 7:26 PM, arnaudbuffet wrote:
I do not find the ISOLatin1AccentFilter class in my Lucene jar, but
I found one on Google, attached to this mail; could you tell me if it
is the right one?
This used to be in contrib/analyzers but has been moved into the core
(Subversion only
arnaudbuffet wrote:
if I try to index a text file encoded in Western 1252 for example with the Turkish text
düzenlediğimiz kampanyamıza, the Lucene index will contain re-encoded data like
&#0;&#17;k&#0;&#0;
ISOLatin1AccentFilter.removeAccents() converts that string to
duzenlediğimiz
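The partial result above is expected: ISOLatin1AccentFilter only folds Latin-1 characters, so the Turkish ğ survives. A JDK-only sketch of the broader Unicode approach (this is not the Lucene filter itself) decomposes the text to NFD and strips the combining marks, which also folds ğ to g:

```java
import java.text.Normalizer;

public class AccentStripper {
    // Decompose to NFD so each accented letter becomes a base letter plus
    // combining marks, then delete the marks (Unicode category M).
    public static String strip(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("düzenlediğimiz")); // prints "duzenledigimiz"
    }
}
```

Note that the Turkish dotless ı (U+0131) has no canonical decomposition, so it passes through unchanged either way; full Turkish folding needs explicit character mapping.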
I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on
Intel. If you know of any, please let me know. Linux may be an option, too.
btw, I'm getting a sustained rate of 135 queries/sec with 4 threads, which
is pretty impressive. Another way around the concurrency limit is to run
BEA Jrockit supports both AMD64 and Intel's EM64T (basically renamed AMD64)
http://www.bea.com/framework.jsp?CNT=index.htm&FP=/content/products/jrockit/
and Sun's Java 1.5 for Windows AMD64 Platform
They advertise AMD64, presumably because that's what their servers
use, but it should work on
: The document number is the variable i in this case.
: If the document number is the variable i (enumerated from numDocs()),
: what's the difference between numDocs() and maxDoc() in this case? I
: was previously under the impression that the internal docNum might be
: different to the counter.
On Thursday 26 January 2006 09:47, Chun Wei Ho wrote:
Hi,
Thanks for the help, just a few more questions:
On 1/26/06, Paul Elschot [EMAIL PROTECTED] wrote:
On Thursday 26 January 2006 09:15, Chun Wei Ho wrote:
I am attempting to prune an index by getting each document in turn and
On Thursday 26 January 2006 19:44, Chris Hostetter wrote:
: The document number is the variable i in this case.
: If the document number is the variable i (enumerated from numDocs()),
: what's the difference between numDocs() and maxDoc() in this case? I
: was previously under the
Hello,
On Jan 26, 2006, at 12:01, John Haxby wrote:
I have a Perl script here that I used to generate a downgrading table
for a C program. I can let you have the Perl script as is, but if
there's enough interest(*) I'll use it to generate, say,
CompoundAsciiFilter since it converts compound
Doug Cutting wrote:
A 64-bit JVM with NioDirectory would really be optimal for this.
Oops. I meant MMapDirectory, not NioDirectory.
Doug
Dumb question: does the 64-bit compiler (javac) generate different code than
the 32-bit version, or is it just the jvm that matters? My reported speedups
were solely from using the 64-bit JVM with jar files from the 32-bit
compiler.
Peter
On 1/26/06, Yonik Seeley [EMAIL PROTECTED] wrote:
Nice
There is no difference in bytecode... the whole difference is just in
the underlying JVM.
-Yonik
On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote:
Dumb question: does the 64-bit compiler (javac) generate different code than
the 32-bit version, or is it just the jvm that matters? My reported
Peter,
Wow, the speed-up is impressive! But may I ask what you did to
achieve 135 queries/sec prior to the JVM switch?
ray,
On 1/27/06, Peter Keegan [EMAIL PROTECTED] wrote:
Correction: make that 285 qps :)
On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote:
I tried the AMD64-bit JVM
Ray,
The short answer is that you can make Lucene blazingly fast by using advice
and design principles mentioned in this forum and of course reading 'Lucene
in Action'. For example, use a 'content' field for searching all fields (vs
multi-field search), put all your stored data in one field,
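The 'content' catch-all field is built at indexing time; a minimal sketch of the idea (a hypothetical helper, not a Lucene API) just concatenates the per-field text into one string that is then indexed as a single searchable field alongside the originals:

```java
// Hypothetical helper: join the individual field values into one "content"
// string, so a single-field query can cover all fields without the overhead
// of a multi-field search at query time.
public class CatchAllField {
    public static String buildContent(String... fieldValues) {
        StringBuilder sb = new StringBuilder();
        for (String v : fieldValues) {
            if (sb.length() > 0) sb.append(' ');  // separate fields by a space
            sb.append(v);
        }
        return sb.toString();
    }
}
```

The trade-off is a larger index (the text is tokenized twice), bought back as faster queries against one field.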
Hello,
I have a couple of instances of Lucene. I just altered one implementation and now
it's not keeping a segments file. While indexing occurs, there is a segments
file, but once it's done, there isn't. All the other indexes have one.
The problem comes when I try to update a document,
Paul,
Thanks for the advice! But for the 100+ queries/sec on the 32-bit
platform, did you end up applying other patches, or using different
FSDirectory implementations?
Thanks!
ray,
On 1/27/06, Peter Keegan [EMAIL PROTECTED] wrote:
Ray,
The short answer is that you can make Lucene blazingly fast
Ray,
The 135 qps rate was using the standard FSDirectory in 1.9.
Peter
On 1/26/06, Ray Tsang [EMAIL PROTECTED] wrote:
Paul,
Thanks for the advice! But for the 100+ queries/sec on the 32-bit
platform, did you end up applying other patches, or using different
FSDirectory implementations?
Since I didn't find anything in the log from log4j I did a kill
-3 on
the process and found two very interesting things:
Almost all multisearcher threads were in this state:
MultiSearcher thread #1 daemon prio=10 tid=0x01900960
nid=0x81442c waiting for monitor entry
Hi,
I want to know how Lucene normalizes the score. I see the Hits class has
this function to get each document's score, but I don't know how Lucene
calculates the normalized score, and Lucene in Action only mentions the
normalized score of the nth top-scoring documents.
--
Regards
Jiang Xing
I seem to say this a lot :), but, assuming your OS has a decent
filesystem cache, try reducing your JVM heapsize, using an FSDirectory
instead of RAMDirectory, and see if your filesystem cache does ok. If
you have 12GB, then you should have enough RAM to hold both the old
and new indexes during