Michael
Did you get anywhere with this? 3 secs for one delete is excessive. A
job of mine ran earlier today and did 2000+ deletes by term on unique
id in less than 9 seconds. The index is smaller, at around 5Gb, but I
don't believe that would explain the difference. All the deletes were done
Also, can you do your deletes via IndexWriter (delete by Term) instead
of opening IndexReader to do the deletes?
Mike
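A minimal sketch of what Mike is suggesting, assuming a Lucene 2.3-era API and a unique "id" field (the field name and index path are illustrative, not from the original mails):

```java
// Sketch: delete by Term through IndexWriter instead of opening an
// IndexReader. Deletes are buffered and applied when the writer flushes,
// which is why thousands of them can go through in a few seconds.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DeleteByTermSketch {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), false); // false: open existing index
        // Each document is assumed to carry a unique, untokenized "id" field.
        writer.deleteDocuments(new Term("id", "doc-42"));
        writer.close();
    }
}
```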
Ian Lea wrote:
> Michael
> Did you get anywhere with this? 3 secs for one delete is excessive. A
> job of mine ran earlier today and did 2000+ deletes by term on unique
> id in
Hi.
I am using the SnowballAnalyzer because of its multi-language stemming
capabilities, and am very happy with that.
There is one small glitch which I'm hoping to overcome - can I get it to
split up internet domain names in the same way that StopAnalyzer does?
i.e. for the sentence This is
Thank you so much!
Brittany Jacobs
Java Developer
JBManagement, Inc.
12 Christopher Way, Suite 103
Eatontown, NJ 07724
ph: 732-542-9200 ext. 229
fax: 732-380-0678
email: [EMAIL PROTECTED]
-Original Message-
From: Anshum [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 06, 2008 10:30 PM
Why do you add to the doc twice, once with the file path and once with the
string?
-Brittany
-Original Message-
From: Anshum [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 06, 2008 10:30 PM
To: java-user@lucene.apache.org
Subject: Re: LineDocMaker usage
Hi,
How about just opening a
Hello,
I have a lot of data, about 20GB of text, and need a unique list of
keywords based on the text in all docs in the whole index.
Any ideas?
THX
Martin
--
with kind regards
Martin von Wysiecki
software development
aspedia GmbH
Roßlauer Weg 5
D-68309 Mannheim
Phone +49
In the example code 2 separate fields are being added to each
document: the file name and the contents of one line. The fields can
be queried or retrieved separately. There is a typo in the second
Field line: it should read "line", or whatever you wanted the field to
be called.
But as Anshum says,
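A minimal sketch of the two-field pattern described above, assuming Lucene 2.3-era APIs (the field names "path" and "line" and the sample values are illustrative):

```java
// Sketch: add two separately searchable fields to one Document.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class TwoFieldDocSketch {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/tmp/demo-index",
                new StandardAnalyzer(), true); // true: create a new index
        Document doc = new Document();
        // Stored but not tokenized: good for identifiers like a file name.
        doc.add(new Field("path", "somefile.txt",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        // Tokenized: the line's text is split into searchable terms.
        doc.add(new Field("line", "some line of text",
                Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();
    }
}
```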
Oh! Thank you very much.
Brittany Jacobs
Java Developer
JBManagement, Inc.
12 Christopher Way, Suite 103
Eatontown, NJ 07724
ph: 732-542-9200 ext. 229
fax: 732-380-0678
email: [EMAIL PROTECTED]
-Original Message-
From: Ian Lea [mailto:[EMAIL PROTECTED]
Sent: Friday, August 08, 2008
I want to search all the documents for a string.
So I have the following. But Hits isn't returning anything.
What am I doing wrong? Thanks in advance.
File f = new File("AddressData.txt");
IndexWriter writer;
try {
    writer = new IndexWriter("C:\\", new StandardAnalyzer(), true);
    FileInputStream
What Analyzer is your searcher using?
C:\\ as the index location sounds super funky.
Why not C:\\MyIndex , so your index files are not all mixed up with whatever
lives in C:\\
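A minimal sketch of searching with the same analyzer used at index time (StandardAnalyzer, per the snippet above); the field name "line" and the query string are illustrative. An analyzer mismatch between indexing and searching is a common cause of zero hits:

```java
// Sketch: query an existing index with the same analyzer that built it.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SearchSketch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("C:\\MyIndex");
        // Same analyzer as at index time, so query terms match indexed terms.
        QueryParser parser = new QueryParser("line", new StandardAnalyzer());
        Query query = parser.parse("christopher");
        Hits hits = searcher.search(query);
        System.out.println(hits.length() + " hit(s)");
        searcher.close();
    }
}
```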
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Brittany
I don't know what Analyzer. I'm new to all this. This is all I have so
far. As far as the location of the index, this is just for test purposes.
Brittany
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Friday, August 08, 2008 1:13 PM
To:
On Friday, 8 August 2008, Martin von Wysiecki wrote:
I have a lot of data, about 20GB of text, and need a unique list of
keywords based on the text in all docs in the whole index.
Simply use IndexReader.terms() to iterate over all terms in the index. You
can then use
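Building on that suggestion, a minimal sketch of the term enumeration, assuming Lucene 2.3-era APIs (the field name "contents" and the index path are assumptions):

```java
// Sketch: enumerate every unique term in an index via IndexReader.terms().
// Terms come back sorted by field, then by text, each exactly once.
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class UniqueTermsSketch {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        TermEnum terms = reader.terms();
        while (terms.next()) {
            Term t = terms.term();
            // Restrict to one field if only its keywords are wanted.
            if ("contents".equals(t.field())) {
                System.out.println(t.text());
            }
        }
        terms.close();
        reader.close();
    }
}
```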
Dear Lucene Users and Tricia Williams,
The way we're operating our lucene index is one where we index all the
terms but not store the text. From your SOLR-380 patch example Tricia I
was able to get a very good idea of how to set things up. Historically I
have used TermPositionsVector instead of
Parsing and indexing 4.5 million MARC/XML bibliographic records was
requiring ~14 hrs. using 2.2. The same job using 2.3 takes ~ 5 hrs. on
the same platform -- a quad processor Sun V440 w/8GB memory. I'm
using the PerFieldAnalyzerWrapper (StandardAnalyzer and SnowballAnalyzer).
I'm
hello,
what would happen if I modified the class IndexWriter, and made the delete
by id method public?
I have two fields in my documents and I need to be able to delete by those
two fields (by query, in other words), and I do not wish to move to the
trunk version. I am getting quite desperate, and if not
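One way to avoid patching IndexWriter, sketched below under the assumption that each document carries a unique, untokenized "uid" field (the field names "fieldA", "fieldB", and "uid" are all illustrative): search for the two-field combination, then delete each hit by its unique term through IndexWriter.

```java
// Sketch: delete documents matching two fields without exposing the
// package-private delete-by-id. Search first, then delete by unique Term.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class DeleteByTwoFieldsSketch {
    public static void main(String[] args) throws Exception {
        String indexDir = "/path/to/index";
        // Both clauses MUST match: the "delete by query" the poster wants.
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("fieldA", "x")), BooleanClause.Occur.MUST);
        query.add(new TermQuery(new Term("fieldB", "y")), BooleanClause.Occur.MUST);

        IndexSearcher searcher = new IndexSearcher(indexDir);
        Hits hits = searcher.search(query);
        IndexWriter writer = new IndexWriter(indexDir,
                new StandardAnalyzer(), false);
        for (int i = 0; i < hits.length(); i++) {
            // Stable unique key, so merge-induced doc-id shifts don't matter.
            writer.deleteDocuments(new Term("uid", hits.doc(i).get("uid")));
        }
        writer.close();
        searcher.close();
    }
}
```

This sidesteps the doc-id hazard raised later in the thread: internal ids shift as segments merge, but a stored unique key does not.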
I rarely submit but I've been seeing this sort of thing more and more
on this board.
It seems that there is a need to treat Lucene as if it were a data
storage or database-like repository, when in fact it isn't.
In our case, for large indexes we run either a parallel process to
create the index
It's risky.
How would you get the IDs to know which ones to delete? A separate
reader running on the side?
The problem is, as IndexWriter merges segments, the IDs shift. Any
reader you have already open won't see this shift (until you reopen
it), so you could end up deleting the wrong
Thanks for the data point!
This is expected -- a lot of work went into increasing IndexWriter's
throughput in 2.3.
Actually, I'd expect even more speedup, if indeed Lucene is the
bottleneck in your app. You could test how much time just creating/
parsing/tokenizing the docs (from
I'm new to Lucene, and I've been reading a lot of messages regarding
deleting docs. But I think my problem is more basic. I can't delete docs
from my index and (after the index is created the first time and the writer
is closed) I can't add new documents to an existing index.
Sorry for the
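A common cause of the "can't add to an existing index" symptom is reopening the writer with create=true, which wipes the index. A minimal sketch, assuming Lucene 2.3-era APIs and an illustrative index path:

```java
// Sketch: append to an existing index by opening IndexWriter with
// create=false; passing true here deletes everything and starts over.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class AppendToIndexSketch {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), false); // false: keep existing docs
        Document doc = new Document();
        doc.add(new Field("id", "doc-43", Field.Store.YES,
                Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
        writer.close(); // close (or flush) so the new doc becomes visible
    }
}
```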
Hi,
Following the history of Payloads from its beginnings
(https://issues.apache.org/jira/browse/LUCENE-755,
https://issues.apache.org/jira/browse/LUCENE-761,
https://issues.apache.org/jira/browse/LUCENE-834,
http://wiki.apache.org/lucene-java/Payload_Planning) it looks like