Re: Efficient delete

2008-08-08 Thread Ian Lea
Michael, did you get anywhere with this? 3 secs for one delete is excessive. A job of mine ran earlier today and did 2000+ deletes by term on unique id in less than 9 seconds. The index is smaller, at around 5GB, but I don't believe that would explain the difference. All the deletes were done

Re: Efficient delete

2008-08-08 Thread Michael McCandless
Also, can you do your deletes via IndexWriter (delete by Term) instead of opening an IndexReader to do the deletes? Mike. Ian Lea wrote: Michael, did you get anywhere with this? 3 secs for one delete is excessive. A job of mine ran earlier today and did 2000+ deletes by term on unique id in
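
A minimal sketch of the delete-by-Term approach Mike suggests, assuming each document was indexed with an untokenized unique "id" field; the field name, index path, and idsToDelete collection are illustrative, not from the thread:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    // Open the existing index (create=false); the writer buffers the
    // deletes and applies them in bulk when it flushes/closes.
    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
    for (String id : idsToDelete) {
        writer.deleteDocuments(new Term("id", id));  // "id" must be an untokenized field
    }
    writer.close();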

SnowballAnalyzer question

2008-08-08 Thread Chris Bamford
Hi. I am using the SnowballAnalyzer because of its multi-language stemming capabilities, and am very happy with that. There is one small glitch which I'm hoping to overcome: can I get it to split up internet domain names in the same way that StopAnalyzer does? I.e., for the sentence This is
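
One possible way to combine StopAnalyzer-style splitting with Snowball stemming is a custom Analyzer; this is an untested sketch, not from the thread. StopAnalyzer splits domain names because its tokenizer breaks on every non-letter character, including dots, and the same tokenizer can feed the Snowball stemmer:

    import java.io.Reader;
    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.snowball.SnowballFilter;

    public class SplittingSnowballAnalyzer extends Analyzer {
      public TokenStream tokenStream(String fieldName, Reader reader) {
        // LowerCaseTokenizer splits "lucene.apache.org" into lucene, apache, org
        TokenStream ts = new LowerCaseTokenizer(reader);
        ts = new StopFilter(ts, StopAnalyzer.ENGLISH_STOP_WORDS);
        return new SnowballFilter(ts, "English");  // stem like SnowballAnalyzer("English")
      }
    }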

RE: LineDocMaker usage

2008-08-08 Thread Brittany Jacobs
Thank you so much! Brittany Jacobs Java Developer JBManagement, Inc. 12 Christopher Way, Suite 103 Eatontown, NJ 07724 ph: 732-542-9200 ext. 229 fax: 732-380-0678 email: [EMAIL PROTECTED] -Original Message- From: Anshum [mailto:[EMAIL PROTECTED] Sent: Wednesday, August 06, 2008 10:30 PM

RE: LineDocMaker usage

2008-08-08 Thread Brittany Jacobs
Why do you add to the doc twice, once with the file path and once with the string? -Brittany -Original Message- From: Anshum [mailto:[EMAIL PROTECTED] Sent: Wednesday, August 06, 2008 10:30 PM To: java-user@lucene.apache.org Subject: Re: LineDocMaker usage Hi, How about just opening a

Unique list of keywords

2008-08-08 Thread Martin vWysiecki
Hello, I have a lot of data, about 20GB of text, and need a unique list of keywords based on the text in all docs from the whole index. Any ideas? THX Martin -- with kind regards Martin von Wysiecki software development aspedia GmbH Roßlauer Weg 5 D-68309 Mannheim Phone +49

Re: LineDocMaker usage

2008-08-08 Thread Ian Lea
In the example code two separate fields are being added to each document: the file name and the contents of one line. The fields can be queried or retrieved separately. There is a typo in the second Field line: the field name should read "line" or whatever you want the field to be called. But as Anshum says,
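
The two-field pattern Ian describes looks roughly like this (the field names and the f/lineText/writer variables are illustrative):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document doc = new Document();
    // one field for the file path, one for the line's text; each can be
    // queried or retrieved on its own
    doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("line", lineText, Field.Store.YES, Field.Index.TOKENIZED));
    writer.addDocument(doc);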

RE: LineDocMaker usage

2008-08-08 Thread Brittany Jacobs
Oh! Thank you very much. Brittany Jacobs Java Developer JBManagement, Inc. 12 Christopher Way, Suite 103 Eatontown, NJ 07724 ph: 732-542-9200 ext. 229 fax: 732-380-0678 email: [EMAIL PROTECTED] -Original Message- From: Ian Lea [mailto:[EMAIL PROTECTED] Sent: Friday, August 08, 2008

Need help searching

2008-08-08 Thread Brittany Jacobs
I want to search all the documents for a string, so I have the following, but Hits isn't returning anything. What am I doing wrong? Thanks in advance. File f = new File("AddressData.txt"); IndexWriter writer; try { writer = new IndexWriter("C:\\", new StandardAnalyzer(), true); FileInputStream

Re: Need help searching

2008-08-08 Thread Otis Gospodnetic
What Analyzer is your searcher using? C:\\ as the index location sounds super funky. Why not C:\\MyIndex, so your index files are not all mixed up with whatever lives in C:\\? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Brittany
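
The usual fix for "Hits returns nothing" is to query through the same analyzer that built the index; a sketch, assuming StandardAnalyzer at index time and an illustrative "contents" field:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    IndexSearcher searcher = new IndexSearcher("C:\\MyIndex");
    // parse the query with the SAME analyzer used when indexing
    QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
    Query query = parser.parse("some string");
    Hits hits = searcher.search(query);
    for (int i = 0; i < hits.length(); i++) {
      System.out.println(hits.doc(i));
    }
    searcher.close();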

RE: Need help searching

2008-08-08 Thread Brittany Jacobs
I don't know which Analyzer; I'm new to all this. This is all I have so far. As far as the location of the index goes, this is just for test purposes. Brittany -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, August 08, 2008 1:13 PM To:

Re: Unique list of keywords

2008-08-08 Thread Daniel Naber
On Friday, 8 August 2008, Martin vWysiecki wrote: I have a lot of data, about 20GB of text, and need a unique list of keywords based on the text in all docs from the whole index. Simply use IndexReader.terms() to iterate over all terms in the index. You can then use
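
Daniel's IndexReader.terms() suggestion, as a rough sketch (the index path and the "contents" field name are illustrative):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;

    IndexReader reader = IndexReader.open("/path/to/index");
    TermEnum terms = reader.terms();
    // terms come back sorted by field then text, so every unique
    // term is visited exactly once
    while (terms.next()) {
      Term t = terms.term();
      if ("contents".equals(t.field())) {  // optionally restrict to one field
        System.out.println(t.text() + " (in " + terms.docFreq() + " docs)");
      }
    }
    terms.close();
    reader.close();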

Re: Term Based Meta Data

2008-08-08 Thread Martin Owens
Dear Lucene Users and Tricia Williams, The way we're operating our Lucene index is one where we index all the terms but do not store the text. From your SOLR-380 patch example, Tricia, I was able to get a very good idea of how to set things up. Historically I have used TermPositionVector instead of
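
For reference, indexing terms with positions but without storing the text typically means a field set up along these lines (the field name and the doc/contents/reader/docId variables are illustrative, not from Martin's mail):

    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.TermPositionVector;

    // index-time: tokenize, don't store, keep positions/offsets
    doc.add(new Field("text", contents, Field.Store.NO,
                      Field.Index.TOKENIZED,
                      Field.TermVector.WITH_POSITIONS_OFFSETS));

    // search-time: the term vector can then be cast to TermPositionVector
    TermPositionVector tpv =
        (TermPositionVector) reader.getTermFreqVector(docId, "text");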

2.3.2 Indexing Performance

2008-08-08 Thread Gary Moore
Parsing and indexing 4.5 million MARC/XML bibliographic records was requiring ~14 hrs. using 2.2. The same job using 2.3 takes ~ 5 hrs. on the same platform -- a quad processor Sun V440 w/8GB memory. I'm using the PerFieldAnalyzerWrapper (StandardAnalyzer and SnowballAnalyzer). I'm

delete by doc id

2008-08-08 Thread Cam Bazz
Hello, what would happen if I modified the IndexWriter class and made the delete-by-doc-id method public? I have two fields in my documents and I need to be able to delete by those two fields (by query, in other words), and I do not wish to move to the trunk version. I am getting quite desperate, and if not

Re: delete by doc id

2008-08-08 Thread Andy Triana
I rarely submit, but I've been seeing this sort of thing more and more on this board. It seems that there is a need to treat Lucene as if it were a data store or database-like repository, when in fact it isn't. In our case, for large indexes we run either a parallel process to create the index

Re: delete by doc id

2008-08-08 Thread Michael McCandless
It's risky. How would you get the IDs to know which ones to delete? A separate reader running on the side? The problem is, as IndexWriter merges segments, the IDs shift. Any reader you have already open won't see this shift (until you reopen it), so you could end up deleting the wrong
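
Before deleteDocuments(Query) arrived on trunk (2.4), the safe pattern was to close the writer first, then search and delete through the same reader, so the doc ids cannot shift mid-operation; a sketch with illustrative field names and index path:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.*;

    IndexReader reader = IndexReader.open("/path/to/index");
    IndexSearcher searcher = new IndexSearcher(reader);
    // match both fields, i.e. a hand-rolled delete-by-query
    BooleanQuery q = new BooleanQuery();
    q.add(new TermQuery(new Term("fieldA", "valueA")), BooleanClause.Occur.MUST);
    q.add(new TermQuery(new Term("fieldB", "valueB")), BooleanClause.Occur.MUST);
    Hits hits = searcher.search(q);
    for (int i = 0; i < hits.length(); i++) {
      reader.deleteDocument(hits.id(i));  // ids are valid for THIS reader
    }
    reader.close();  // commits the deletes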

Re: 2.3.2 Indexing Performance

2008-08-08 Thread Michael McCandless
Thanks for the data point! This is expected -- a lot of work went into increasing IndexWriter's throughput in 2.3. Actually, I'd expect even more speedup, if indeed Lucene is the bottleneck in your app. You could test how much time just creating/parsing/tokenizing the docs (from
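
One way to measure how much of the wall-clock time is analysis rather than indexing, as Mike suggests: run the analyzer over the parsed records without any IndexWriter and time that loop alone (the analyzer and docTexts variables are illustrative):

    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;

    long start = System.currentTimeMillis();
    for (String text : docTexts) {           // docTexts: the parsed records
      TokenStream ts = analyzer.tokenStream("contents", new StringReader(text));
      while (ts.next() != null) {            // consume and discard the tokens
      }
      ts.close();
    }
    System.out.println("analysis alone: "
        + (System.currentTimeMillis() - start) + " ms");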

Deleting and adding docs

2008-08-08 Thread Andre Rubin
I'm new to Lucene, and I've been reading a lot of messages regarding deleting docs. But I think my problem is more basic. I can't delete docs from my index and (after the index is created the first time and the writer is closed) I can't add new documents to an existing index. Sorry for the
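
The usual beginner trap here is the create flag on IndexWriter; a minimal sketch (the path, field name, and newDoc are illustrative):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    // create=true wipes the index each time; pass false to open the
    // existing index for appending
    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
    writer.addDocument(newDoc);                    // add to the existing index
    writer.deleteDocuments(new Term("id", "42"));  // delete by unique-id term
    writer.close();                                // commits adds and deletes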

Re: Term Based Meta Data

2008-08-08 Thread Tricia Williams
Hi, Following the history of Payloads from its beginnings (https://issues.apache.org/jira/browse/LUCENE-755, https://issues.apache.org/jira/browse/LUCENE-761, https://issues.apache.org/jira/browse/LUCENE-834, http://wiki.apache.org/lucene-java/Payload_Planning) it looks like