Tuning Indexing performance question ..

2006-04-10 Thread Mufaddal Khumri

Hi,

I am using a multi-threaded app to index a bunch of data. The app spawns 
X threads. Each thread writes to a RAMDirectory. When a thread 
finishes its work, the contents of that RAMDirectory are written into 
the FSDirectory. All threads are passed an instance of the FSWriter when 
they are created.


Now, reading the Lucene docs, I understand that indexing performance can 
be tweaked further by playing with mergeFactor, maxMergeDocs and 
minMergeDocs. Am I right that these three parameters affect writing the 
index to the FSDirectory and not to the RAMDirectory (since a 
RAMDirectory exists entirely in memory)? In other words, does tweaking 
mergeFactor, maxMergeDocs and minMergeDocs affect the performance of 
writing to the RAMDirectory?


-Thanks
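
For readers trying to picture the setup described above, here is a minimal plain-Java sketch of the pattern: each worker fills a private in-memory buffer and merges it into one shared store when it is done. It deliberately uses no Lucene classes. BufferedIndexingModel, SharedStore and indexAll are hypothetical names invented for illustration, the private list only stands in for a per-thread RAMDirectory, and the sketch says nothing about how mergeFactor, maxMergeDocs or minMergeDocs behave.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Plain-Java model: private per-worker buffers merged into one shared store. */
final class BufferedIndexingModel {

    /** Hypothetical stand-in for the single on-disk index. */
    static final class SharedStore {
        private final List<String> docs = new ArrayList<>();

        /** The only contended step: one call per finished worker. */
        synchronized void merge(List<String> buffer) {
            docs.addAll(buffer);
        }

        synchronized int size() {
            return docs.size();
        }
    }

    /** Runs the workers and returns how many documents reached the shared store. */
    static int indexAll(int threads, int docsPerThread) {
        final SharedStore store = new SharedStore();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int worker = t;
            pool.execute(() -> {
                // Private buffer, playing the role of a per-thread RAMDirectory.
                List<String> buffer = new ArrayList<>();
                for (int i = 0; i < docsPerThread; i++) {
                    buffer.add("worker" + worker + "-doc" + i);
                }
                store.merge(buffer);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println(indexAll(4, 1000));
    }
}
```

In this model the workers never contend while buffering; they only serialize on the final merge call.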


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Regarding Indexes

2006-03-31 Thread Mufaddal Khumri
The solution to your problem lies in the answers to several business-domain
questions, such as:

1. Will each company only want to search its own data, or ALL the data?
2. If you do not know the answer to that, is there a chance that some
companies would want to search only their own data, others would want to
search data from company a and company b, and yet others would want to
search all the data?
3. How does having just one index, as opposed to individual indexes, affect
indexing given the load it will have to handle from one or more
companies? [Note: you could also group companies into shared indexes
according to how much data they have on average.]
4. It may turn out that at this point neither you nor the companies have
any way of knowing how much data they will have. In that case you will
have to use your best judgement about which path to take, and build your
app so that it is abstracted from whether the data lives in one index or
in multiple indexes. Later you can experiment with different setups as
you learn more about how the application is used.

One way is to index all the data from a particular company with one of
the terms being companyIdentifier. That way you can search within
company d's data, within a few different companies' data, or across the
entire index.


-Mufaddal.
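
The companyIdentifier idea above can be sketched in plain Java as follows. This is only a model of the filtering logic, not Lucene code: CompanyScopedSearch, Doc, sampleIndex and search are hypothetical names, and treating an empty company set as "search everything" is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java model of one shared index whose documents carry a companyIdentifier. */
final class CompanyScopedSearch {

    static final class Doc {
        final String companyIdentifier;
        final String text;

        Doc(String companyIdentifier, String text) {
            this.companyIdentifier = companyIdentifier;
            this.text = text;
        }
    }

    /** Small hypothetical data set covering three companies. */
    static List<Doc> sampleIndex() {
        return Arrays.asList(
                new Doc("a", "invoice for widgets"),
                new Doc("b", "invoice for gears"),
                new Doc("d", "shipping note for widgets"));
    }

    /**
     * Returns the texts containing the word, restricted to the given companies.
     * An empty set means "all companies" (an assumption of this sketch).
     */
    static List<String> search(List<Doc> index, String word, Set<String> companies) {
        List<String> hits = new ArrayList<>();
        String needle = word.toLowerCase();
        for (Doc doc : index) {
            boolean inScope = companies.isEmpty()
                    || companies.contains(doc.companyIdentifier);
            if (inScope && doc.text.toLowerCase().contains(needle)) {
                hits.add(doc.text);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> onlyD = new HashSet<>(Arrays.asList("d"));
        System.out.println(search(sampleIndex(), "widgets", onlyD));
    }
}
```

The same call shape covers all three cases from the message: one company, a few companies, or the whole index.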

-Original Message-
From: Ravi [mailto:[EMAIL PROTECTED] 
Sent: Friday, March 31, 2006 9:22 AM
To: java-user@lucene.apache.org
Subject: Regarding Indexes

Hi Luceners,

This is my problem. Can anybody suggest a solution?

I am going to implement search for a company that is going to support the
ASP (Application Service Provider) model.

In this model, around 200 companies are going to register with us, add
their documents and search them.

Now the problem is: shall I maintain an individual index for each company,
or a single index for all the companies?

If I maintain individual indexes, then I need to create 200 searcher
objects, because each index has to be searched.

But if I maintain a single index, I can have one index searcher, but I
need to add a condition for each document. Moreover, if in future any
company needs its own data separately, we cannot provide it. So please
tell me which model can help us solve this problem. The key point in this
application is that add/modify/delete will occur very frequently. Please
help me; I am waiting for your feedback.

Thanks

Ravi Kumar Jaladanki





RE: Update or Delete Document for Lucene 1.4.x

2006-03-31 Thread Mufaddal Khumri
The way you update a document in lucene is by deleting the current one
and adding a new one. 

-Mufaddal.
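
The delete-then-add pattern can be modeled in a few lines of plain Java. In Lucene 1.4.x the corresponding calls are, as far as I know, IndexReader.delete(Term) on a unique id field followed by IndexWriter.addDocument(Document); please check the Javadoc for your version. The class below is not Lucene code; DeleteThenAddModel and its methods are hypothetical names used only to show the sequence.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Plain-Java model of "update = delete the old document, then add the new one". */
final class DeleteThenAddModel {

    static final class Doc {
        final String id;
        final String body;

        Doc(String id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void add(Doc doc) {
        docs.add(doc);
    }

    /** Removes every document with the given id and returns how many were removed. */
    int deleteById(String id) {
        int removed = 0;
        for (Iterator<Doc> it = docs.iterator(); it.hasNext();) {
            if (it.next().id.equals(id)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** There is no in-place update: delete first, then add the replacement. */
    void update(Doc doc) {
        deleteById(doc.id);
        add(doc);
    }

    /** Returns the body stored for the id, or null if there is none. */
    String bodyOf(String id) {
        for (Doc doc : docs) {
            if (doc.id.equals(id)) {
                return doc.body;
            }
        }
        return null;
    }

    int size() {
        return docs.size();
    }
}
```

The important detail is that each document needs a stable identifier to delete by; without one there is nothing to target in the delete step.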

-Original Message-
From: Don Vaillancourt [mailto:[EMAIL PROTECTED] 
Sent: Friday, March 31, 2006 1:37 PM
To: java-user@lucene.apache.org
Subject: Update or Delete Document for Lucene 1.4.x

Hi All,

I need to implement the ability to update one document within a Lucene 
collection.

I haven't been able to find anything in the API.  Is there a way to 
update one document or delete a document so that I can add an update?

Thank You

-- 
Don Vaillancourt



Getting no hits ...

2006-02-23 Thread Mufaddal Khumri
I have been trying to figure out why my query below would not return any 
hits.


I use two custom analyzers for indexing and searching. The one I use for 
indexing uses this:


   public TokenStream tokenStream(String fieldName, Reader reader)
   {
       TokenStream result = new StandardTokenizer(reader);
       result = new StandardFilter(result);
       result = new LowerCaseFilter(result);
       result = new StopFilter(result, stopSet);
       result = new SynonymFilter(result, new MySynonymEngine());
       result = new PorterStemFilter(result);
       return result;
   }

The one I use for searching uses this:

   public TokenStream tokenStream(String fieldName, Reader reader)
   {
       TokenStream result = new StandardTokenizer(reader);
       result = new StandardFilter(result);
       result = new LowerCaseFilter(result);
       result = new StopFilter(result, stopSet);
       result = new PorterStemFilter(result);
       return result;
   }

(Basically while searching I do not use the SynonymFilter.)

I have quite a few products that I index that have the text on which I 
am querying on.


I do a search for this: ES-20D

This is the final query that I run:
+(+content:es\-20d) +entity:product +(title:es\-20d~2^40.0 
((title:es\-20d)^10.0) content:es\-20d~2^20.0 (content:es\-20d) 
categoryName:es\-20d^80.0)


(The content and title fields are Indexed, Tokenized and Stored. The 
categoryName field is Indexed and Stored.)


I get no hits.

Where am I going wrong with this? Any pointers?

-Thanks.








Re: Getting no hits ...

2006-02-23 Thread Mufaddal Khumri
In my earlier email I put in the wrong search string. The correct one 
is: EOS-20D


And this is the query in question, which still produces no hits:

+(+content:eos\-20d) +entity:product +(title:eos\-20d~2^40.0 
((title:eos\-20d)^10.0) content:eos\-20d~2^20.0 (content:eos\-20d) 
categoryName:eos\-20d^80.0)


I have used the AnalyzerUtils.displayTokensWithFullDetails(analyzer, 
string); (AnalyzerUtils from the LIA book).


This is part of the log output from using the 
AnalyzerUtils.displayTokensWithFullDetails(analyzer, string) when this 
product gets indexed:



119: [013803044430:857-869:ALPHANUM]
120: [eos-20d:870-877:NUM]
121: [011-eos-20d:878-889:NUM]

This is part of the log output from using the 
AnalyzerUtils.displayTokensWithFullDetails(analyzer, string) when I do 
the search:

1: [eos-20d:0-6:NUM]

From what I understand I see that the analyzer is producing the same 
tokens while indexing and during searching.


Chris Hostetter wrote:


1) Have you looked at what tokens your indexing analyzer produces when you
  tokenize ES-20D ?
2) Have you looked at what tokens your query analyzer produces when you
  tokenize ES-20D ?
3) Have you tried a simpler query (i.e. just content:es\-20d ) ?
4) When giving QueryParser a (quoted) phrase search, I don't think you
  really want to escape that - character.






-Hoss





Re: Getting no hits ...

2006-02-23 Thread Mufaddal Khumri

Follow up on my previous email ...

When I execute this query in Luke, using the standard analyzer on the 
same index, I get 8 hits:
+(+content:eos\-20d) +entity:product +(title:eos\-20d~2^40.0 
((title:eos\-20d)^10.0) content:eos\-20d~2^20.0 (content:eos\-20d) 
categoryName:eos\-20d^80.0)


I modified my searching code to use the standard analyzer, but I did not 
get any hits back. I am still trying to figure out the problem. Any 
ideas?





ArrayIndexOutOfBoundsException being thrown ...

2006-02-22 Thread Mufaddal Khumri
Getting an ArrayIndexOutOfBoundsException ...

Line 31 in IndexSearcherManager.java:
...

public static IndexSearcher getIndexSearcher(String indexPath) 
{
    logger.debug("indexPath = " + indexPath);

    searcher = new IndexSearcher(indexPath);  // LINE 31

    return searcher;
}
...
...

I get the following exception:

28628 DEBUG com.allegrocentral.tandoori.managers.search.IndexSearcherManager [21] - indexPath = /opt/tomcat/webapps/ROOT/WEB-INF/search-index
28666 WARN  org.apache.struts.action.RequestProcessor [516] - Unhandled Exception thrown: class java.lang.ArrayIndexOutOfBoundsException
28669 ERROR org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/].[action] [704] - Servlet.service() for servlet action threw exception
java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.get(ArrayList.java:323)
    at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
    at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:151)
    at org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:149)
    at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:115)
    at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:86)
    at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:45)
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:112)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:89)
    at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
    at org.apache.lucene.store.Lock$With.run(Lock.java:109)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
    at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:38)
    at com.allegrocentral.tandoori.managers.search.IndexSearcherManager.getIndexSearcher(IndexSearcherManager.java:31)

Any ideas as to why this might be happening? (Am using lucene-core-1.9-rc1.jar)

-Thanks.


hyphen not being removed by standard filter

2006-02-22 Thread Mufaddal Khumri
Hi,

I might be missing something. I have a custom analyzer the gist of which is:

public TokenStream tokenStream(String fieldName, Reader reader)
{
    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopSet);
    result = new PorterStemFilter(result);
    return result;
}

I test my above analyzer with the following query string:
the is EOS-20D canon amazing

In my test code I do this  to see what my analyzed query string looks like:

PerFieldAnalyzerWrapper analyzer =
        new PerFieldAnalyzerWrapper(new StandardStemmingAnalyzer());
analyzer.addAnalyzer("categoryNames", new KeywordAnalyzer());

TokenStream stream = analyzer.tokenStream(null, new StringReader(queryString));
String analyzedQueryString = "";

// Concatenate the term text of every token the analyzer emits.
while (true)
{
    Token token = stream.next();
    if (token == null)
    {
        break;
    }

    analyzedQueryString = analyzedQueryString + token.termText() + " ";
}

analyzedQueryString = analyzedQueryString.trim();

log.debug("analyzedQueryString = " + analyzedQueryString);

The output of the log statement above is:

analyzedQueryString = eos-20d canon amaz

I see that the common stop words have been removed, everything has been 
lowercased and the query has been stemmed. Why was the hyphen not removed 
by the standard filter? Or does the standard analyzer remove hyphens only 
from phrases like eos - 20d and not from eos-20d?

Thanks.


get results by relevance, limiting results and then sort the results by some criterion

2006-02-21 Thread Mufaddal Khumri
When I do a search on, for example, "batteries", I get 1200+ results. I 
would like to show the user, let's say, 300. I can do that by extracting 
only the first 300 hits (sorted by decreasing relevance by default) and 
displaying those to the user.


Now, on the search results page, I have a drop-down box that lets the 
user sort the results by price. When the user selects "Sort by price 
low to high", I would like to sort the same 300 hits I got above by 
price.


Essentially I want to sort the first 300 relevant search results by 
price. In other words, I would like to get search results by relevance, 
limit the results, and then sort the limited results by some criterion.


What would be a good way to do this in Lucene?

-Thanks.
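
The requirement described above (take the top N hits by relevance, then reorder only those N by price) can be sketched in plain Java, independent of Lucene. TopHitsByPrice, Hit and the method names are hypothetical; in a real application the score and price would come from the search hits and their stored fields.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java sketch: limit by relevance first, then sort the survivors by price. */
final class TopHitsByPrice {

    static final class Hit {
        final String name;
        final double score;
        final double price;

        Hit(String name, double score, double price) {
            this.name = name;
            this.score = score;
            this.price = price;
        }
    }

    static List<Hit> topByRelevanceThenPrice(List<Hit> hits, int limit, boolean lowToHigh) {
        // Step 1: order by decreasing relevance and keep at most `limit` hits.
        List<Hit> byScore = new ArrayList<>(hits);
        byScore.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<Hit> top = new ArrayList<>(byScore.subList(0, Math.min(limit, byScore.size())));

        // Step 2: reorder only the retained hits by price.
        Comparator<Hit> byPrice = Comparator.comparingDouble((Hit h) -> h.price);
        top.sort(lowToHigh ? byPrice : byPrice.reversed());
        return top;
    }

    /** Convenience helper returning just the names, in order. */
    static List<String> names(List<Hit> hits) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits) {
            out.add(h.name);
        }
        return out;
    }
}
```

Note that a cheap but weakly relevant hit outside the first N never reaches step 2, which is the trade-off discussed in the replies to this message.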





Re: get results by relevance, limiting results and then sort the results by some criterion

2006-02-21 Thread Mufaddal Khumri
So yes, if the (x+1)th item happens to be a camera and its price 
happens to be lower than that of the previous x cameras, it won't be 
included in this view, and that is exactly what we want.


Mufaddal Khumri wrote:


In my case, when we search for, let's say, "cameras", my top x results are
all sorts of cameras, and after those I get documents that match camera 
casings etc.


As a company we want to show as many cameras as possible, and not other 
camera-related products, in this one web view on a specific page we have. 
On this same page we also want to let the user select price high to low 
or price low to high and sort these top x results. Essentially, the hard 
part is to come up with an x such that, ideally, no cameras are pruned.
As a business we want to get as many cameras as possible into the search 
results, but we don't mind if a few cameras do not appear in those 
results, as long as we can fine-tune the results to show only cameras and 
not camera casings, camera batteries etc.


I have been looking at QueryFilter and the Sort API, but haven't yet 
figured out a way to do what I am trying to do. Any pointers are 
greatly appreciated.


-Thanks,

John Powers wrote:


I'm sure you've taken care of this, but I am curious myself:

If the 301st document only has a single term "batteries" (and is thus far
down in the Hits), but has a price of seven cents, then sorting all the
documents with "batteries" by price would put it near the top. But by
eliminating all documents beyond the first 300, this one doesn't appear in
the solution you are working toward, correct? Why is that a good thing?
It seems you would want to sort the full document list, and then
return the top 300 that you want the user to see.
I think I'm just curious why getting rid of some that could (in a new
sort) be of higher relevance is a good thing.




Re: get results by relevance, limiting results and then sort the results by some criterion

2006-02-21 Thread Mufaddal Khumri
Currently I am doing exactly that. I am boosting relevant docs and I am 
sorting in java to get the desired effect. I just was trying to see if I 
can do something using QueryFilter or Sorts and do what I am doing.


-Thanks.

John Powers wrote:


Also, if you don't like the tag solution, you could borrow something
right from LIA: boost the documents that are significant products
with 1.5 (or whatever is higher than 1), and boost the support/ancillary
products with 0.1.

If there is nothing relevant in the significant products, at least
you'll get some of these. After all, they may search for "bolt"; 
maybe they want an ancillary product.
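
As a rough illustration of the boosting idea, the plain-Java sketch below multiplies a raw match score by 1.5 for significant products and by 0.1 for ancillary ones, then ranks by the result. This is a simplification: Lucene's real scoring combines a boost with several other factors, and BoostedRankingModel, Product and rank are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of ranking with a per-document boost. */
final class BoostedRankingModel {

    // Boost values taken from the message above.
    static final double PRIMARY_BOOST = 1.5;
    static final double ANCILLARY_BOOST = 0.1;

    static final class Product {
        final String name;
        final double rawScore;
        final boolean primary;

        Product(String name, double rawScore, boolean primary) {
            this.name = name;
            this.rawScore = rawScore;
            this.primary = primary;
        }

        /** Simplified scoring: raw score times the boost for the product kind. */
        double boostedScore() {
            return rawScore * (primary ? PRIMARY_BOOST : ANCILLARY_BOOST);
        }
    }

    /** Returns product names ordered by decreasing boosted score. */
    static List<String> rank(List<Product> products) {
        List<Product> sorted = new ArrayList<>(products);
        sorted.sort(Comparator.comparingDouble((Product p) -> p.boostedScore()).reversed());
        List<String> names = new ArrayList<>();
        for (Product p : sorted) {
            names.add(p.name);
        }
        return names;
    }
}
```

Because ancillary products are only pushed down, not filtered out, they still appear when nothing significant matches.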





Re: get results by relevance, limiting results and then sort the results by some criterion

2006-02-21 Thread Mufaddal Khumri

Hi,

That's exactly what I am doing currently. I was just wondering if there 
is a Lucene way to do what I am doing, using QueryFilter etc.


-Thanks.

Dan Armbrust wrote:


Mufaddal Khumri wrote:

When I do a search for example on batteries i get 1200+ results. I 
would like to show the user lets say 300. I can do that by only 
extracting the first 300 hits (sorted by decreasing relevance by 
default) and displaying those to the user.





If you are only talking about ordering the number of items that you 
are going to show to the user, that seems to imply that the number 
will be small.  Why don't you just re-sort the items that you are 
going to display to the user somewhere in your code after you get the 
documents back from lucene?  It may not be quite as clean, but I doubt 
that there will be any performance impact.


Dan







exact match ..

2006-02-20 Thread Mufaddal Khumri

Let's say I do this while indexing:

doc.add(Field.Text("categoryNames", categoryNames));

Now, while searching categoryNames, I do a search for "digital cameras". 
I only want to match documents that have exactly the phrase "digital 
cameras" in the categoryNames field. I do not want documents with 
"digital camera batteries" in that field to be part of the result.


What's the best way to accomplish this?

thanks.




span first query and boosting ..

2006-02-20 Thread Mufaddal Khumri

Hi,

I do this:

SpanFirstQuery fullPhraseInCategoryNamesQuery = new SpanFirstQuery(
        new SpanTermQuery(new Term("categoryNames", "digital cameras")), 2);

fullPhraseInCategoryNamesQuery.setBoost(8);

In my log output i get this:

spanFirst(categoryNames:digit camera, 2))

Why can't I boost a span query? What am I doing wrong?

-Thanks




Re: exact match ..

2006-02-20 Thread Mufaddal Khumri

Hi Steve,

If I understand you right, I could use something like the KeywordAnalyzer 
to tokenize the entire stream as a single token and store that 
in the index. I could definitely use the KeywordAnalyzer while indexing 
this particular field, categoryNames.


Now my question is how to search and boost this, since it is part 
of a bigger boolean query in my case.


My typical query actually looks like:

+(+content:digit +content:camera) +entity:product +(title:digit 
camera~2^40.0 ((title:digit title:camera)^10.0) content:digit 
camera~2^20.0 (content:digit content:camera) categoryNames:digit 
camera^80.0)


As you can see, I was trying to do a phrase query on the categoryNames 
field and boost it by 80.0.
Also, I am using the Porter stemming filter to stem while searching. (I 
do this while indexing as well.) If I go with the KeywordAnalyzer 
approach, I can index the categoryNames field using that analyzer.


Would I use the QueryParser to create my query and pass it the 
KeywordAnalyzer when searching on categoryNames (and then make 
that query part of my global boolean query)?


-Thanks.





Steven Rowe wrote:



Hi Mufaddal,

One way to do this is to use the KeywordAnalyzer (in the Lucene 
Subversion trunk, but not in v1.4.3; will be in forthcoming v1.9) for 
the categoryNames field.  This analyzer does not tokenize field 
contents, so digital cameras would be a single token, and the only 
thing that would match it would be the exact same single token.  Be 
careful when you search to construct the search tokens similarly.


If you have other fields you want to search, and you want to tokenize 
their contents when you index them, you could use the 
PerFieldAnalyzerWrapper, so that the KeywordAnalyzer is only used for 
the categoryNames field.


Steve
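
To make the per-field idea concrete, here is a small plain-Java model (not Lucene code) in which the categoryNames field is kept as one untokenized token while every other field is split on whitespace. PerFieldTokenizingModel, analyze and matches are hypothetical names, and lowercasing the single token is an assumption of this sketch rather than a statement about KeywordAnalyzer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Plain-Java model of per-field analysis: one field untokenized, the rest split. */
final class PerFieldTokenizingModel {

    /** Produces the tokens that would be indexed (or searched) for a field value. */
    static List<String> analyze(String field, String value) {
        String normalized = value.trim().toLowerCase();
        if ("categoryNames".equals(field)) {
            // Keyword-style handling: the whole value is a single token.
            return Collections.singletonList(normalized);
        }
        // Default handling: one token per whitespace-separated word.
        return Arrays.asList(normalized.split("\\s+"));
    }

    /**
     * The query text goes through the same per-field analysis as the indexed
     * value; a document matches when every query token is among its tokens.
     */
    static boolean matches(String field, String indexedValue, String queryText) {
        return analyze(field, indexedValue).containsAll(analyze(field, queryText));
    }

    public static void main(String[] args) {
        System.out.println(matches("categoryNames", "Digital Cameras", "digital cameras"));
        System.out.println(matches("categoryNames", "Digital Camera Batteries", "digital cameras"));
    }
}
```

The point about constructing the search tokens similarly shows up here as the query text passing through the same analyze call as the indexed value.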




StandardAnalyzer .. stemming

2006-02-17 Thread Mufaddal Khumri
The SnowballAnalyzer seems to offer stemming. The StandardAnalyzer, on 
the other hand, has a bunch of other niceness. What is the best practice 
for leveraging both of these analyzers while indexing and searching? Do I 
chain them somehow, and if so, which APIs do I look at? Or do I 
implement my own analyzer and use both of them to process the tokens?


Thanks,





Re: StandardAnalyzer .. stemming

2006-02-17 Thread Mufaddal Khumri

Thank you. I think in my case I can just do the last approach you suggested.

One more question: which jar is SnowballFilter part of?

Chris Hostetter wrote:


: The SnowballAnalyzer seems to offer stemming. The StandardAnalyzer, on
: the other hand, has a bunch of other niceness. What is the best practice
: for leveraging both of these analyzers while indexing and searching? Do I
: chain them somehow, and if so, which APIs do I look at for doing so? Do
: I implement my own analyzer and use both of these to process the tokens?

the Analyzer class is already designed to make chaining very easy -- but
not Analyzer chaining, TokenFilter chaining.

if you take a look at the source for StandardAnalyzer and SnowballAnalyzer
it should (hopefully) be very obvious how to write your own (10 lines or
fewer) Analyzer that gives you all the goodness you want from both...

http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/analysis/standard/StandardAnalyzer.java?rev=219090&view=markup
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/snowball/src/java/org/apache/lucene/analysis/snowball/SnowballAnalyzer.java?rev=151459&view=markup

...if you literally just want to add snowball stemming to
the end of StandardAnalyzer, then i *think* something like this would
work...

  // stoplist and yourChoiceOfStemmerName are placeholders to fill in
  Analyzer a = new StandardAnalyzer(stoplist) {
    public TokenStream tokenStream(String fieldName, Reader reader) {
      // wrap the StandardAnalyzer token stream in a stemming filter
      return new SnowballFilter(super.tokenStream(fieldName, reader),
                                yourChoiceOfStemmerName);
    }
  };
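
The chaining idea itself, where each filter wraps the stream produced by the previous stage, can be shown without any Lucene classes. The sketch below is a plain-Java model under stated assumptions: the class, the stop-word list, and the "stemmer" (which only strips a trailing "s") are all invented for illustration and are far cruder than a real Snowball stemmer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java model of token-filter chaining; illustrative only, not Lucene API.
class FilterChainSketch {

    // Stage 1: a "tokenizer" that splits text on non-alphanumeric characters.
    static Iterator<String> tokenize(String text) {
        List<String> out = new ArrayList<String>();
        for (String t : text.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out.iterator();
    }

    // Stage 2: a filter that lowercases each token of the wrapped stream.
    static Iterator<String> lowerCase(final Iterator<String> in) {
        return new Iterator<String>() {
            public boolean hasNext() { return in.hasNext(); }
            public String next() { return in.next().toLowerCase(Locale.ROOT); }
        };
    }

    // Stage 3: a filter that drops stop words from the wrapped stream.
    static Iterator<String> removeStopWords(Iterator<String> in, Set<String> stop) {
        List<String> kept = new ArrayList<String>();
        while (in.hasNext()) {
            String t = in.next();
            if (!stop.contains(t)) {
                kept.add(t);
            }
        }
        return kept.iterator();
    }

    // Stage 4: a deliberately naive stand-in for a stemmer.
    // It strips one trailing "s" from longer words that do not end in "ss".
    static Iterator<String> naiveStem(final Iterator<String> in) {
        return new Iterator<String>() {
            public boolean hasNext() { return in.hasNext(); }
            public String next() {
                String t = in.next();
                boolean strip = t.length() > 3 && t.endsWith("s") && !t.endsWith("ss");
                return strip ? t.substring(0, t.length() - 1) : t;
            }
        };
    }

    // The "analyzer": nothing more than the stages composed in order.
    static List<String> analyze(String text) {
        Set<String> stop = new HashSet<String>(Arrays.asList("the", "a", "and", "of"));
        Iterator<String> stream =
                naiveStem(removeStopWords(lowerCase(tokenize(text)), stop));
        List<String> tokens = new ArrayList<String>();
        while (stream.hasNext()) {
            tokens.add(stream.next());
        }
        return tokens;
    }
}
```

Because the stemming stage sits last, it only ever sees lowercased, non-stop-word tokens. Applying the same chain at index time and at search time keeps the two sides consistent.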


-Hoss





Lucene Query ... understanding

2006-02-16 Thread Mufaddal Khumri

Hi,

I am just trying to see if I understand the Lucene query below correctly.

+(+contentNew:radio +contentNew:mp3) +entity:product +(name:"radio mp3"^4.0 
(contentNew:radio contentNew:mp3) contentNew:"radio mp3"^2.0)


Here is how I read the query:

1. the contentNew field has the word "radio" AND the word "mp3"
AND
2. the entity field has the word "product"
AND
3. the phrase "radio mp3" is in the field name, boosted by 4, OR the word 
"radio" is in the field contentNew OR the word "mp3" is in the field 
contentNew OR the phrase "radio mp3" is in the field contentNew, boosted by 2


(I am trying to understand the above query in terms of ANDs, ORs, 
groupings and boosting, as opposed to prohibited and required.)


Am I correct in my understanding?
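
One way to check a reading like this is to write the clauses out as plain boolean predicates. The sketch below assumes the usual Lucene reading of the syntax: a "+" clause is required, an unprefixed clause is optional, and a group made only of optional clauses matches when at least one of them matches. Documents are modeled as a map from field name to lowercased, single-spaced text, and all names are hypothetical. Boosts are left out because they affect ranking, not whether a document matches.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the match logic only; scoring and boosts are not modeled.
class QueryReadingSketch {

    // True when the field contains the word as a whole whitespace-separated token.
    static boolean hasWord(Map<String, String> doc, String field, String word) {
        String text = doc.getOrDefault(field, "");
        return Arrays.asList(text.split("\\s+")).contains(word);
    }

    // True when the field contains the words of the phrase adjacent and in order.
    static boolean hasPhrase(Map<String, String> doc, String field, String phrase) {
        return (" " + doc.getOrDefault(field, "") + " ").contains(" " + phrase + " ");
    }

    static boolean matches(Map<String, String> doc) {
        // +(+contentNew:radio +contentNew:mp3)
        boolean contentHasBoth =
                hasWord(doc, "contentNew", "radio") && hasWord(doc, "contentNew", "mp3");
        // +entity:product
        boolean isProduct = hasWord(doc, "entity", "product");
        // +(name:"radio mp3" (contentNew:radio contentNew:mp3) contentNew:"radio mp3")
        boolean anyOptional =
                hasPhrase(doc, "name", "radio mp3")
                || (hasWord(doc, "contentNew", "radio") || hasWord(doc, "contentNew", "mp3"))
                || hasPhrase(doc, "contentNew", "radio mp3");
        return contentHasBoth && isProduct && anyOptional;
    }
}
```

In this model the third group can never exclude a document that already passed the first clause, because the first clause implies one of its optional parts. Under these assumptions it would only influence ordering, through the boosts.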

Thanks,








Strange Problem ... Luke returns results Lucene api does not.

2006-02-16 Thread Mufaddal Khumri

Hi,

I have a query that gets hits via luke. I can see the documents it 
finds. But when I run the same query via my java code it returns 0 hits.


Note:
1. I am using standard analyzer while indexing and searching.
2. I have made sure that I am querying the same index via luke or 
through my java program.


This is the call I make in my java code.
   BooleanQuery finalQuery = new BooleanQuery();
   .
   .
   log.debug(finalQuery.toString());

   hits = IndexSearcherManager.getIndexSearcher(indexPath).search(finalQuery);

   log.debug("Hits length = " + hits.length());

The output of the first log statement above is:

+(+contentNew:Wireless +contentNew:fm +contentNew:car 
+contentNew:transmitter) +entity:category +(name:"Wireless fm car 
transmitter"^40.0 ((name:Wireless name:fm name:car 
name:transmitter)^10.0) contentNew:"Wireless fm car transmitter"^20.0 
(contentNew:Wireless contentNew:fm contentNew:car contentNew:transmitter))


The output of the second log statement above is:

Hits length = 0

I run the above query against the same index via Luke and I get search 
results that I expected.


Any ideas as to why my Java call does not return any hits, and how I 
might be able to debug this?


Thanks,





Re: Strange Problem ... Luke returns results Lucene api does not.

2006-02-16 Thread Mufaddal Khumri

I am using the StandardAnalyzer with Luke.

StandardAnalyzer lowercases while indexing and searching.

The BooleanQuery, finalQuery.toString(), in my case is either:

+(+contentNew:wireless +contentNew:fm +contentNew:car 
+contentNew:transmitter) +entity:product +(name:"wireless fm car 
transmitter"^40.0 ((name:wireless name:fm name:car 
name:transmitter)^10.0) contentNew:"wireless fm car transmitter"^20.0 
(contentNew:wireless contentNew:fm contentNew:car contentNew:transmitter))


OR

+(+contentNew:Wireless +contentNew:fm +contentNew:car 
+contentNew:transmitter) +entity:category +(name:"Wireless fm car 
transmitter"^40.0 ((name:Wireless name:fm name:car 
name:transmitter)^10.0) contentNew:"Wireless fm car transmitter"^20.0 
(contentNew:Wireless contentNew:fm contentNew:car contentNew:transmitter))


Both work in Luke just fine. I am using the StandardAnalyzer in Luke.

But when I try to execute the above boolean query via a call to 
IndexSearcher.search(finalQuery), it returns no hits.


Erik Hatcher wrote:

How are you constructing your BooleanQuery and what Analyzer are you  
using with Luke?   You have some capitalized words in your query, and  
most analyzers would lowercase those, which may be the issue (perhaps  
you indexed the capitalized words?).


Erik

On Feb 16, 2006, at 2:41 PM, Mufaddal Khumri wrote:


Hi,

I have a query that gets hits via luke. I can see the documents it  
finds. But when I run the same query via my java code it returns 0  
hits.


Note:
1. I am using standard analyzer while indexing and searching.
2. I have made sure that I am querying the same index via luke or  
through my java program.


This is the call I make in my java code.
   BooleanQuery finalQuery = new BooleanQuery();
   .
   .
   log.debug(finalQuery.toString());

   hits = IndexSearcherManager.getIndexSearcher(indexPath).search(finalQuery);

   log.debug("Hits length = " + hits.length());

The output of the first log statement above is:

+(+contentNew:Wireless +contentNew:fm +contentNew:car 
+contentNew:transmitter) +entity:category +(name:"Wireless fm car 
transmitter"^40.0 ((name:Wireless name:fm name:car 
name:transmitter)^10.0) contentNew:"Wireless fm car transmitter"^20.0 
(contentNew:Wireless contentNew:fm contentNew:car contentNew:transmitter))


The output of the second log statement above is:

Hits length = 0

I run the above query against the same index via Luke and I get  
search results that I expected.


Any ideas as to why my java call does not return any hits? how i  
might be able to debug this?


Thanks,





Re: Strange Problem ... Luke returns results Lucene api does not.

2006-02-16 Thread Mufaddal Khumri
Yes, that's exactly the problem. I just found out that the analyzer was 
not being set correctly.


Thanks,

Chris Hostetter wrote:


: Standard analyzer lower cases while indexing and searching.

Correct, but since the toString() of your query still has capital words in
it (like contentNew:Wireless) you obviously didn't build this query
using the StandardAnalyzer -- IndexSearcher doesn't apply any Analyzers
for you when you search -- it's the responsibility of whatever is
constructing your query (be that custom code you've written, or
QueryParser) to run the input through the appropriate Analyzer.

when you paste that query into Luke, it *does* run it through the
QueryParser for you -- so the text gets analyzed and lower cased.
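
The effect can be reproduced with a toy index in plain Java. In this sketch the "analyzer" only lowercases, and the search step does an exact term lookup without analyzing anything, which is the behaviour described above for searching. All names are invented for the example and are not Lucene API.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model: analysis happens when terms are indexed and when a query is
// built, never inside the search call itself. Names are illustrative only.
class AnalyzeBeforeSearchSketch {

    // Stand-in for an analyzer that lowercases its input.
    static String analyze(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Index time: every word is analyzed before it is stored as a term.
    static Set<String> index(String... words) {
        Set<String> terms = new HashSet<String>();
        for (String w : words) {
            terms.add(analyze(w));
        }
        return terms;
    }

    // Search time: exact term lookup, with no analysis applied here.
    static boolean search(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> idx = index("Wireless", "FM", "car", "transmitter");
        // Raw user input used directly as a term: no hit.
        System.out.println(search(idx, "Wireless"));
        // Same input analyzed while building the query: hit.
        System.out.println(search(idx, analyze("Wireless")));
    }
}
```

A quick diagnostic that follows from this is to print the query with toString() and compare each term, character for character, with the terms actually stored in the index.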



:
: The BooleanQuery, finalQuery.toString() in my case below is:
:
: +(+contentNew:wireless +contentNew:fm +contentNew:car
: +contentNew:transmitter) +entity:product +(name:"wireless fm car
: transmitter"^40.0 ((name:wireless name:fm name:car
: name:transmitter)^10.0) contentNew:"wireless fm car transmitter"^20.0
: (contentNew:wireless contentNew:fm contentNew:car contentNew:transmitter))
:
: OR
:
: +(+contentNew:Wireless +contentNew:fm +contentNew:car
: +contentNew:transmitter) +entity:category +(name:"Wireless fm car
: transmitter"^40.0 ((name:Wireless name:fm name:car
: name:transmitter)^10.0) contentNew:"Wireless fm car transmitter"^20.0
: (contentNew:Wireless contentNew:fm contentNew:car contentNew:transmitter))
:
: work in Luke just fine. I am using the StandardAnalyzer in Luke.
:
: But when i try to execute the above boolean query via a call to
: IndexSearcher.search(finalQuery) it returns no hits.
:
: Erik Hatcher wrote:
:
:  How are you constructing your BooleanQuery and what Analyzer are you
:  using with Luke?   You have some capitalized words in your query, and
:  most analyzers would lowercase those, which may be the issue (perhaps
:  you indexed the capitalized words?).
: 
:  Erik
: 
:  On Feb 16, 2006, at 2:41 PM, Mufaddal Khumri wrote:
: 
:  Hi,
: 
:  I have a query that gets hits via luke. I can see the documents it
:  finds. But when I run the same query via my java code it returns 0
:  hits.
: 
:  Note:
:  1. I am using standard analyzer while indexing and searching.
:  2. I have made sure that I am querying the same index via luke or
:  through my java program.
: 
:  This is the call I make in my java code.
:     BooleanQuery finalQuery = new BooleanQuery();
:     .
:     .
:     log.debug(finalQuery.toString());
: 
:     hits = IndexSearcherManager.getIndexSearcher(indexPath).search(finalQuery);
:     log.debug("Hits length = " + hits.length());
: 
:  The output of the first log statement above is:
: 
:  +(+contentNew:Wireless +contentNew:fm +contentNew:car
:  +contentNew:transmitter) +entity:category +(name:"Wireless fm car
:  transmitter"^40.0 ((name:Wireless name:fm name:car
:  name:transmitter)^10.0) contentNew:"Wireless fm car transmitter"^20.0
:  (contentNew:Wireless contentNew:fm contentNew:car
:  contentNew:transmitter))
: 
:  The output of the second log statement above is:
: 
:  Hits length = 0
: 
:  I run the above query against the same index via Luke and I get
:  search results that I expected.
: 
:  Any ideas as to why my java call does not return any hits? how i
:  might be able to debug this?
: 
:  Thanks,



-Hoss





de pluralization

2005-08-04 Thread Mufaddal Khumri
Hello,

I am just posting this question out here since this might be a common
problem and some of you might have good pointers.

Are there algorithms or APIs built into Lucene that would help de-pluralize
words while indexing and/or while searching the index? Are there
analyzers that do this already?

There is a lot of ongoing academic work in this area, and I was wondering
what the best way to solve this problem is. We have ideas and heuristics
ourselves, but would love input from the community here since this
might be a common problem.

Any pointers/ideas on this?

Thank you,
Mufaddal.
 

--
This email and any files transmitted with it are confidential 
and intended solely for the use of the individual or entity 
to whom they are addressed. If you have received this 
email in error please notify the system manager. Please
note that any views or opinions presented in this email 
are solely those of the author and do not necessarily
represent those of the company. Finally, the recipient
should check this email and any attachments for the 
presence of viruses. The company accepts no liability for
any damage caused by any virus transmitted by this email.
Consult your physician prior to the use of any medical
supplies or product.
--





RE: Question regarding boosting

2005-05-20 Thread Mufaddal Khumri
Hi,

After a little probing and experimenting, I formulated this query:



queryString = "entity:\"" + en + "\" AND (name:\"" + queryString + "\"^2"
    + " OR content:\"" + queryString + "\")";
Query q = QueryParser.parse(queryString, "content", analyzer);



When I execute the above query, the following query gets executed in
Lucene:

+entity:product +(name:"audio cable"^2.0 content:"audio cable")

Note: "audio cable" is the contents of the search box. Also, I saw that
my OR gets represented as a blank in the query. Is that fine?

The results from executing this query seem alright, but is this a good
way of achieving the results I was trying to achieve? (NOTE: My original
post explains what I am trying to do).

Any insight would be appreciated.

Mufaddal.


-Original Message-
From: Mufaddal Khumri [mailto:[EMAIL PROTECTED] 
Sent: Friday, May 20, 2005 3:34 PM
To: java-user@lucene.apache.org
Subject: Question regarding boosting

Hi,

I wanted to know what method would be the best way to do something that
I am describing below.

I am creating an index of all my products and categories. While
indexing, I am creating the following documents for my products and
categories:

Product:
doc.add(Field.UnIndexed("id", (String) obj[0]));
doc.add(Field.Keyword("entity", "product"));
doc.add(Field.Text("name", name));
doc.add(Field.Text("content", content));

Category:
doc.add(Field.UnIndexed("id", (String) obj[0]));
doc.add(Field.Keyword("entity", "category"));
doc.add(Field.Text("name", name));
doc.add(Field.Text("content", content));

As you can see above the id is stored to retrieve the objects from the
database. The entity field distinguishes whether I want to carry out my
search on products or categories. The content field is a combination of
the name and description of the product and category. The name field is
the name of the product or the name of the category.

My searches and indexing works great.

This is how I am searching:

Query query1 = QueryParser.parse(queryString, "content", analyzer);

Term term = null;
if (entity.equals("product"))
  term = new Term("entity", "product");
else if (entity.equals("category"))
  term = new Term("entity", "category");

TermQuery query2 = new TermQuery(term);
BooleanQuery bq = new BooleanQuery();
bq.add(query1, true, false);
bq.add(query2, true, false);

return indexSearcher.search(bq);

As you can see above I am using the content and entity fields to do my
search and everything works fine. What I want to do now is that I want
to boost the results such that if the query matches the name field it
gives a higher rank. How do I do this?

For example, adding something like this:
...
Query query3 = QueryParser.parse(queryString, "name", analyzer);
query3.setBoost(2);
...
...
bq.add(query3, true, false);

When I do the above, I print a toString on my final Boolean query which
is:

+content:radio +entity:category +name:radio^2.0

When I am doing my search for products, let's say, how do I tell Lucene:
"Show me all products, ordered in such a way that a product whose name
matches the query string better gets a higher relevance"?

So the relevance should be in the following order:

1. Product name matches more: more relevance.
2. Product content matches: relevance is increased, but by less than the
relevance given to a product name match in 1.

Any ideas?

Thanks.





Lucene loosing documents?

2005-04-28 Thread Mufaddal Khumri
Hi,

I am trying to index 20349 records. When I index using the FSDirectory I
get 20349 documents - this is correct. Now when I use the RAMDirectory
to create my index and write all documents from the RAMDirectory to the
FSDirectory I only get 20340 documents consistently. This is the only
change I made. Why do I lose 9 documents?

int counter = 1;
while (counter <= 20349)
{
    ramWriter.addDocument(doc);
    counter++;
}

Directory d[] = {ramDir};
fsWriter.addIndexes(d);
fsWriter.optimize();
ramWriter.close();

fsWriter.close();

Any ideas as to why I am missing the 9 documents?

Thanks.




RE: Lucene loosing documents?

2005-04-28 Thread Mufaddal Khumri
Hi,

Thanks. That seems to work. I guess calling the close before the add
causes the last few documents to be flushed out or something?
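
That guess is consistent with how a buffering writer behaves in general. The plain-Java sketch below models a writer that keeps added documents in a buffer and only makes them visible in batches, with close() flushing whatever is left. The buffer size of 10 is an assumption, picked because 20349 mod 10 = 9 reproduces the count reported in this thread; the class is a stand-in and not Lucene's IndexWriter.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a writer that buffers documents before making them visible.
// The buffer size and all names are assumptions for illustration only.
class BufferedWriterSketch {
    private final int bufferSize;
    private final List<String> buffer = new ArrayList<String>();   // not yet visible
    private final List<String> flushed = new ArrayList<String>();  // visible to a merge

    BufferedWriterSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    void addDocument(String doc) {
        buffer.add(doc);
        if (buffer.size() == bufferSize) {
            flush();
        }
    }

    private void flush() {
        flushed.addAll(buffer);
        buffer.clear();
    }

    // Closing flushes whatever is still sitting in the buffer.
    void close() {
        flush();
    }

    // What a merge that reads the underlying storage would see right now.
    int visibleDocCount() {
        return flushed.size();
    }

    // Returns {count visible before close, count visible after close}.
    static int[] demo() {
        BufferedWriterSketch ram = new BufferedWriterSketch(10);
        for (int i = 1; i <= 20349; i++) {
            ram.addDocument("doc" + i);
        }
        int beforeClose = ram.visibleDocCount();
        ram.close();
        int afterClose = ram.visibleDocCount();
        return new int[] { beforeClose, afterClose };
    }
}
```

In this model a merge performed before close() sees 20340 documents and a merge performed after it sees all 20349, which is why closing the in-memory writer first would fix the count.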

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 28, 2005 2:19 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene loosing documents?

Can you close the ramDirectory first and then add it via fsWriter and
see if that solves it?

Otis

--- Mufaddal Khumri [EMAIL PROTECTED] wrote:
 Hi,
 
 I am trying to index 20349 records. When I index using the FSDirectory I
 get 20349 documents - this is correct. Now when I use the RAMDirectory
 to create my index and write all documents from the RAMDirectory to the
 FSDirectory I only get 20340 documents consistently. This is the only
 change I made. Why do I lose 9 documents?
 
 int counter = 1;
 while (counter <= 20349)
 {
   ramWriter.addDocument(doc);
   counter++;
 }
 
 Directory d[] = {ramDir};
 fsWriter.addIndexes(d);
 fsWriter.optimize();
 ramWriter.close();
 
 fsWriter.close();
 
 Any ideas as to why I am missing the 9 documents?
 
 Thanks.
 





Lucene bulk indexing

2005-04-19 Thread Mufaddal Khumri
Hi,

I am sure this question has been raised before, and maybe it has even been
answered. I would be grateful if someone could point me in the right
direction or give their thoughts on this topic.

The problem:

I have approximately over 2 products that I need to index. At the
moment I get X number of products at a time and index them. This process
takes about 26 minutes (Am indexing the database id, product name,
product description).

I was thinking of ways to make this indexing faster. For this I was
thinking about writing a threaded module that would index X number of
products simultaneously. For instance I could spawn (Number of
products/X) number of threads and do the indexing. I am guessing this
would be faster but by what factor would this be faster? (I understand
the writes to the index are synchronized by lucene).
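
The batching idea can be sketched in plain Java as follows. This is a hedged illustration, not a measured solution: buildDocument is a hypothetical stand-in for the per-product work (database fetch, document construction), the shared list stands in for the index, and a fixed-size thread pool is used instead of one thread per batch. Any speedup depends on how much of the time is spent outside the synchronized write, so no factor is claimed here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of batch-parallel indexing; all names are hypothetical stand-ins.
class BatchIndexingSketch {

    // Stand-in for the expensive, parallelizable per-product work.
    static String buildDocument(int productId) {
        return "doc-" + productId;
    }

    static List<String> indexAll(List<Integer> productIds, int batchSize, int threads) {
        // Stand-in for the shared index; writes to it are serialized.
        final List<String> index = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int start = 0; start < productIds.size(); start += batchSize) {
            int end = Math.min(start + batchSize, productIds.size());
            final List<Integer> batch = productIds.subList(start, end);
            pool.submit(() -> {
                // Build the batch locally, outside any lock.
                List<String> local = new ArrayList<String>();
                for (int id : batch) {
                    local.add(buildDocument(id));
                }
                // One synchronized write per batch.
                index.addAll(local);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return index;
    }
}
```

Bounding the pool size keeps the thread count independent of the number of products, and writing once per batch keeps contention on the shared structure low. The order of documents in the result is not deterministic.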

Is there any other approach by which I could speed up the indexing?
Thoughts? Suggestions?

Thanks,
Mufaddal.

