indexing in lucene 1.9.1
Hi All, We have recently upgraded from Lucene 1.4.3 to Lucene 1.9.1. After the upgrade, we are facing some issues: 1. Indexing seems to be behaving differently. There were more than 300 segment files (.cfs) in the index, and the IndexSearcher is taking forever to refresh. Have there been any changes in 1.9.1 with respect to the default values for merging segment files / indexing? 2. Our application downloads documents and indexes them every minute as a continuous process, and a Quartz job refreshes the IndexSearcher every 4 hours. Would this have any effect on the indexing process or add more segments? Any help would be appreciated. Thanks, Harini
should I avoid create many Fields for a Document?
Hello. What is the best way to search? Should I keep all the fields separate, or create one big field that holds the content of all of them? Does this impact performance dramatically? With one big field I would not need to create a BooleanQuery... Last time I did not get any clues; let's see if this time goes better. Thanks! -- Paulo E. A. Silveira Caelum Ensino e Soluções em Java http://www.caelum.com.br/
Re: Need some Advice on Searching
On 19/05/06, Chris Hostetter [EMAIL PROTECTED] wrote: i assume when you say this... : 1. I need to temporarily index sets of documents on the fly, say 100 at a : time. ...you mean that you'll have lots of temporary indexes of a few hundred documents, and then you'll do a bunch of queries and throw the index away. Even if I'm wrong, most of the rest of my advice will still be useful, but it's good to clarify. Correct, I will throw them away! : My problem is that for these queries I need to know which documents hit. I : also need to know which terms hit and, if possible, : the location of the hits for each term in the hit document. Knowing which docs match your query is easy. Knowing where in a document a particular term matches can be done using the TermPositions API ... but it gives you that info as a number of terms, which for HTML content may be confusing depending on how your analyzer deals with that HTML. Okay, based on your answer and a little testing just to see what it gives me - I assume Lucene only stores the term offset (which is analyzer dependent) and not the actual offset of the term as retrieved from the plain text stream. If you have complex boolean queries and you need to know which individual part of the query matched, that's not really trivial. You didn't mention anything about score or relevancy in your email, so I'm guessing all you care about is boolean did-it-match-or-not logic ... in that case using Filters directly (without ever searching) is your friend. You can build a Filter for each individual clause, intersect/union the bitsets to get the final set of matching documents for your whole query, and inspect the individual bitsets to know the specifics about which clauses match which documents. Score/relevance is not important. I need the yes/no logic with the what-caused-the-match info. Could you maybe explain intersecting/unioning the bitsets and interrogating them to know what matched? Some people don't like Filters because of how much space they take up for really large indexes, but if you've only got 100 docs ... there's no reason not to use them. Nope, I will never have any really large indexes here - 100 to 200 docs at the most. -Hoss Thanks for the reply, much appreciated.
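A minimal sketch of the per-clause bitset approach Hoss describes, assuming Lucene 1.9.x (where QueryFilter.bits() returns a java.util.BitSet); the field and term names are illustrative:

    import java.util.BitSet;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.QueryFilter;
    import org.apache.lucene.search.TermQuery;

    IndexReader reader = IndexReader.open(directory); // assumes an open Directory

    // one filter per clause of the (conceptual) query: body:apple AND body:pie
    BitSet apple = new QueryFilter(new TermQuery(new Term("body", "apple"))).bits(reader);
    BitSet pie = new QueryFilter(new TermQuery(new Term("body", "pie"))).bits(reader);

    // intersect for AND (use or() for OR); clone first, since and() is destructive
    BitSet both = (BitSet) apple.clone();
    both.and(pie);

    // every set bit is a matching document; the per-clause bitsets tell you why
    for (int doc = both.nextSetBit(0); doc >= 0; doc = both.nextSetBit(doc + 1)) {
        System.out.println("doc " + doc + " apple=" + apple.get(doc) + " pie=" + pie.get(doc));
    }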
Searching API: QueryParser vs Programatic queries
Hi, I'm very new to Lucene - so sorry if my question seems pretty dumb. In the application I'm writing, I've been struggling with myself over whether I should be building up queries programmatically, or using the QueryParser. My searchable fields are driven by meta-data, and I only want to support a few query types. It seems cleaner to build the queries up programmatically rather than converting the query to a string and throwing it through the QueryParser. However, then we hit the problem that the QueryParser takes care of analysing the search strings - so to do this we'd have to write some utility stuff to perform the analysis as we're building up the queries / terms. And then I think: might as well just use the QueryParser! So here's what I'm wondering (which probably sounds very dumb to experienced Lucene'rs): - Is there maybe some room for more utility classes in Lucene which make this easier? E.g. when building up a document, we don't have to worry about running content through an analyser - but unless we use QueryParser, there doesn't seem to be corresponding behaviour on the search side. - So, I'm thinking of some kind of factory / builder, where you can register an Analyser (possibly a per-field wrapper), which is then applied per field as the query is being built up programmatically. Maybe this is just an extract refactoring to take this behaviour out of QueryParser (which could then delegate to it). The result could be that more users opt for programmatic build-up of queries (because it's become easier to do) rather than falling back on QueryParser in cases where it may not be the best choice. Sorry if I rambled too much :o) Dave
Re: indexing in lucene 1.9.1
Hi Mike, Yes, you are right: when we run optimize(), it creates one large segment file and makes searching faster. But our index keeps growing every minute as we download documents and add them to the index, so we cannot call optimize() that often. The indexing seemed to be fine until we migrated to Lucene 1.9.1. I just compared the IndexWriter classes in the 1.4.3 and 1.9.1 versions and found that there are some changes with respect to creating new segments. Any idea if that has impacted indexing? Has anyone else faced a similar issue with the new version of Lucene? -Harini Mike Richmond wrote: Hello Harini, When you are finished indexing the documents, are you running the optimize() method on the IndexWriter before closing it? This should reduce the number of segments and make searching faster. Just a thought. --Mike
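For reference, the merge-related knobs on the 1.9.x IndexWriter look roughly like this (a minimal sketch; the path and values are illustrative, and the setters replaced the old public mergeFactor/minMergeDocs fields - check your version's javadocs):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
    writer.setMergeFactor(10);      // how many segments accumulate before a merge (default 10)
    writer.setMaxBufferedDocs(100); // docs buffered in RAM before a new segment is flushed
    // ... addDocument() calls ...
    writer.optimize();              // collapses the index to a single segment; expensive
    writer.close();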
Re: Aggregating category hits
Hi Jelda, Is there any way by which I can achieve sorting of search results along with overriding the collect method of the HitCollector in this case? I have been using srch.search(query, sort); If I replace it with srch.search(query, new HitCollector() { /* impl of the collect method to collect counts */ }), I will have no way to sort my results. Any pointers? Regards, kapilChhabra Kapil Chhabra wrote: Thanks a lot Jelda. I'll try this and get back with the performance comparison chart. Regards, kapilChhabra Ramana Jelda wrote: Hi Kapil, As I remember, FieldCache has been in the Lucene API since 1.4. Anyhow, here is pseudo-code that can help. //1. On opening the reader, initialize the documentId-to-categoryId relation as below. Depending on your requirement you can either getStrings() or getStringIndex(); I get a StringIndex in my project. String[] docId2CategoryIdRelation = FieldCache.DEFAULT.getStrings(reader, categoryFieldName); //2. Cache it. //3. Search as usual with your Query, providing your own HitCollector. //4. Use docId2CategoryIdRelation to retrieve the category id for each result document: String yourCategoryId = docId2CategoryIdRelation[resultDocId]; //5. Increment the count for yourCategoryId (do lazy initialization of the categoryCounts holder). //6. You are done.. :) All the best, Jelda -----Original Message----- From: Kapil Chhabra [mailto:[EMAIL PROTECTED]] Sent: Tuesday, May 16, 2006 11:50 AM To: java-user@lucene.apache.org Subject: Re: Aggregating category hits Hi Jelda, I have not yet migrated to Lucene 1.9 and I guess FieldCache has been introduced in this release. Can you please give me a pointer to your strategy of FieldCache? Thanks Regards, Kapil Chhabra Ramana Jelda wrote: But this BitSet strategy consumes more memory, mainly if you have documents in the millions and categories in the thousands. So I preferred the FieldCache strategy in my project. Jelda -----Original Message----- From: Kapil Chhabra [mailto:[EMAIL PROTECTED]] Sent: Tuesday, May 16, 2006 7:38 AM To: java-user@lucene.apache.org Subject: Re: Aggregating category hits Even I am doing the same in my application. Once a day, all the filters [for different categories] are initialized. Each time a query is fired, the query BitSet is ANDed with the BitSet of each filter. The cardinality obtained is the desired output. @Erik: I would like to know more about the implementation with DocSet in place of BitSet. Regards, kapilChhabra Erik Hatcher wrote: On May 15, 2006, at 5:07 PM, Marvin Humphrey wrote: If you needed to know not just the total number of hits, but the number of hits in each category, how would you handle that? For instance, a search for egg would have to produce the 20 most relevant documents for egg, but also a list like this: Holiday Seasonal / Easter 75, Books / Cooking 52, Miscellaneous 44, Kitchen Collectibles 43, Hobbies / Crafts 17 [...] It seems to me that you'd have to retrieve each hit's stored fields and examine the contents of a category field. That's a lot of overhead. Is there another way? My first implementation of faceted browsing uses BitSets that get pre-loaded for each category value (each unique term in a category field, for example). To intersect that with an actual Query, the query gets run through QueryFilter to get its BitSet, which is then AND'd with each of the category BitSets. Sounds like a lot, but for my applications there are not tons of these BitSets and the performance has been outstanding. Now that I'm doing more with Solr, I'm beginning to leverage its amazing caching infrastructure and replacing BitSets with DocSets.
Erik
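A minimal sketch of Jelda's FieldCache counting recipe above, assuming Lucene 1.9.x; the field name "category" is illustrative:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.HitCollector;
    import org.apache.lucene.search.IndexSearcher;

    // assumes an open Directory 'directory' and a Query 'query' in scope
    IndexReader reader = IndexReader.open(directory);
    IndexSearcher searcher = new IndexSearcher(reader);

    // steps 1-2: load and cache the docId -> categoryId relation once per reader
    final String[] docToCategory = FieldCache.DEFAULT.getStrings(reader, "category");
    final Map categoryCounts = new HashMap(); // categoryId -> Integer

    // steps 3-5: collect counts while searching
    searcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
            String cat = docToCategory[doc];
            Integer n = (Integer) categoryCounts.get(cat);
            categoryCounts.put(cat, new Integer(n == null ? 1 : n.intValue() + 1));
        }
    });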
RE: Aggregating category hits
I think if you dig a little into what Lucene does when asked to sort, you will get the information you are looking for. Here is some help: Lucene uses TopFieldDocCollector for sorting (look at the implementation of IndexSearcher). So your HitCollector can extend TopFieldDocCollector, do whatever work you want to do, and also let TopFieldDocCollector do its work (the sorting). I think I don't need to explain more. Then you are done. Have fun, Jelda -----Original Message----- From: Kapil Chhabra [mailto:[EMAIL PROTECTED]] Sent: Monday, May 22, 2006 2:07 AM To: java-user@lucene.apache.org Subject: Re: Aggregating category hits Hi Jelda, Is there any way by which I can achieve sorting of search results along with overriding the collect method of the HitCollector in this case? I have been using srch.search(query, sort); If I replace it with srch.search(query, new HitCollector() { ... }), I will have no way to sort my results. Any pointers? Regards, kapilChhabra
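A sketch of that suggestion, assuming the Lucene 1.9.x TopFieldDocCollector constructor takes (reader, sort, numHits) - check the javadocs for your version; the docToCategory array comes from FieldCache as in Jelda's earlier mail:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.TopFieldDocCollector;

    // counts categories while still producing sorted results
    public class CountingCollector extends TopFieldDocCollector {
        private final String[] docToCategory;
        public final Map categoryCounts = new HashMap(); // categoryId -> Integer

        public CountingCollector(IndexReader reader, Sort sort, int numHits,
                                 String[] docToCategory) throws IOException {
            super(reader, sort, numHits);
            this.docToCategory = docToCategory;
        }

        public void collect(int doc, float score) {
            super.collect(doc, score); // let the superclass handle the sorting
            String cat = docToCategory[doc];
            Integer n = (Integer) categoryCounts.get(cat);
            categoryCounts.put(cat, new Integer(n == null ? 1 : n.intValue() + 1));
        }
    }

Usage: searcher.search(query, collector); then collector.topDocs() for the sorted hits and collector.categoryCounts for the per-category totals.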
Re: OutOfMemory and IOException Access Denied errors
Your out of memory error is likely due to a MySQL bug outlined here: http://bugs.mysql.com/bug.php?id=7698 Thanks for the article. My query executed in no time without any errors!!! The MySQL drivers are horrible at dealing with large result sets - that article gives you the workaround to tell the driver to bring the results back as they are needed (like it should in the first place), but I have found that it isn't reliable - it tends to drop out at random points during the query, so you will get a different number of rows each time you rerun it. In MySQL, the only reliable way I have found to get all of the results from a large table is to use the LIMIT keyword in the query, ask for only X rows at a time (I usually use 10,000, but use whatever works best with your system), and keep rerunning the query, incrementing the start position of the LIMIT clause. This issue also varies a lot from version to version of the driver - some versions have been completely broken, and others are only slightly broken. Too bad we can't get Lucene-quality code everywhere :) Exception in thread main java.io.IOException: Access is denied To me, that really seems like an issue with the location you are writing the index to. I would make sure you have full write permissions on the location, and make sure there aren't some old / invalid files sitting in there. Dan -- Daniel Armbrust Biomedical Informatics Mayo Clinic Rochester daniel.armbrust(at)mayo.edu http://informatics.mayo.edu/
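A minimal sketch of the LIMIT-paging approach Dan describes; the table and column names are hypothetical, and the page size of 10,000 follows his suggestion (an ORDER BY on a unique column makes the paging deterministic):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
    PreparedStatement ps = conn.prepareStatement(
            "SELECT id, body FROM documents ORDER BY id LIMIT ?, ?"); // hypothetical table/columns
    int pageSize = 10000;
    for (int start = 0; ; start += pageSize) {
        ps.setInt(1, start);
        ps.setInt(2, pageSize);
        ResultSet rs = ps.executeQuery();
        int rows = 0;
        while (rs.next()) {
            rows++;
            // ... build and add a Lucene Document from rs here ...
        }
        rs.close();
        if (rows < pageSize) break; // last page reached
    }
    ps.close();
    conn.close();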
Re: What is more efficient?
The usual answer: it depends :) Over on http://www.simpy.com I have similar functionality (groups), and I have them as separate indices. If you want to be able to reindex individual groups separately, you'll want them in separate indices. If the groups in aggregate will get very large, keeping them separate is perhaps more scalable. If you want to distribute groups over multiple servers, keep them separate. If they are heterogeneous (different fields), this may be another reason to keep them separate. Etc. Of course, if some of the above don't hold or are not a requirement, a single index may be the way to go for you. Otis - Original Message From: Dan Wiggin [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Monday, May 22, 2006 6:03:25 AM Subject: What is more efficient? If I work with groups, what's the best option? Use a separate Lucene index for every group, or is a single index better? For example: I'm working with groups of people, and adds and deletes happen at the group level, but searches run across all groups. What do you think is the best implementation in Lucene? Is there any limit on the number of indexes in a MultiSearcher?
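A minimal sketch of searching across per-group indices with MultiSearcher, assuming Lucene 1.9.x; the index paths are hypothetical:

    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Searchable;

    // one searcher per group index
    Searchable[] groups = new Searchable[] {
        new IndexSearcher("/indexes/group-a"),
        new IndexSearcher("/indexes/group-b"),
    };
    MultiSearcher searcher = new MultiSearcher(groups);
    // search as usual; results are merged across all group indices
    Hits hits = searcher.search(query);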
Re: should I avoid create many Fields for a Document?
Uh, another it depends answer. Some people prefer one aggregate field, others do not. If you care about field-length normalization (shorter fields with matches in them scoring higher than longer fields with an equal number of matches), I'd say keep them separate. If you want to boost individual fields differently at search time, keep them separate. Over at http://www.simpy.com/ I tend to keep fields separate. Some of the fields that indices at Simpy have are: title, tags, url, etc. When a user performs a search I can use MultiFieldQueryParser, and soon I'll be able to boost these fields differently (e.g. crowd-supplied tags may get a boost over web page author-supplied titles). Also, I probably don't care about the URL length, so I don't need normalization there. That saves some RAM and doesn't hurt scoring. Otis
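A minimal sketch of per-field boosting at search time with the fields kept separate, assuming Lucene 1.9.x; the field names and boost value are illustrative:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    BooleanQuery q = new BooleanQuery();
    TermQuery tags = new TermQuery(new Term("tags", "lucene"));
    tags.setBoost(2.0f); // crowd-supplied tags count more than titles
    TermQuery title = new TermQuery(new Term("title", "lucene"));
    q.add(tags, BooleanClause.Occur.SHOULD);
    q.add(title, BooleanClause.Occur.SHOULD);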
Performance ...
Hi, The search results of my Lucene application are always sorted alphabetically, so score and relevance are not needed. With that said, is there anything that I can disable to: (a) improve the search performance, (b) reduce the size of the index, (c) shorten the indexing time? Thank you.
Re: Searching API: QueryParser vs Programatic queries
Dave, You said you are new to Lucene and you didn't mention this class explicitly, so you may not be aware of it yet: PerFieldAnalyzerWrapper. It sounds like this may be what you are after. Otis
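For reference, PerFieldAnalyzerWrapper usage looks roughly like this (a sketch; the field names and analyzer choices are illustrative):

    import org.apache.lucene.analysis.KeywordAnalyzer;
    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    analyzer.addAnalyzer("partnum", new KeywordAnalyzer()); // don't tokenize part numbers
    analyzer.addAnalyzer("title", new SimpleAnalyzer());
    // pass the same wrapper to IndexWriter at index time and QueryParser at search time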
RE: Searching API: QueryParser vs Programatic queries
Hi Otis, Thanks for your reply. Yeah, I'm aware of PerFieldAnalyzerWrapper - and I think it could help in the solution - but not on its own. Here's what I mean: when we build a document Field, we supply either a String or a Reader. The framework takes care of running the contents through an Analyzer (per-field or otherwise) when we add the document to an index. However, on the searching side of things, we don't have similar functionality unless we use the QueryParser. If we build queries programmatically, then we have to make sure (by hand) that we run search terms through the appropriate analyzer whilst constructing the query. It's in this area that I wonder whether additional utility classes could make programmatic construction of queries somewhat easier. Dave
Re: Searching API: QueryParser vs Programatic queries
If I understand correctly, is it that you don't want to make use of the query parser? You need to parse a query string without using QueryParser, construct the query yourself, and still want an analyzer applied to the resulting search? On 5/22/06, Irving, Dave [EMAIL PROTECTED] wrote: Hi Otis, Thanks for your reply. Yeah, I'm aware of PerFieldAnalyzerWrapper - and I think it could help in the solution - but not on its own.
RE: Searching API: QueryParser vs Programatic queries
You need to parse a query string without using query parser and construct the query and still want an analyzer applied on the outcome search? Not quite. The user is presented with a list of (UI) fields, and each field already knows whether it's an OR, AND, etc. So there is no query string as such. For this reason, it seems to make more sense to build the query up programmatically - my field meta-data can drive this. However, if I do that, I have to do the work of extracting terms by running them through an analyzer for each field manually. This is also done by the query parser. So, right now, if I'm being lazy, the easiest thing to do is construct a query string based on the meta-data, and then run that through the query parser. This just doesn't -- feel right -- from a design perspective though :o) The logic I could see being extracted out would be some of the stuff in QueryParser#getFieldQuery(String field, String queryText).
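A minimal sketch of the analysis step that QueryParser#getFieldQuery performs, done by hand, assuming the Lucene 1.9.x TokenStream API; it produces a TermQuery for a single token and a PhraseQuery for several (unlike QueryParser, it ignores position increments):

    import java.io.IOException;
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PhraseQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public static Query fieldQuery(String field, String text, Analyzer analyzer)
            throws IOException {
        // run the raw text through the analyzer, collecting the tokens
        TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
        List tokens = new ArrayList();
        for (Token t = ts.next(); t != null; t = ts.next()) {
            tokens.add(t.termText());
        }
        ts.close();
        if (tokens.isEmpty()) return null;
        if (tokens.size() == 1) {
            return new TermQuery(new Term(field, (String) tokens.get(0)));
        }
        PhraseQuery phrase = new PhraseQuery();
        for (int i = 0; i < tokens.size(); i++) {
            phrase.add(new Term(field, (String) tokens.get(i)));
        }
        return phrase;
    }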
Re: Searching API: QueryParser vs Programatic queries
On May 22, 2006, at 8:44 AM, Irving, Dave wrote: So, right now, if I'm being lazy, the easiest thing to do is construct a query string based on the meta-data, and then run that through the query parser. This just doesn't -- feel right -- from a design perspective though :o) How about building a larger BooleanQuery by combining the output of the QueryParser with custom-built Query objects based on your metadata? Marvin Humphrey Rectangular Research http://www.rectangular.com/
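A minimal sketch of that combination, assuming Lucene 1.9.x; the field names are illustrative, and parse() throws ParseException:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    // user-typed text goes through the parser...
    Query text = new QueryParser("body", new StandardAnalyzer()).parse(userInput);
    // ...while metadata constraints are built directly
    Query status = new TermQuery(new Term("status", "published"));

    BooleanQuery combined = new BooleanQuery();
    combined.add(text, BooleanClause.Occur.MUST);
    combined.add(status, BooleanClause.Occur.MUST);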
Re: Searching API: QueryParser vs Programatic queries
At 10:15 AM +0100 5/22/06, Irving, Dave wrote: - Is there maybe some room for more utility classes in Lucene which make this easier? E.g. when building up a document, we don't have to worry about running content through an analyser - but unless we use QueryParser, there doesn't seem to be corresponding behaviour on the search side. - So, I'm thinking of some kind of factory / builder, where you can register an Analyser (possibly a per-field wrapper), which is then applied per field as the query is being built up programmatically. Maybe this is just an extract refactoring to take this behaviour out of QueryParser (which could delegate to it). The result could be that more users opt for programmatic build-up of queries (because it's become easier to do) rather than falling back on QueryParser in cases where it may not be the best choice. I concur with your thoughts that there is room for such utility classes, and that they would increase the use of programmatic queries. I say this as a developer who also lazed out and opted to simply construct a string and let the QP do all the work (but who then had to subclass and finally copy-and-modify QP to make it conform to requirements). The underlying issue may be that there are two quite different concerns bundled into QueryParser: - Parsing a string into a set of discrete query requests - Constructing Query objects to meet those requests If you take a look at http://issues.apache.org/jira/browse/LUCENE-344 you'll see that someone else (Matthew Denner) also had this belief, and went so far as to implement a QueryFactory interface and a couple of implementing classes. One has the construction logic now found in QueryParser. Then there is a decorator class which adds the functionality of MultiFieldQueryParser and another which lower-cases terms. Perhaps something along those lines should be considered for the next break in API continuity, e.g. Lucene 2.0. It seems much cleaner than subclassing QP when all that is needed is a variant in Query construction logic, and it also provides a higher-level interface for constructing Query objects (especially TermQuery) like you were proposing. Unfortunately the actual LUCENE-344 patch appears out of date with changes in QueryParser, MultiFieldQueryParser, etc. But perhaps just the QueryFactory part would be a good starting point for what you want to do. Anyway, just a thought. - J.J.
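To illustrate the idea of separating parsing from construction - this is a hypothetical sketch of such a factory, not the actual LUCENE-344 interface:

    import org.apache.lucene.search.Query;

    // one concern: turning discrete query requests into Query objects,
    // independent of any string parsing; implementations would analyze
    // the text internally using a registered (possibly per-field) Analyzer
    public interface QueryFactory {
        Query termQuery(String field, String text);
        Query phraseQuery(String field, String text, int slop);
        Query rangeQuery(String field, String lower, String upper, boolean inclusive);
    }
    // a decorator could add multi-field expansion or lower-casing, and
    // QueryParser could delegate its construction logic to an implementation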
Searching missing documents after doing an addIndexes
I am using 1.9.1 (Java). I am trying to add documents to an existing index that may or may not exist. I use a RAMDirectory to build a temp index that is later merged. Before adding a new document, I search the existing index (using a unique key) to see if it is there. If not, I add it. In reading the documentation, I understood that I can search while an index is being updated. It was not clear whether that search would include recently added items; I had assumed it would. However, it appears to not find them unless I close and re-open the searcher. The net result is that I get duplicate documents, as the search does not find a document I recently added. Note that the duplicate CANNOT come from the RAMDirectory (i.e., getNewDocuments), as it is guaranteed to have no duplicates in it. The search is failing to find documents that have been recently added via addIndexes. Can anyone clarify this behavior, i.e., why does search not find recently added documents unless I close and re-open it? I have code that does roughly the following:

    RAMDirectory added = new RAMDirectory(...);
    IndexWriter writer = new IndexWriter(the main index);
    IndexWriter ramWriter = new IndexWriter(added, ...);
    IndexSearcher searcher = new IndexSearcher(...);
    for (Document d : getNewDocuments()) {
        // ... build a query ...
        if (searcher.search(...) == 0) {
            // doesn't exist, so we can add it
        }
        if (timeToMerge) {
            writer.addIndexes(new Directory[] {added});
            added.close();
            added = new RAMDirectory();
            ramWriter.close();
            ramWriter = new IndexWriter(added, new StandardAnalyzer(), true);
            // for some reason the searcher won't see the new indexes
            // unless the following two lines are here
            searcher.close();
            searcher = new IndexSearcher(current.getDirectory());
        }
    }

Jim Wilson Colorado Springs, CO 719-266-4431 (Home) 719-661-6768 (Cell) [EMAIL PROTECTED] IM: jwilsonsprings Registered Linux User # 302849
Re: does anybody have the experience to do some pooling upon lucene?
On May 21, 2006, at 10:56 PM, Zhenjian YU wrote: I didn't dig into the source code of Lucene deep enough, but I noticed that the IndexSearcher uses an IndexReader, while the cost of initializing an IndexReader is a bit high. The key is the IndexReader. My application is a webapp, so I think it may be good if I cache some instances of IndexSearcher to provide service for my webapp. I haven't done any performance testing yet. Maybe I'll test it later to see the difference between caching and not caching. It is best to keep only a single IndexSearcher/IndexReader combination around. There is no need to have more than one instance, and in fact it is a waste of resources to do so. Erik
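A minimal sketch of sharing one searcher across a webapp, with a hypothetical holder class; real code would also coordinate in-flight searches before closing the old searcher:

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;

    public class SearcherHolder {
        private static IndexSearcher searcher;

        // one shared instance for all requests
        public static synchronized IndexSearcher get(String indexPath) throws IOException {
            if (searcher == null) {
                searcher = new IndexSearcher(indexPath);
            }
            return searcher;
        }

        // call after the index has been updated to pick up changes
        public static synchronized void reopen(String indexPath) throws IOException {
            if (searcher != null) searcher.close();
            searcher = new IndexSearcher(indexPath);
        }
    }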
Re: Searching missing documents after doing an addIndexes
: Can anyone clarify this behavior, i.e., why does search not find : recently added documents unless I close and re-open it? This is by design ... an IndexReader (and hence an IndexSearcher) maintains a consistent view of the index at the moment it was opened, by hanging on to the open file handles and segment information. No changes made to the index after it has been opened ever show up in that instance (but they will show up in other instances you open after those changes). The two main reasons for this behavior that I know of are: 1) it gives you a consistent view for as long as you want it -- you can choose when you get to see updates; 2) it allows the IndexReader to maintain caches of information (like the FieldCache and CachingWrapperFilter, for example ... I'm sure there are other pieces of information that get cached, but I can't think of specifics off the top of my head). -Hoss
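A small illustration of the point-in-time behavior (a sketch, assuming an index already exists at indexPath):

    import org.apache.lucene.search.IndexSearcher;

    IndexSearcher before = new IndexSearcher(indexPath); // snapshot of the index right now
    // ... a writer adds documents and closes, or addIndexes() runs ...
    // 'before' still searches the old snapshot and will NOT see the new documents
    IndexSearcher after = new IndexSearcher(indexPath);  // new snapshot, sees the additions
    before.close();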
incremental updates
I'm pretty new to Lucene and was wondering if there are any resources on how to do incremental updates in Lucene. Thanks! Van Nguyen Wynne Systems, Inc. 19800 MacArthur Blvd., Suite 900 Irvine, CA 92612-2421 949.224.6300 ext 223 949.225.6540 (fax) 866.901.9284 (toll-free) www.wynnesystems.com
Re: Searching API: QueryParser vs Programatic queries
There's a long screed that I'm leaving at the bottom because I put effort into it and I like to rant. But here's, perhaps, an approach. Maybe I'm mis-interpreting what you're trying to do. I'm assuming that you have several search fields (I'm not exactly sure what driven by meta-data means in this case, but what the heck). It seems to me that you can always do something like:

    BooleanQuery bq = new BooleanQuery();
    QueryParser qp1 = new QueryParser("field1", analyzer);
    Query q1 = qp1.parse("search term or clause");
    bq.add(q1, BooleanClause.Occur.MUST); // or SHOULD / MUST_NOT as appropriate
    QueryParser qp2 = new QueryParser("field2", analyzer);
    Query q2 = qp2.parse("search term or clause");
    bq.add(q2, BooleanClause.Occur.MUST);
    . . .

and eventually submit the query you've built up in bq. You can build these up arbitrarily. In other words, your q1, q2, q3, etc. can be the same field for the first N clauses, and another field for the second M clauses. Or you could build up the query fragment to consist of all the terms for a particular field. As I said, I have no clue whether this is possible in your application. If not, see below <g>. Screed starts here*** I've had similar arguments with myself. But I'm getting less forgiving with myself when I reinvent wheels, and firmly slap my own wrists. Pretend you are talking to your boss/technical lead/coworker. I'm assuming you actually want to get a product out the door. Your manager asks: How can you justify spending the time to create, debug and maintain code that has already been written for you, for the sake of cleanliness, at the expense of the other things you could be contributing instead? There are some very good answers to this, but most of the ones I've tried to use involve a lot of hand-waving on the order of If we ever extend the application... or It would be cleaner. At which point the conversation *should* go something like this: Manager: Let me get this straight. You can spend 10 minutes right now implementing the pass-to-the-query-parser solution, or an unknown amount of time (but probably way more than your initial estimate) implementing/debugging/testing a 'cleaner' solution. Is that right? You: Yes, but. Manager: Furthermore, the functionality you want to add is *already* built into the 'use-the-parser' solution, right? You: Yes, but. Manager: And the amount of time you'll spend debugging this, not to mention the amount of *other* people's time you'll spend identifying any bugs and figuring out that they're in this new code, will only increase the longer any bugs go undetected, right? You: Yes, but... Manager: Do it the use-the-parser way. We can always implement it the other way if we have time. It doesn't cost us *any* time to implement the 'use the query parser' way, whereas your way has a measurable cost now, an unknown cost in the future and no measurable gain. Add a big comment if you want about how I forced you to do this ugly thing. Of course there are good reasons to take the time now *if* it will save time/effort in the future. But this sure doesn't seem like one of those situations to me. Not to mention that it'll be MUCH simpler for the next person looking at it to understand. Here are several things off the top of my head that'll become maintenance issues for a custom solution, that are *all* taken care of by the use-the-parser solution: 1. How are you going to handle stop words? 2. Will you ever want to change analyzers to, say, keep URLs together? Or maybe break them up? 3. What happens if you want to use the RegularExpressionAnalyzer to, say, remove all punctuation or other user-entered junk?
4. Will you remember all the ins and outs of this code in even 1 month? What about the next poor joker who has to figure it out? None of this is to say that your suggestion that there be utility classes that allow this sort of thing doesn't have merit. But I have to wonder whether it would be effort well spent for you at this time, in this project <g>. As you can see, this is one of my hot-button issues <g>. If you want to really see me go off the deep end, just *mention* premature optimizations... Best Erick
Checking for duplicates inside index
Hi All, I'm indexing ~1 documents per day, but since I'm getting a lot of real duplicates (100% the same document content) I want to check the content before indexing... My idea is to create a checksum of the document's content and store it within the document inside the index; before indexing a new document I will compare the new document's checksum with the ones in the index. Is that a good idea? Does someone have experience with that method? Any tools available? Thank you and kind regards, Hannes
RE: Searching API: QueryParser vs Programatic queries
: Not quite. The user is presented with a list of (UI) fields, and each : field already knows whether it's an OR, AND, etc. : So, there is no query string as such. : For this reason, it seems to make more sense to build the query up : programmatically - as my field meta-data can drive this. : However, if I do that, I have to do the work of extracting terms by : running them through an analyser for each field manually. : This is also done by the query parser. Typically, when building queries up from form data, each piece of data falls into one of two categories: 1) data which doesn't need to be analyzed, because the field it's going to query on wasn't tokenized (i.e., a date field, or a numeric field, or a boolean field); 2) data which is typed by the user in a text box, and not only needs to be analyzed, but may also need some parsing (i.e., to support quoted phrases or +mandatory and -prohibited terms). In the first case, build your query clauses programmatically. In the second case, make a QueryParser on the fly with the default field set to whatever makes sense and let it handle parsing the user's text (applying the correct analyzer using PerFieldAnalyzer). If there are special characters you want it to ignore, then escape them first. I discussed this a little more recently... http://www.nabble.com/RE%3A+Building+queries-t1635907.html#a4436416 -Hoss
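A minimal sketch of the two categories Hoss describes, assuming Lucene 1.9.x; the field names and date format are illustrative, and parse() throws ParseException:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.RangeQuery;

    BooleanQuery q = new BooleanQuery();

    // category 1: untokenized fields -> programmatic clauses, no analysis
    q.add(new RangeQuery(new Term("date", "20060101"),
                         new Term("date", "20061231"), true),
          BooleanClause.Occur.MUST);

    // category 2: free text from the user -> a QueryParser built on the fly
    QueryParser qp = new QueryParser("body", new StandardAnalyzer());
    q.add(qp.parse(userText), BooleanClause.Occur.MUST);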
Re: How are results merged from a multisearcher?
Tom Emerson wrote: Thanks for the clarification. What then is the difference between a MultiSearcher and using an IndexSearcher on a MultiReader? The results should be identical. A MultiSearcher permits use of ParallelMultiSearcher and RemoteSearchable, for parallel and/or distributed operation. But, for single-threaded searching, a MultiReader is probably fastest. Doug
Re: Changing the scoring (newest doc date first)
Marcus Falck wrote: There is however one LARGE problem that we have run into. All search results should be displayed sorted with the newest document at top. We tried to accomplish this using Lucene's sort capabilities but quickly ran into large performance bottlenecks. So I figured, since the default sort is by relevance, I would like to change the relevance so that we don't even need to sort the documents. I guess a lot of people on this mailing list can give me valuable hints about how to accomplish this! (I know about the ability to sort by index id, which I haven't tried.) I can also add that I can't guarantee that all documents will be added in correct date order (remember the several systems; the future plan is to buy content from different actors on the market and index it). A HitCollector should help. Matching documents are passed to a HitCollector in the order they were added to the index. So if newer documents were added to your index later, then the newest N documents are simply the last N documents passed to the HitCollector. Could that work? Cheers, Doug
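A minimal sketch of Doug's suggestion - keep only the last N doc ids passed to the collector, using a ring buffer, assuming documents were added in rough date order:

    import org.apache.lucene.search.HitCollector;

    // keeps the last N doc ids seen; since matches arrive in index order,
    // these are the most recently added matching documents
    public class NewestDocsCollector extends HitCollector {
        private final int[] ring;
        private int count = 0;

        public NewestDocsCollector(int n) { ring = new int[n]; }

        public void collect(int doc, float score) {
            ring[count++ % ring.length] = doc;
        }

        // newest first
        public int[] newestDocs() {
            int n = Math.min(count, ring.length);
            int[] out = new int[n];
            for (int i = 0; i < n; i++) {
                out[i] = ring[(count - 1 - i) % ring.length];
            }
            return out;
        }
    }

Usage: searcher.search(query, collector); then map the returned ids to documents with searcher.doc(id).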
RE: Checking for duplicates inside index
You have two choices that I can think of: 1. Before adding a document, check whether it already exists in the index. You can do this by querying on a unique field, if you have one. 2. Index all your documents, and once the indexing is done, dedupe (Lucene has built-in methods that can help with this). If your index doesn't have a unique key, you need to add one, like the checksum you suggested. -----Original Message----- From: karl wettin [mailto:[EMAIL PROTECTED]] Subject: Re: Checking for duplicates inside index That could work. You will need a big sum though. MD5?
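A minimal sketch of option 1 with an MD5 checksum as the unique field, assuming Lucene 1.9.x; the field name "md5" is hypothetical, and getInstance()/getBytes() throw checked exceptions not shown here:

    import java.security.MessageDigest;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;

    // hex-encoded MD5 of the document content
    byte[] digest = MessageDigest.getInstance("MD5").digest(content.getBytes("UTF-8"));
    StringBuffer hex = new StringBuffer();
    for (int i = 0; i < digest.length; i++) {
        String h = Integer.toHexString(digest[i] & 0xff);
        if (h.length() == 1) hex.append('0');
        hex.append(h);
    }
    String md5 = hex.toString();

    // skip the document if an identical one is already indexed
    IndexReader reader = IndexReader.open(directory);
    boolean duplicate = reader.docFreq(new Term("md5", md5)) > 0;
    reader.close();
    if (!duplicate) {
        Document doc = new Document();
        doc.add(new Field("md5", md5, Field.Store.YES, Field.Index.UN_TOKENIZED));
        // ... add the content fields and writer.addDocument(doc) ...
    }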
RE: Checking for duplicates inside index
I have created a method that can delete duplicate docs. Basically, during indexing, a doc is associated with an id (a term field defined by you) that is indexed. Then, call the method to delete duplicates whenever you update the index. I haven't contributed it back to the Lucene community yet because our code is in beta testing now. My former colleague, Chris, received agreement from Doug Cutting last August that this feature would be nice to have. Eugene
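A sketch of what such a dedupe sweep can look like (this is not Eugene's code), assuming Lucene 1.9.x and a hypothetical untokenized "id" field; for each id with more than one document it keeps the first and deletes the rest:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.index.TermEnum;

    IndexReader reader = IndexReader.open(directory);
    TermEnum terms = reader.terms(new Term("id", "")); // positioned at the first "id" term
    try {
        do {
            Term t = terms.term();
            if (t == null || !"id".equals(t.field())) break; // past the "id" field
            if (terms.docFreq() > 1) {
                TermDocs docs = reader.termDocs(t);
                boolean first = true;
                while (docs.next()) {
                    if (first) { first = false; continue; } // keep one copy
                    reader.deleteDocument(docs.doc());
                }
                docs.close();
            }
        } while (terms.next());
    } finally {
        terms.close();
        reader.close(); // commits the deletions
    }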
Re: does anybody have the experience to do some pooling upon lucene?
OK, got it. Thanks. On 5/23/06, Erik Hatcher [EMAIL PROTECTED] wrote: It is best to keep only a single IndexSearcher/IndexReader combination around. There is no need to have more than one instance, and in fact it is a waste of resources to do so. Erik
Re: Checking for duplicates inside index
On Mon, 2006-05-22 at 23:42 +0200, Hannes Carl Meyer wrote: My idea is to create a checksum of the document's content and store it within the document inside the index; before indexing a new document I will compare the new document's checksum with the ones in the index. That could work. You will need a big sum though. MD5? Just as a reference, Nutch uses an MD5 digest to detect duplicate web pages. It works fine, except of course when two docs differ by only an insignificant text delta. There's some recent work in this area - check out TextProfileSignature. -- Ken Krugler Krugle, Inc. +1 530-210-6378 Find Code, Find Answers
Re: SpanScorer Out Of Bounds
Hi Otis, Thanks for that. I found out that it's a memory usage problem rather than one on Lucene's part. Thanks. Michael On 5/22/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi Michael, I don't see any responses to your problem. It's early, so you may get some, but this sounds like a case for JIRA. Also, please try to write and attach (to your JIRA case) a unit test that demonstrates the problem, something we can run and debug. Without that we may not be able to fix this. Otis - Original Message From: Michael Chan [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Sunday, May 21, 2006 7:37:35 AM Subject: SpanScorer Out Of Bounds Hi, Somehow, after running many searches using instances of SpanQuery (mostly SpanNearQuery), I get an ArrayIndexOutOfBoundsException:

    java.lang.ArrayIndexOutOfBoundsException: 2147483647
        at org.apache.lucene.search.spans.SpanScorer.score(SpanScorer.java:72)
        at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:82)
        at org.apache.lucene.search.BooleanScorer2$2.score(BooleanScorer2.java:186)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:327)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:291)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:99)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
        at org.apache.lucene.search.Hits.<init>(Hits.java:44)
        at org.apache.lucene.search.Searcher.search(Searcher.java:44)
        at org.apache.lucene.search.Searcher.search(Searcher.java:36)
        ... traces to my program

Is there a counter or something in play? Is it a cache of some sort? Any help will be much appreciated. Thanks. Michael
Making SpanQuery more efficient
Hi, As I use SpanQuery purely for its slop support, I was wondering how to make SpanQuery more efficient. Since I don't need any span information, is there a way to disable the computation of spans and other unneeded overhead? Thanks. Michael