get all tokens from index

2009-09-09 Thread m.harig
Hello all, is there any way to get all tokens from my index? Can anyone please suggest something?

Re: get all tokens from index

2009-09-09 Thread AHMET ARSLAN
hello all, is there any way to get all tokens from my index? please anyone suggest me. The code below prints all terms of a field. String path = "E:\\ThesaurusSolrHome\\data\\index"; String field = "contents"; IndexReader indexReader = IndexReader.open(path);

Re: Filtering question/advice

2009-09-09 Thread Amin Mohammed-Coleman
Hi, thanks for your response. Here is the test case: public class UnderwriterReferenceTest { private Directory directory; private Analyzer analyzer; private IndexSearcher indexSearcher; private IndexWriter indexWriter; private Document layerDocumentA; @Before

Re: get all tokens from index

2009-09-09 Thread m.harig
Thanks Ahmet, I found the solution. Thanks a lot. Ahmet Arslan wrote: hello all, is there any way to get all tokens from my index? please anyone suggest me. The code below prints all terms of a field. String path = "E:\\ThesaurusSolrHome\\data\\index"; String field =

RE: New Stream closed exception with Java 6

2009-09-09 Thread Chris Bamford
Thanks for your input Mark and Chris. I will take all into account Chris - Original Message - From: Mark Miller markrmil...@gmail.com Sent: Tue, 8/9/2009 8:06pm To: java-user@lucene.apache.org Subject: Re: New Stream closed exception with Java 6 Chris Hostetter wrote: : I'm coming to

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
I've been testing 2.9 RC2 lately and comparing query performance to 2.3.2. I'm seeing a huge increase in throughput (2x-10x) on an index that was built with 2.3.2. The queries have a lot of BoostingTermQuerys and boolean clauses containing a custom scorer. Using JProfiler, I observe that the

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Yonik Seeley
On Wed, Sep 9, 2009 at 8:57 AM, Peter Keegan peterlkee...@gmail.com wrote: Using JProfiler, I observe that the improvement is due to a huge reduction in the number of calls to TermDocs.next and TermDocs.skipTo (about 65% fewer calls). Indexes are searched per-segment now (i.e. MultiTermDocs

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Yonik Seeley
On Wed, Sep 9, 2009 at 9:17 AM, Yonik Seeley yonik.see...@lucidimagination.com wrote: On Wed, Sep 9, 2009 at 8:57 AM, Peter Keegan peterlkee...@gmail.com wrote: Using JProfiler, I observe that the improvement is due to a huge reduction in the number of calls to TermDocs.next and TermDocs.skipTo

Re: Newbie: Luke and fields

2009-09-09 Thread Erick Erickson
It's all in the analyzers. Depending upon which analyzer you use, many things happen to the input stream. Casing is one example, but that's just the simplest. Which is why it's so important to use the same analyzer when indexing and querying unless you have a *very* good reason not to. I'd really

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
IndexSearcher.search is calling my custom scorer's 'next' and 'doc' methods 64% fewer times. I see no 'advance' method in any of the hot spots. I am getting the same number of hits from the custom scorer. Has the BooleanScorer2 logic changed? Peter On Wed, Sep 9, 2009 at 9:17 AM, Yonik Seeley

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Mark Miller
How about the new in-order/out-of-order scoring stuff? It was an option before, but I think now it uses what's best by default? And pairs with the collector? I didn't follow any of that closely though. - Mark Peter Keegan wrote: IndexSearcher.search is calling my custom scorer's 'next' and 'doc'

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Michael McCandless
Right, BooleanQuery will now try to use BooleanScorer (does out of order collection, which does not use skipTo/advance at all, I think) when possible, instead of BooleanScorer2. This only applies for boolean queries that have only SHOULD clauses, and up to 32 MUST_NOT clauses (if there's even 1

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
Is it possible that skipTo is very costly with your custom scorer? It's no more expensive than 'next'. The scorer's 'skipTo' and 'next' methods call termdocs.skipTo or termdocs.next to get the next 'candidate' doc. This just checks a BitVector to find the next non-deleted doc. But the scorer

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
http://svn.apache.org/viewvc?view=rev&revision=630698 This may be it. The scorer is sparse and usually in a conjunction with a dense scorer. Does the index format matter? I haven't yet built it with 2.9. Peter On Wed, Sep 9, 2009 at 10:17 AM, Yonik Seeley yo...@lucidimagination.com wrote: On

How to calculate the DGaps value in *.del file?

2009-09-09 Thread 関 磊
Hello, I want to know how to calculate the DGaps value in a *.del file. For example, if there are 8000 bits and only bits 10, 12 and 32 are set, DGaps would be used: (VInt) 1 , (byte) 20 , (VInt) 3 , (Byte) 1 I do not understand why the DGaps values are 1 and 3. Please tell
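The numbers in the example above can be reproduced with a short sketch. It assumes that bit n lives in byte n / 8 at bit position n % 8 (lowest bit first), and that each DGap is the difference between the index of a non-zero byte and the index of the previous non-zero byte, starting from 0. The class and method names (DGapsSketch, encode) are made up for illustration and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

public class DGapsSketch {

    /**
     * Hypothetical helper: returns one {gap, byteValue} pair per non-zero byte.
     * Assumes bit n is stored in byte n / 8, at bit position n % 8.
     */
    static List<int[]> encode(int[] setBits, int totalBits) {
        byte[] bytes = new byte[(totalBits + 7) / 8];
        for (int bit : setBits) {
            bytes[bit >> 3] |= (byte) (1 << (bit & 7));
        }
        List<int[]> pairs = new ArrayList<>();
        int lastIndex = 0; // index of the previous non-zero byte, initially 0
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] != 0) {
                pairs.add(new int[] { i - lastIndex, bytes[i] & 0xFF });
                lastIndex = i;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Bits 10 and 12 fall in byte 1 (4 + 16 = 20); bit 32 falls in byte 4 (value 1).
        for (int[] pair : encode(new int[] { 10, 12, 32 }, 8000)) {
            System.out.println("(VInt) " + pair[0] + " , (byte) " + pair[1]);
        }
    }
}
```

Under those assumptions, the first non-zero byte is at index 1, giving a gap of 1 - 0 = 1, and the next is at index 4, giving a gap of 4 - 1 = 3, which matches the 1 and 3 in the example.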

Lucene 2.9 RC3 now available for testing

2009-09-09 Thread Mark Miller
Hello Lucene users, On behalf of the Lucene dev community (a growing community far larger than just the committers) I would like to announce the third release candidate for Lucene 2.9. Please download and check it out – take it for a spin and kick

support for PayloadTermQuery in MoreLikeThis

2009-09-09 Thread Bill Au
Has anyone done anything regarding the support of PayloadTermQuery in MoreLikeThis? I took a quick look at the code and it seems to be simply a matter of swapping TermQuery with PayloadTermQuery. I guess a generic solution would be to add an enable method to enable PayloadTermQuery, keeping

IndexReader.isCurrent for cached indexes

2009-09-09 Thread Nick Bailey
Looking for some help figuring out a problem with the IndexReader.isCurrent() method and cached indexes. We have a number of Lucene indexes that we attempt to keep in memory after an initial query is performed. In order to prevent the indexes from becoming stale, we check for changes about

NumberFormatException when creating field cache

2009-09-09 Thread Antony Bowesman
I'm using Lucene 2.3.2 and have a date field used for sorting, which is MMDDHHMM. I get an exception when the FieldCache is being generated as follows: java.lang.NumberFormatException: For input string: 190400-412317
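As a rough illustration of the failure, the sketch below only shows that the reported string cannot be parsed as an integer by the JDK. It does not explain how such a term got into the index. The class and method names (SortValueCheck, isParsableInt) are hypothetical, and "09091230" is a made-up sample value.

```java
public class SortValueCheck {

    /** Hypothetical helper: true if the term parses as a Java int. */
    static boolean isParsableInt(String term) {
        try {
            Integer.parseInt(term);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The value from the exception has a '-' in the middle, so parsing fails.
        System.out.println(isParsableInt("190400-412317"));
        // A purely numeric sample value parses without error.
        System.out.println(isParsableInt("09091230"));
    }
}
```

A check like this could be run over the terms of the sort field to find the values that would trigger the exception.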

Re: NumberFormatException when creating field cache

2009-09-09 Thread Mark Miller
Antony Bowesman wrote: I'm using Lucene 2.3.2 and have a date field used for sorting, which is MMDDHHMM. I get an exception when the FieldCache is being generated as follows: java.lang.NumberFormatException: For input string: 190400-412317