Re: Difference between SortedDocValues and SortedSetDocValues

2017-10-12 Thread Yonik Seeley
On Thu, Oct 12, 2017 at 8:53 AM, Chellasamy G wrote: > Could anyone please explain the difference between SortedDocValues and > SortedSetDocValues. SortedDocValues has at most 1 value per document (single-valued). SortedSetDocValues supports a set of values per
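A rough way to picture the distinction in the reply above is a plain-Java toy model (not the Lucene API; the class and method names here are made up for illustration). It assumes each distinct value gets an ordinal in sorted order; the single-valued view then stores at most one ordinal per document, while the multi-valued view stores a sorted, de-duplicated set of ordinals per document.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy model only, not the Lucene API. Values are mapped to ordinals in sorted order. */
final class DocValuesModel {
    /** Sorted distinct values across all documents; array index = ordinal. */
    static String[] dictionary(List<List<String>> docs) {
        TreeSet<String> all = new TreeSet<>();
        for (List<String> values : docs) {
            all.addAll(values);
        }
        return all.toArray(new String[0]);
    }

    /** Single-valued view: one ordinal per document, -1 when the document has no value. */
    static int[] singleValued(List<List<String>> docs, String[] dict) {
        int[] ords = new int[docs.size()];
        for (int i = 0; i < ords.length; i++) {
            List<String> values = docs.get(i);
            if (values.size() > 1) {
                throw new IllegalArgumentException("doc " + i + " has more than one value");
            }
            ords[i] = values.isEmpty() ? -1 : Arrays.binarySearch(dict, values.get(0));
        }
        return ords;
    }

    /** Multi-valued view: a sorted, de-duplicated array of ordinals per document. */
    static int[][] multiValued(List<List<String>> docs, String[] dict) {
        int[][] ords = new int[docs.size()][];
        for (int i = 0; i < ords.length; i++) {
            ords[i] = new TreeSet<String>(docs.get(i)).stream()
                    .mapToInt(v -> Arrays.binarySearch(dict, v))
                    .toArray();
        }
        return ords;
    }
}
```

In this sketch, a document carrying the values "b", "a", "b" ends up with two ordinals in the multi-valued view and is rejected by the single-valued one.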

Re: no concurrent merging?

2016-08-09 Thread Yonik Seeley
On Thu, Aug 4, 2016 at 9:35 AM, Michael McCandless wrote: > Lucene's merging is concurrent, but Solr unfortunately uses > UninvertingReader on each DBQ ... I'm not sure why. It looks like DeleteByQueryWrapper was added by

Re: Port of Custom value source from v4.10.3 to v6.1.0

2016-07-08 Thread Yonik Seeley
Use getSortedDocValues for a single-valued field, or getSortedSetDocValues for multi-valued. -Yonik On Fri, Jul 8, 2016 at 12:29 PM, paule_lecuyer wrote: > Many Thanks Yonik, I will try that. > > For my understanding, what is the difference between SortedSetDocValues >

Re: Port of Custom value source from v4.10.3 to v6.1.0

2016-07-08 Thread Yonik Seeley
Use the docValues interface by calling getSortedSetDocValues on the leaf reader. That will either 1) use real docValues if you have indexed them 2) use the FieldCache to uninvert an indexed field and make it look like docValues. -Yonik On Thu, Jul 7, 2016 at 1:33 PM, paule_lecuyer

Re: Lucene 5: Mutable/Immutable interface of BitSet

2015-09-13 Thread Yonik Seeley
On Sun, Sep 13, 2015 at 4:23 PM, Selva Kumar wrote: > Mutable, "Immutable" interface of BitSet seems to be defined based on > specific things like live docs and documents with DocValue etc. Any plan to > add general purpose readonly interface to BitSet? We already

Re: Lucene 5: Mutable/Immutable interface of BitSet

2015-09-13 Thread Yonik Seeley
> Similarly, BitSet > has many more write methods compared to MutableBits. So, as I said, this > seems to be based on internal requirement like live docs, documents with > DocValues etc. > > Thanks for your time, Yonik > > > On Sun, Sep 13, 2015 at 4:43 PM, Yonik Seeley

Re: Lucene nrt

2015-07-20 Thread Yonik Seeley
Yes, if you do a commit with waitSearcher=true (and it succeeds) then any adds before that point will be visible. -Yonik On Mon, Jul 20, 2015 at 8:25 PM, Bhawna Asnani bhawna.asn...@gmail.com wrote: Hi, I am using solr to update a document and read it back immediately through search. I do

Lucene/Solr Revolution 2015 Voting

2015-06-11 Thread Yonik Seeley
Hey Folks, If you're interested in going to Lucene/Solr Revolution this year in Austin, please vote for the sessions you would like to see! https://lucenerevolution.uservoice.com/ -Yonik

Re: Query with many clauses

2014-10-29 Thread Yonik Seeley
For queries with many terms, where each term matches few documents (actually a single document for ID filters in my tests), I saw speedups between 4x and 8x http://heliosearch.org/solr-terms-query/ (the 3rd chart) -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets,

Re: Square of Idf

2014-03-07 Thread Yonik Seeley
On Thu, Mar 6, 2014 at 6:28 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; Tf-Idf is explanation says that: *idf(t)* appears for *t* in both the query and the document, hence it is squared in the equation. DefaultSimilarity does not square it. What it the explanation of it? I think

Re: Natural Sort Order

2013-10-14 Thread Yonik Seeley
On Mon, Oct 14, 2013 at 9:43 PM, Darren Hoffman dar...@jnamics.com wrote: Can anyone tell me if a search based on a ConstantScoreQuery should return the results in the order that the documents were added to the index? The order will be internal docid, which used to be the order that docs were

Re: sorting with lucene 4.3

2013-07-31 Thread Yonik Seeley
On Wed, Jul 31, 2013 at 2:51 PM, Nicolas Guyot sfni...@gmail.com wrote: I have written a quick test to reproduce the slower sorting with numeric DV. In this test case, it happens only when reverse sorting. Right - I bet your numeric field is relatively ordered in the index. When this happens,

Re: About query result cache.

2012-12-16 Thread Yonik Seeley
On Mon, Dec 17, 2012 at 12:58 AM, lukai lukai1...@gmail.com wrote: Hi, guys: Does queryplugin implementation impacts caching? I have implemented a new query parser which just take the input query string and return my own query object. But the problem is, when i apply this logic to solr, it

Re: Lucene 4.0: Custom Query Parser newTermQuery(Term term) override

2012-07-11 Thread Yonik Seeley
On Wed, Jul 11, 2012 at 9:34 AM, Jamie ja...@stimulussoft.com wrote: I am busying attempting to integrate Lucene 4.0 Alpha into my code base. I have a custom QueryParser that extends QueryParser and overrides newRangeQuery and newTermQuery Random pointer: for most special case field handling,

Re: IndexReader.deleteDocument in Lucene 3.6

2012-05-25 Thread Yonik Seeley
On Fri, May 25, 2012 at 5:23 AM, Nikolay Zamosenchuk nikolaz...@gmail.com wrote: IndexWriter.deleteDocument(..) is not final, but doesn't return any result. Deleted terms are buffered for good performance, so at the time of IndexWriter.deleteDocument(Term) we don't know how many documents match

Re: org.apache.lucene.index.MultiFields.getLiveDocs(IndexReader) returning null.

2012-03-05 Thread Yonik Seeley
On Mon, Mar 5, 2012 at 1:53 PM, Benson Margulies bimargul...@gmail.com wrote: There's no javadoc on here yet, and I am a little puzzled by the fact that it is returning null for me. Does that imply that there can't be any deleted docs known to the reader? Right, see AtomicReader /** Returns

Re: Spatial Search

2011-12-31 Thread Yonik Seeley
On Sat, Dec 31, 2011 at 11:52 AM, Lance Java lance.j...@googlemail.com wrote: Hi, I am new to Lucene and I am trying to use spatial search. The old tier-based stuff in Lucene is broken and considered deprecated. For Lucene, this may currently be your best hope:

Re: ElasticSearch

2011-11-17 Thread Yonik Seeley
On Thu, Nov 17, 2011 at 2:53 PM, Simon Willnauer simon.willna...@googlemail.com wrote: dude, look at this query... its insane isn't it :) Sorry... what's the equivalent you'd like instead? Or if you're just unjustifiably bitching about Solr again, maybe I should take a stroll through Lucene land

Re: ElasticSearch

2011-11-17 Thread Yonik Seeley
On Thu, Nov 17, 2011 at 3:18 PM, Uwe Schindler u...@thetaphi.de wrote: Sorry, this query is really ununderstandable. Those complex queries should have a meaningful language, e.g. a JSON object structure There are upsides and downsides to that. A big JSON object graph would be easier to *read*

Re: ElasticSearch

2011-11-17 Thread Yonik Seeley
On Thu, Nov 17, 2011 at 3:40 PM, Mark Harwood markharw...@yahoo.co.uk wrote: JSON or XML can reflect more closely the hierarchy in the underlying Lucene query objects. We normally use the Lucene QueryParser syntax itself for that (not HTTP parameters). Other parameters such as filters,

Re: ElasticSearch

2011-11-17 Thread Yonik Seeley
On Thu, Nov 17, 2011 at 3:44 PM, Michael McCandless luc...@mikemccandless.com wrote: Maybe someone can post the equivalent query in ElasticSearch? I don't think it's possible. Hoss threw in the kitchen sink into his contrived' example. Here's a super simple example: JSON: { sort : [

Re: ElasticSearch

2011-11-16 Thread Yonik Seeley
On Wed, Nov 16, 2011 at 10:36 AM, Shashi Kant sk...@sloan.mit.edu wrote: I had posted this earlier on this list, hope this provides some answers http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/ Except it's an out of date comparison. We have NRT (near real time

Re: Please help me with a basic question...

2011-05-20 Thread Yonik Seeley
On Fri, May 20, 2011 at 2:46 PM, Doron Cohen cdor...@gmail.com wrote: I stumbled upon the 'Explain' function yesterday though it returns a crowded message using debug in SOLR admin. Is there another method or interface which returns more or cleaner info? I am not familiar with the use of

Re: Retrieving the first document in a range

2011-04-05 Thread Yonik Seeley
On Tue, Apr 5, 2011 at 10:06 AM, Shai Erera ser...@gmail.com wrote: Can we use TermEnum to skip to the first term 'after 3 weeks'? If so, we can pull the first doc that appears in the TermDocs of that Term (if it's a valid term). Yep. Try this to get the term you want to use to seek:

Re: DocIdSet to represent small numberr of hits in large Document set

2011-04-05 Thread Yonik Seeley
On Tue, Apr 5, 2011 at 2:24 AM, Antony Bowesman a...@thorntothehorn.org wrote: Seems like SortedVIntList can be used to store the info, but it has no methods to build the list in the first place, requiring an array or bitset in the constructor. It has a constructor that takes DocIdSetIterator

Re: Undo hyphenation when indexing

2011-04-01 Thread Yonik Seeley
Solr has a hyphenated word filter you could copy. http://lucene.apache.org/solr/api/org/apache/solr/analysis/HyphenatedWordsFilterFactory.html On trunk, this has been folded into the analysis module. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco

Re: which unicode version is supported with lucene

2011-02-27 Thread Yonik Seeley
On Sun, Feb 27, 2011 at 2:15 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Jepp, its back online. Just did a short test and reported my results to jira, but is the error from the xml output still a jetty problem or is it from XMLwriter? The patch has been committed, so you should

Re: which unicode version is supported with lucene

2011-02-25 Thread Yonik Seeley
On Fri, Feb 25, 2011 at 8:48 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: So Solr trunk should already handle Unicode above BMP for field type string? Strange... One issue is that jetty doesn't support UTF-8 beyond the BMP: /opt/code/lusolr/solr/example/exampledocs$ ./test_utf8.sh

Re: which unicode version is supported with lucene

2011-02-25 Thread Yonik Seeley
know how to add a char above the BMP to utf8-example.xml? -Yonik http://lucidimagination.com Regards, Bernd Am 25.02.2011 14:54, schrieb Yonik Seeley: On Fri, Feb 25, 2011 at 8:48 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: So Solr trunk should already handle Unicode above BMP

Re: Storing an ID alongside a document

2011-02-02 Thread Yonik Seeley
That's exactly what the CSF feature is for, right? (docvalues branch) -Yonik http://lucidimagination.com On Wed, Feb 2, 2011 at 1:03 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I'm curious if there's a new way (using flex or term states) to store IDs alongside a document and

Re: Storing an ID alongside a document

2011-02-02 Thread Yonik Seeley
On Wed, Feb 2, 2011 at 9:23 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Is it? I thought it would load the values into heap RAM like the field cache and in addition save the values to disk? Does it also read the values directly from disk? Loading into memory is a separate

Re: WARNING: re-index all trunk indices!

2010-12-17 Thread Yonik Seeley
On Fri, Dec 17, 2010 at 11:18 AM, Michael McCandless luc...@mikemccandless.com wrote: If you are using Lucene's trunk (nightly build) release, read on... I just committed a change (for LUCENE-2811) that changes the index format on trunk, thus breaking (w/ likely strange exceptions on reading

Re: The logic of QueryParser

2010-12-13 Thread Yonik Seeley
On Mon, Dec 13, 2010 at 2:51 PM, Robert Muir rcm...@gmail.com wrote: On Mon, Dec 13, 2010 at 2:43 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, Dec 13, 2010 at 2:10 PM, Brian Hurt bhur...@gmail.com wrote:  I was just wondering what the logic was for defaulting to or instead

Re: The logic of QueryParser

2010-12-13 Thread Yonik Seeley
On Mon, Dec 13, 2010 at 3:07 PM, Robert Muir rcm...@gmail.com wrote: On Mon, Dec 13, 2010 at 3:04 PM, Yonik Seeley yo...@lucidimagination.com wrote: I think of the Lucene QueryParser like SQL. SQL is text based and also meant for human entered text - but for either very expert users

Webcast: Better Search Results Faster with Apache Solr and LucidWorks Enterprise

2010-12-08 Thread Yonik Seeley
We're holding a free webinar about relevancy enhancements in our commercial version of Solr. Details below. -Yonik http://www.lucidimagination.com - Join us for a free technical webcast Better Search Results Faster with

Re: best practice: 1.4 billions documents

2010-11-26 Thread Yonik Seeley
On Mon, Nov 22, 2010 at 12:49 PM, Uwe Schindler u...@thetaphi.de wrote: (Fuzzy scores on MultiSearcher and Solr are totally wrong because each shard uses another rewritten query). Hmmm, really? I thought that fuzzy scoring should just rely on edit distance? Oh wait, I think I see - it's

Re: best practice: 1.4 billions documents

2010-11-22 Thread Yonik Seeley
On Mon, Nov 22, 2010 at 12:17 PM, Uwe Schindler u...@thetaphi.de wrote: The latest discussion was more about MultiReader vs. MultiSearcher. But you are right, 1.4 B documents is not easy to go, especially when your index grows and you get to the 2.1 B marker, then no MultiSearcher or whatever

Re: best practice: 1.4 billions documents

2010-11-21 Thread Yonik Seeley
On Sun, Nov 21, 2010 at 6:33 PM, Luca Rondanini luca.rondan...@gmail.com wrote: Hi everybody, I really need some good advice! I need to index in lucene something like 1.4 billions documents. I had experience in lucene but I've never worked with such a big number of documents. Also this is

Re: IndexWriter.close() performance issue

2010-11-20 Thread Yonik Seeley
On Fri, Nov 19, 2010 at 5:41 PM, Mark Kristensson mark.kristens...@smartsheet.com wrote: Here's the changes I made to org.apache.lucene.util.StringHelper:  //public static StringInterner interner = new SimpleStringInterner(1024,8); As Mike said, the real fix for trunk is to get rid of

FAST ESP - Solr migration webinar

2010-11-11 Thread Yonik Seeley
We're holding a free webinar on migration from FAST to Solr. Details below. -Yonik http://www.lucidimagination.com = Solr To The Rescue: Successful Migration From FAST ESP to Open Source Search Based on Apache Solr

Re: IndexWriter.close() performance issue

2010-11-03 Thread Yonik Seeley
It turns out that the prepareCommit() is the slow call here, taking several seconds to complete. I've done some reading about it, but have not found anything that might be helpful here. The fact that it is slow every single time, even when I'm adding exactly one document to the index, is

Re: lucene norms cached twice

2010-10-29 Thread Yonik Seeley
On Fri, Oct 29, 2010 at 3:32 PM, Cabansag, Ronald-Alvin R ronald-alvin.caban...@cengage.com wrote: We use a QueryWrapperFilter.getDocIdSet(indexReader) to get the DocIdSet and compute the hit count using its iterator. If you want to avoid double-caching of norms, then you should call

Re: Function Query, Required Clauses, and Matching

2010-10-25 Thread Yonik Seeley
On Mon, Oct 25, 2010 at 7:00 PM, Dennis Kubes ku...@apache.org wrote: A curiosity.  Some of the documentation for function queries says they match every document in the index.  When running a query that has boolean required clauses and an optional ValueSourceQuery or function query is the

Re: Checksum and transactional safety for lucene indexes

2010-09-24 Thread Yonik Seeley
On Tue, Sep 21, 2010 at 12:53 AM, Lance Norskog goks...@gmail.com wrote: If an index file is not completely written to disk, it never become available. Lucene has a file describing the current active index segments. It writes all new files to the disk, and changes the description file

Re: Filters do not work with MultiSearcher?

2010-09-10 Thread Yonik Seeley
This is working as designed. Note this method: public DocIdSet getDocIdSet(IndexReader indexReader) throws IOException { return openBitSet; } You must pay attention to the IndexReader passed - and the DocIdSet returned must always be based on that reader (and the first document of
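One way to picture "based on that reader" is the plain-Java sketch below. It is a toy model rather than Lucene code: the class name and the docBase/maxDoc parameters are assumptions for illustration. It takes a set of matches expressed in index-wide doc ids and rebases it to the local id space of one sub-reader, which is the kind of set a per-reader getDocIdSet would be expected to return instead of one shared bitset.

```java
import java.util.BitSet;

/** Toy model only: rebases index-wide doc ids to one sub-reader's local id space. */
final class PerReaderBits {
    /**
     * @param globalMatches bits set on index-wide doc ids
     * @param docBase       index-wide id of the sub-reader's first document (hypothetical parameter)
     * @param maxDoc        number of documents in the sub-reader
     * @return bits numbered from 0, relative to the sub-reader
     */
    static BitSet forReader(BitSet globalMatches, int docBase, int maxDoc) {
        BitSet local = new BitSet(maxDoc);
        for (int g = globalMatches.nextSetBit(docBase);
             g >= 0 && g < docBase + maxDoc;
             g = globalMatches.nextSetBit(g + 1)) {
            local.set(g - docBase); // shift into the sub-reader's numbering
        }
        return local;
    }
}
```

Returning the same precomputed bitset for every reader, as in the quoted method, would apply one reader's numbering to all of them.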

Re: API to retrieve search results without scoring or sorting

2010-07-19 Thread Yonik Seeley
On Mon, Jul 19, 2010 at 6:14 AM, Naveen Kumar id.n...@gmail.com wrote: Is there any API using which I can retrieve search results, such that they are neither scored nor sorted (for performance reasons). I just need the results, don't need any extra computation on that. Use your own custom

Re: Get lengthNorm of a field

2010-07-19 Thread Yonik Seeley
On Mon, Jul 19, 2010 at 9:53 AM, Philippe mailer.tho...@gmail.com wrote: is there a possibility to retrieve the lengthNorm for all (or a specific) fields in a specific document? See IndexReader: public abstract byte[] norms(String field) throws IOException; And Similarity: public float

Re: Could multiple indexers change same collections at the same time?

2010-06-24 Thread Yonik Seeley
Yes, all of that still applies to Lucene 3x and 4x, and is unlikely to change any time soon. -Yonik http://www.lucidimagination.com On Thu, Jun 24, 2010 at 1:51 PM, Zhang, Lisheng lisheng.zh...@broadvision.com wrote: Hi, I remembered I tested earlier lucene 1.4 and 2.4, and found the

Re: segment_N file is missed

2010-06-16 Thread Yonik Seeley
On Tue, Jun 15, 2010 at 5:23 AM, Michael McCandless luc...@mikemccandless.com wrote: CheckIndex is not able to recover from this corruption (missing segments_N file); this would be a nice addition... But it sounds like you've worked out a way to write your own segments_N? Use

Re: Docs with any score are collected in the Collector implementations

2010-06-02 Thread Yonik Seeley
On Wed, Jun 2, 2010 at 1:10 PM, jan.kure...@nokia.com wrote: that's probably because I move from lucene to solr. We will need to filter them from the result manually then first. Solr has a function range query that can filter out any values outside of the given range.

Re: Using JSON for index input and search output

2010-05-30 Thread Yonik Seeley
On Sun, May 30, 2010 at 1:33 PM, Visual Logic visual.lo...@gmail.com wrote: JSON is the format used for all the configuration and property files in the RIA application we are developing. Is Lucene able to create a document from a given JSON file and index it? Is Lucene able to provide a JSON

Re: Using JSON for index input and search output

2010-05-30 Thread Yonik Seeley
On Sun, May 30, 2010 at 2:27 PM, Visual Logic visual.lo...@gmail.com wrote: Solr is embeddable but does that not just mean that SolrJ only provides the ability to call Solr running on some server? Nope - embeddable as in running in the same JVM as your application. For some of my use cases

Re: How to get the number of unique terms in the inverted index

2010-05-28 Thread Yonik Seeley
It seems like there should be a formula for estimating the total number of unique terms given that you know the unique term counts for each segment, and make certain assumptions like random document distribution across segments. -Yonik http://www.lucidimagination.com On Thu, May 27, 2010 at 9:17

Re: How to get the number of unique terms in the inverted index

2010-05-27 Thread Yonik Seeley
On Thu, May 27, 2010 at 2:32 PM, kannan chandrasekaran ckanna...@yahoo.com wrote: I was wondering  if there is a way to retrieve the number of unique terms in the lucene ( version 2.4.0) ... I am aware of the terms() terms(Term) method that returns an enumeration (TermEnum) but that involves

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Yonik Seeley
On Mon, May 17, 2010 at 5:00 PM, Shay Banon kim...@gmail.com wrote:   I wanted to verify if my understanding is correct. Assuming that I use NRT, and refresh, say, every 1 second, caching based on IndexReader, such is what is used in the CachingWrapperFilter is basically useless No, it's fine.

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Yonik Seeley
.getSequentialSubReaders() != null) { System.err.println("Should not be more readers..."); } } } } indexWriter.close(); } On Tue, May 18, 2010 at 12:30 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, May 17, 2010 at 5:00 PM, Shay Banon

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Yonik Seeley
On Mon, May 17, 2010 at 9:00 PM, Shay Banon kim...@gmail.com wrote: Great, so I am not imagining things this late into the night ... ;), not so great, since using NRT with field cache (like sorting) or caching filters, or anything that caches based on IndexReader not really an option. This

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Yonik Seeley
On Mon, May 17, 2010 at 9:12 PM, Shay Banon kim...@gmail.com wrote: Just saw that you opened a case for that. I think that its important in your test case to also test for object identity, not just equals. This is because the IndexReader (or the FieldCacheKey) are used as keys in weak hash

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Yonik Seeley
looking now at what it does, its new... -shay.banon On Tue, May 18, 2010 at 4:04 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, May 17, 2010 at 9:00 PM, Shay Banon kim...@gmail.com wrote: Great, so I am not imagining things this late into the night ... ;), not so great

Re: FieldCache and 2.9

2010-05-11 Thread Yonik Seeley
You are requesting the FieldCache entry from the top-level reader and hence a whole new FieldCache entry must be created. Lucene 2.9 sorting requests FieldCache entries at the segment level and hence reuses entries for those segments that haven't changed. -Yonik Apache Lucene Eurocon 2010 18-21
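The effect described above can be sketched with a toy cache in plain Java. This is not the FieldCache implementation: the Segment class and the counters are hypothetical, and an IdentityHashMap stands in for whatever keyed-by-reader map is really used. Entries are keyed by the identity of the object they were requested for, so unchanged segments hit the cache after a reopen, while a fresh top-level object always misses.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: a cache keyed by object identity, loosely mirroring per-reader caching. */
final class SegmentCacheModel {
    /** Stand-in for a segment reader; hypothetical, not a Lucene class. */
    static final class Segment {
        final String name;
        Segment(String name) { this.name = name; }
    }

    private final Map<Object, int[]> cache = new IdentityHashMap<>();
    int loads = 0; // counts expensive entry builds

    /** Returns the entry cached for this exact key object, building it on first use. */
    int[] get(Object key) {
        int[] entry = cache.get(key);
        if (entry == null) {
            loads++;
            entry = new int[0]; // placeholder for the uninverted field values
            cache.put(key, entry);
        }
        return entry;
    }

    /** Per-segment access: one lookup per segment, so unchanged segments are reused. */
    void warmPerSegment(List<Segment> segments) {
        for (Segment s : segments) {
            get(s);
        }
    }
}
```

With two existing segments and one new one, a per-segment pass after a reopen builds only the new segment's entry; asking with a brand-new top-level key rebuilds everything that key covers.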

Re: MatchAllDocsQuery and MatchNoDocsQuery

2010-05-10 Thread Yonik Seeley
Yes on all counts. Lucene doesn't modify query objects, so they are safe for reuse among multiple threads. -Yonik Apache Lucene Eurocon 2010 18-21 May 2010 | Prague 2010/5/10 Mindaugas Žakšauskas min...@gmail.com: Hi, Can anybody confirm whether MatchAllDocsQuery can be used as an

Re: problem in Lucene's ranking function

2010-05-05 Thread Yonik Seeley
2010/5/5 José Ramón Pérez Agüera jose.agu...@gmail.com: [...] The consequence is that a document matching a single query term over several fields could score much higher than a document matching several query terms in one field only, One partial workaround that people use is

Fwd: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 21, 2010

2010-03-24 Thread Yonik Seeley
Forwarding to lucene only - the big cross-post caused my gmail filters to file it. -Yonik -- Forwarded message -- From: Grant Ingersoll gsing...@apache.org Date: Wed, Mar 24, 2010 at 8:03 PM Subject: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 21,

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Yonik Seeley
On Thu, Mar 11, 2010 at 4:10 PM, Peter Keegan peterlkee...@gmail.com wrote: I want the TFC to do all the cool things it does like custom sorting, saving the field values, max score, etc. I suppose the custom Collector could explicitly delegate all TFC's methods, but this doesn't seem right. No

Re: NumericField exact match

2010-02-27 Thread Yonik Seeley
On Fri, Feb 26, 2010 at 3:33 PM, Ivan Vasilev ivasi...@sirma.bg wrote: Does it matter precision step when I use NumericRangeQuery for exact matches? No. There is a full-precision version of the value indexed regardless of the precision step, and that's used for an exact match query. I mean

Re: Sort and Collector

2010-02-03 Thread Yonik Seeley
On Wed, Feb 3, 2010 at 1:40 PM, tsuraan tsur...@gmail.com wrote: Is there any way to run a search where I provide a Query, a Sort, and a Collector?  I have a case where it is sometimes, but rarely, necessary to get all the results from a query, but usually I'm satisfied with a smaller amount.  

Re: NumericRangeQuery performance with 1/2 billion documents in the index

2010-01-03 Thread Yonik Seeley
Perhaps this is just a huge index, and not enough of it can be cached in RAM. Adding additional clauses to a boolean query incrementally destroys locality. 104GB of index and 4GB of RAM means you're going to be hitting the disk constantly. You need more hardware - if your requirements are low

Re: NumericRangeQuery performance with 1/2 billion documents in the index

2010-01-03 Thread Yonik Seeley
On Sun, Jan 3, 2010 at 10:42 AM, Karl Wettin karl.wet...@gmail.com wrote: 3 jan 2010 kl. 16.32 skrev Yonik Seeley: Perhaps this is just a huge index, and not enough of it can be cached in RAM. Adding additional clauses to a boolean query incrementally destroys locality. 104GB of index

Re: Finding the highest term in a field

2009-11-19 Thread Yonik Seeley
On Thu, Nov 19, 2009 at 1:04 AM, Daniel Noll dan...@nuix.com wrote: I take it the existing numeric fields can't already do stuff like this? Nope, it's a fundamental limitation of the current TermEnums. -Yonik http://www.lucidimagination.com

Re: Finding the highest term in a field

2009-11-18 Thread Yonik Seeley
On Wed, Nov 18, 2009 at 10:48 PM, Daniel Noll dan...@nuix.com wrote: But what if I want to find the highest?  TermEnum can't step backwards. I've also wanted to do the same. It's coming with the new flexible indexing patch:

Re: Sort fields shouldn't be tokenized

2009-11-16 Thread Yonik Seeley
On Mon, Nov 16, 2009 at 11:38 AM, Jeff Plater jpla...@healthmarketscience.com wrote: Thanks - so if my sort field is a single term then I should be ok with using an analyzer (to lowercase it for example). Correct - the key is that there is not more than one token per document for the field

Re: share some numbers for range queries

2009-11-15 Thread Yonik Seeley
On Mon, Nov 16, 2009 at 1:02 AM, John Wang john.w...@gmail.com wrote:   I did some performance analysis for different ways of doing numeric ranging with lucene. Thought I'd share: FYI, the second approach is already implemented in both Lucene and Solr.

Re: Equality Numeric Query

2009-11-11 Thread Yonik Seeley
On Wed, Nov 11, 2009 at 8:54 AM, Shai Erera ser...@gmail.com wrote: I index documents with numeric fields using the new Numeric package. I execute two types of queries: range queries (for example, [1 TO 20}) and equality queries (for example 24.75). Don't mind the syntax. Currently, to

Re: Lucene index write performance optimization

2009-11-10 Thread Yonik Seeley
On Tue, Nov 10, 2009 at 11:43 AM, Jamie Band ja...@stimulussoft.com wrote: As an aside note, is there any way for Lucene to support simultaneous writes to an index? The indexing process is highly parallelized... just use multiple threads to add documents to the same IndexWriter. -Yonik

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-27 Thread Yonik Seeley
On Tue, Oct 27, 2009 at 9:07 PM, Luis Alves lafa...@gmail.com wrote: But there needs to be some forced push for these shorter major release cycles, to allow for code clean cycles to also be shorter. Maybe... or maybe not. There's also value in a more stable API over a longer period of time.

Re: help needed improving lucene concurret search performance

2009-10-23 Thread Yonik Seeley
How many processors do you have on this system? If you are CPU bound, 100 threads is going to be 10 times slower (at a minimum) than 10 threads (unless you have more than 10 CPUs). -Yonik http://www.lucidimagination.com On Fri, Oct 23, 2009 at 2:18 AM, Wilson Wu songzi0...@gmail.com wrote: Dear

Re: Clarification on TokenStream.close() needed

2009-10-20 Thread Yonik Seeley
2009/10/20 Teruhiko Kurosaka k...@basistech.com: My Tokenizer started showing an error when I switched to Solr 1.4 dev version.  I am not too confident but it seems that Solr 1.4 calls close() on my Tokenizer before calling reset(Reader) in order to reuse the Tokenizer.  That is, close() is

Re: Hits and TopDoc

2009-10-20 Thread Yonik Seeley
On Tue, Oct 20, 2009 at 5:03 PM, Nathan Howard natehowa...@gmail.com wrote: This is sort of related to the above question, but I'm trying to update some (now deprecated) Java/Lucene code that I've become aware of once we started using 2.4.1 (we were previously using 2.3.2): Hits results =

Re: Hits and TopDoc

2009-10-20 Thread Yonik Seeley
Hmm, yes, I should have thought of quoting the javadoc :-) The Hits javadoc has been updated though... we shouldn't be pushing people toward collectors unless they really need them: * TopDocs topDocs = searcher.search(query, numHits); * ScoreDoc[] hits = topDocs.scoreDocs; * for (int i =

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Yonik Seeley
On Fri, Oct 16, 2009 at 4:54 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Fri, Oct 16, 2009 at 10:23 AM, Danil ŢORIN torin...@gmail.com wrote: What about creating major version more often? +1 We're not going to run out of version numbers, so I don't see a reason not to upgrade

Re: NPE in NearSpansUnordered

2009-10-15 Thread Yonik Seeley
Are you using any custom query types? Anything to help us reproduce (like the actual query this happened on) would be greatly appreciated. -Yonik http://www.lucidimagination.com On Thu, Oct 15, 2009 at 1:17 PM, Peter Keegan peterlkee...@gmail.com wrote: I'm using Lucene 2.9 and sometimes get

Re: Realtime search best practices

2009-10-12 Thread Yonik Seeley
Guys, please - you're not new at this... this is what JavaDoc is for: /** * Returns a readonly reader containing all * current updates. Flush is called automatically. This * provides near real-time searching, in that changes * made during an IndexWriter session can be made *

Re: Realtime search best practices

2009-10-12 Thread Yonik Seeley
On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix jake.man...@gmail.com wrote:  It may be surprising, but in fact I have read that javadoc. It was not your email I responded to.  It talks about not needing to close the writer, but doesn't specifically talk about the what the relationship between

Re: Realtime search best practices

2009-10-12 Thread Yonik Seeley
Good point on isCurrent - I think it should only be with respect to the latest index commit point? and we should clarify that in the javadoc. [...] // but what does the nrtReader say? // it does not have access to the most recent commit // state, as there's been a commit (with documents) //

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Yonik Seeley
On Wed, Sep 16, 2009 at 12:33 PM, Uwe Schindler u...@thetaphi.de wrote: How should we proceed? Stop the final artifact build and voting or proceed with the release of 2.9? We waited so long and for most people it is faster than slower! I think we know that 2.9 will not be faster for everyone:

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Yonik Seeley
It's been a while since I wrote that benchmarker... is it OK that the answer is different? Did you use the same test file? -Yonik http://www.lucidimagination.com On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller markrmil...@gmail.com wrote: The results: config: impl=SeparateFile serial=false

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Yonik Seeley
we need to revert FSDir.open to return SimpleFSDir again, on non-Windows hosts.  But then we don't have good concurrency... Mike On Tue, Sep 15, 2009 at 2:59 PM, Yonik Seeley yonik.see...@lucidimagination.com wrote: It's been a while since I wrote that benchmarker... is it OK

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Yonik Seeley
Here's my results in my quad core phenom, with ondemand CPU freq scaling disabled (clocks locked at 3GHz) Ubuntu 9.04, filesystem=ext4 on 7200RPM IDE drive, testfile=95MB fully cached. Linux odin 2.6.28-15-generic #49-Ubuntu SMP Tue Aug 18 19:25:34 UTC 2009 x86_64 GNU/Linux Java(TM) SE Runtime

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Yonik Seeley
On Tue, Sep 15, 2009 at 4:12 PM, Yonik Seeley yo...@lucidimagination.com wrote: Note that when nthreads>1 I sometimes get wrong answers for SimpleFile... s/SimpleFile/SingleFile/g

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Yonik Seeley
- everyone knows a jackalope is faster than a koala. - Mark Yonik Seeley wrote: Here's my results in my quad core phenom, with ondemand CPU freq scaling disabled (clocks locked at 3GHz) Ubuntu 9.04, filesystem=ext4 on 7200RPM IDE drive, testfile=95MB fully cached. Linux odin 2.6.28-15

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Yonik Seeley
OK, I see the issue - SingleFile doesn't have its own filepointer. I'll update the original issue. (for large files, this shouldn't change the times any). -Yonik http://www.lucidimagination.com On Tue, Sep 15, 2009 at 4:13 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Tue, Sep 15

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Yonik Seeley
On Wed, Sep 9, 2009 at 8:57 AM, Peter Keegan peterlkee...@gmail.com wrote: Using JProfiler, I observe that the improvement is due to a huge reduction in the number of calls to TermDocs.next and TermDocs.skipTo (about 65% fewer calls). Indexes are searched per-segment now (i.e. MultiTermDocs

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Yonik Seeley
On Wed, Sep 9, 2009 at 9:17 AM, Yonik Seeley yonik.see...@lucidimagination.com wrote: On Wed, Sep 9, 2009 at 8:57 AM, Peter Keegan peterlkee...@gmail.com wrote: Using JProfiler, I observe that the improvement is due to a huge reduction in the number of calls to TermDocs.next and TermDocs.skipTo

Re: Extending Sort/FieldCache

2009-09-08 Thread Yonik Seeley
On Sun, Sep 6, 2009 at 4:42 AM, Shai Erera ser...@gmail.com wrote: I've resisted using payloads for this purpose in Solr because it felt like an interim hack until CSF is implemented. I don't see it as a hack, but as a proper use of a great feature in Lucene. It's proper use for an application

Re: Extending Sort/FieldCache

2009-09-05 Thread Yonik Seeley
On Fri, Sep 4, 2009 at 12:33 AM, Shai Erera ser...@gmail.com wrote: 2) Contribute my payload-based sorting package. Currently it only reads from disk during searches, and I'd like to enhance it to use in-memory cache as well. It's a moderate-size package, so this one will need to wait until (1)

Re: Is there a way to check for field uniqueness when indexing?

2009-08-26 Thread Yonik Seeley
stuff from the index using a query as well as adding? Does Solr also remember the deletions as well? It used to - but now it delegates all that to IndexWriter as well (and lucene buffers them instead). -Yonik http://www.lucidimagination.com Daniel Shane Yonik Seeley wrote: On Fri, Aug 21

Re: Is there a way to check for field uniqueness when indexing?

2009-08-20 Thread Yonik Seeley
On Fri, Aug 21, 2009 at 12:49 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : But in that case, I assume Solr does a commit per document added. not at all ... it computes a signature and then uses that as a unique key. IndexWriter.updateDocument does all the hard work. Right - Solr used

trie* space-time tradeoff

2009-07-20 Thread Yonik Seeley
Anyone have any numbers? I couldn't find complete info in the Trie* JIRA issues, esp relating to size increase in the index. There was this: The indexes each contain 13 numeric, tree encoded fields (doubles and Dates). Index size (including the normal fields) was: * 8bit: 4.8 GiB *

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Yonik Seeley
Could this perhaps have anything to do with the changes to DocIdSetIterator? Glancing at the default implementation of advance makes me wince a bit: public int advance(int target) throws IOException { while (nextDoc() < target) {} return doc; } IMO, this is a back-compatibility
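To make the cost concern concrete, here is a small plain-Java model of a linear-scan advance over a sorted array of doc ids. It is only a sketch: the class, the call counter, and the array-backed iterator are assumptions for illustration, not Lucene code. The point it shows is that a default advance written this way calls nextDoc() once for every document it passes, so an iterator with a cheaper way to skip gains nothing unless it overrides advance.

```java
/** Toy model only: an iterator over sorted doc ids with a linear-scan advance. */
final class LinearAdvanceModel {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // sorted ascending
    private int idx = -1;
    int nextDocCalls = 0;     // how many times nextDoc() has been invoked

    LinearAdvanceModel(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Moves to the next doc id, or returns NO_MORE_DOCS when exhausted. */
    int nextDoc() {
        nextDocCalls++;
        idx++;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    /** Linear scan: keeps calling nextDoc() until it reaches a doc id >= target. */
    int advance(int target) {
        int doc;
        while ((doc = nextDoc()) < target) {
            // nothing to do; each pass skips exactly one document
        }
        return doc;
    }
}
```

In this model, advancing to a target that lies k documents ahead costs k calls to nextDoc().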

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Yonik Seeley
On Wed, Jul 15, 2009 at 4:37 PM, Uwe Schindler u...@thetaphi.de wrote: And the fix only affects custom DocIdSetIterators. And custom Queries (via Scorer) since Scorer inherits from DISI. But as Mike says, it shouldn't be the issue behind in this thread. -Yonik http://www.lucidimagination.com
