Example configuring TieredMergePolicy in Solr

2011-09-16 Thread Burton-West, Tom
the Solr TieredMergePolicy to set the parameters: setMaxMergeAtOnce, setSegmentsPerTier, and setMaxMergedSegmentMB? Tom Burton-West

Example for Solr TieredMergePolicy configuration

2011-09-16 Thread Burton-West, Tom
the Solr TieredMergePolicy to set the parameters: setMaxMergeAtOnce, setSegmentsPerTier, and setMaxMergedSegmentMB? Tom Burton-West

Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-16 Thread Burton-West, Tom
<!--400GB /20=20GB or 2MB--> <double name="setMaxMergedSegmentMB">2</double> </mergePolicy> and got this error message SEVERE: java.lang.RuntimeException: no setter corrresponding to 'setMaxMergedSegmentMB' in org.apache.lucene.index.TieredMergePolicy Tom Burton-West
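
One possible reading of the error above, offered as a sketch and not as the answer given in the thread: Solr derives the setter name itself by prefixing "set" to each name attribute, so the attribute would be written without the "set" prefix. The class name is taken from the error message; the numeric values are placeholders, not recommendations, and the enclosing section is assumed to follow the Solr 3.x solrconfig.xml layout.

```xml
<!-- solrconfig.xml, inside <indexDefaults> or <mainIndex> (Solr 3.x layout assumed) -->
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <!-- name="maxMergeAtOnce" is resolved to setMaxMergeAtOnce(int) -->
  <int name="maxMergeAtOnce">10</int>
  <!-- name="segmentsPerTier" is resolved to setSegmentsPerTier(double) -->
  <double name="segmentsPerTier">10.0</double>
  <!-- name="maxMergedSegmentMB" is resolved to setMaxMergedSegmentMB(double); value is in MB (placeholder) -->
  <double name="maxMergedSegmentMB">20000</double>
</mergePolicy>
```

With name="setMaxMergedSegmentMB", the lookup would presumably be for a method called setSetMaxMergedSegmentMB, which would explain the "no setter" message.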

Re: Solr indexing process: keep a persistent Mysql connection throu all the indexing process

2011-08-23 Thread Tom
10K documents. Why not just batch them? You could read in 10K from your database, load em into an array of SolrDocuments, and then post them all at once to the Solr server? Or do em in 1K increments if they are really big. -- View this message in context:

copyField for big indexes

2011-08-22 Thread Tom
Is it a good rule of thumb that when dealing with large indexes copyField should not be used? It seems to duplicate the indexing of data. You don't need copyField to be able to search on multiple fields. For example, if I have two fields: title and post and I want to search on both, I could just

Re: copyField for big indexes

2011-08-22 Thread Tom
Thanks Erick -- View this message in context: http://lucene.472066.n3.nabble.com/copyField-for-big-indexes-tp3275712p3275816.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: copyField for big indexes

2011-08-22 Thread Tom
Bill, I was using it as a simple default search field. I realise now that's not a good reason to use copyField. As I see it now, it should be used if you want to search in a way that is different: use different analyzers, etc; not for just searching on multiple fields in a single query.

Error loading a custom request handler in Solr 4.0

2011-08-10 Thread Tom Mortimer
ideas? thanks, Tom

Re: Error loading a custom request handler in Solr 4.0

2011-08-10 Thread Tom Mortimer
; } } On 10 August 2011 16:43, simon mtnes...@gmail.com wrote: The attachment isn't showing up (in gmail, at least). Can you inline the relevant bits of code ? On Wed, Aug 10, 2011 at 11:05 AM, Tom Mortimer t...@flax.co.uk wrote: Hi, Apologies if this is really basic. I'm trying to learn

Re: how to ignore case in solr search field?

2011-08-10 Thread Tom Mortimer
You can use solr.LowerCaseFilterFactory in an analyser chain for both indexing and queries. The schema.xml supplied with example has several field types using this (including text_general). Tom On 10 August 2011 16:42, nagarjuna nagarjuna.avul...@gmail.com wrote: Hi please help me
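
A minimal sketch of the suggestion above, assuming a schema.xml in the Solr 3.x style. The field type name text_lower and the field name title are hypothetical; only solr.LowerCaseFilterFactory comes from the message.

```xml
<!-- schema.xml: a field type that lowercases tokens at both index and query time -->
<fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
  <!-- a single <analyzer> with no type attribute applies to indexing and querying -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- hypothetical field using the type above -->
<field name="title" type="text_lower" indexed="true" stored="true"/>
```

Because the same chain runs on both sides, a query for "Solr" and an indexed "SOLR" should both reduce to the token "solr".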

Re: Error loading a custom request handler in Solr 4.0

2011-08-10 Thread Tom Mortimer
Interesting.. is this in trunk (4.0)? Maybe I've broken mine somehow! What classpath did you use for compiling? And did you copy anything other than the new jar into lib/ ? thanks, Tom On 10 August 2011 18:07, simon mtnes...@gmail.com wrote: It's working for me. Compiled, inserted in solr

Re: Error loading a custom request handler in Solr 4.0

2011-08-10 Thread Tom Mortimer
Thanks Simon. I'll try again tomorrow. Tom On 10 August 2011 18:46, simon mtnes...@gmail.com wrote: This is in trunk (up to date). Compiler is 1.6.0_26 classpath was dist/apache-solr-solrj-4.0-SNAPSHOT.jar:dist/apache-solr-core-4.0-SNAPSHOT.jar built from trunk just prior by 'ant dist

RE: performance crossover between single index and sharding

2011-08-02 Thread Burton-West, Tom
so many shards that the overhead of distributing the queries, and consolidating/merging the responses becomes a serious issue. Tom Burton-West http://www.hathitrust.org/blogs/large-scale-search * http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-50-volumes-5

RE: performance crossover between single index and sharding

2011-08-02 Thread Burton-West, Tom
machines:) Tom -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Tuesday, August 02, 2011 2:12 PM To: solr-user@lucene.apache.org Subject: Re: performance crossover between single index and sharding Hi Tom, Very interesting indeed! But i keep wondering why some

Re: Nightly builds

2011-07-06 Thread Tom Gross
, for that! On Tue, Jul 5, 2011 at 10:19 AM, Tom Grossitconse...@gmail.com wrote: On 07/05/2011 04:08 PM, Benson Margulies wrote: The solr download link does not point to or mention nightly builds. Are they out there? http://lmgtfy.com/?q=%2Bsolr+%2Bnightlybuilds&l=1 -- Author of the book Plone

Re: Nightly builds

2011-07-05 Thread Tom Gross
On 07/05/2011 04:08 PM, Benson Margulies wrote: The solr download link does not point to or mention nightly builds. Are they out there? http://lmgtfy.com/?q=%2Bsolr+%2Bnightlybuilds&l=1 -- Author of the book Plone 3 Multimedia - http://amzn.to/dtrp0C Tom Gross email.@toms

RE: what s the optimum size of SOLR indexes

2011-07-05 Thread Burton-West, Tom
(but 99th percentile times of about 2 seconds). Tom Burton-West http://www.hathitrust.org/blogs/large-scale-search

RE: Garbage Collection: I have given bad advice in the past!

2011-06-24 Thread Burton-West, Tom
of the JVM you are using? Tom Burton-West http://www.hathitrust.org/blogs/large-scale-search

Re: char sets accepted via xml

2011-06-22 Thread Tom Gross
Hi, I also have this issue with Solr 3.2.0. It is probably this: https://issues.apache.org/jira/browse/SOLR-2381 Tom On 06/15/2011 02:09 PM, Mark Cunningham wrote: Hi, If you submit information to solr using xml, does the server assume you're using unicode encoded in utf8? And does it accept

RE: huge shards (300GB each) and load balancing

2011-06-15 Thread Burton-West, Tom
give any details about the number of shards per machine and the total memory on the machine. Tom Burton-West http://www.hathitrust.org/blogs/large-scale-search From: Dmitry Kan [dmitry@gmail.com] Sent: Tuesday, June 14, 2011 2:15 PM To: solr-user

RE: FastVectorHighlighter and hl.fragsize parameter set to zero causes exception

2011-06-11 Thread Burton-West, Tom
Thank you Koji, I'll take a look at SingleFragListBuilder, LUCENE-2464, and SOLR-1985, and I will update the wiki on Monday. Tom There is SingleFragListBuilder for this purpose. Please see: https://issues.apache.org/jira/browse/LUCENE-2464 3

Re: Indexing data from multiple datasources

2011-06-10 Thread Tom Gross
? Thanks in advance Greg -- Author of the book Plone 3 Multimedia - http://amzn.to/dtrp0C Tom Gross email.@toms-projekte.de skype.tom_gross web.http://toms-projekte.de blog...http://blog.toms-projekte.de

FastVectorHighlighter and hl.fragsize parameter set to zero causes exception

2011-06-10 Thread Burton-West, Tom
indicating whether they apply to only the regular highlighter or the FVH? Tom Burton-West

RE: Does MultiTerm highlighting work with the fastVectorHighlighter?

2011-06-09 Thread Burton-West, Tom
Hi Koji, Thank you for your reply. It is the feature of FVH. FVH supports TermQuery, PhraseQuery, BooleanQuery and DisjunctionMaxQuery and Query constructed by those queries. Sorry, I'm not sure I understand. Are you saying that FVH supports MultiTerm highlighting? Tom

RE: huge shards (300GB each) and load balancing

2011-06-08 Thread Burton-West, Tom
://www.hathitrust.org/blogs/large-scale-search/too-many-words-again for details) We later ran into memory problems when indexing so instead changed the index time parameter termIndexInterval from 128 to 1024. (More details here: http://www.hathitrust.org/blogs/large-scale-search) Tom Burton-West
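
For readers who have not set this parameter before, a sketch of where termIndexInterval would typically go, assuming the Solr 1.4/3.x solrconfig.xml layout. The value 1024 is the one mentioned in the message; the placement inside indexDefaults is an assumption about the poster's configuration.

```xml
<!-- solrconfig.xml -->
<indexDefaults>
  <!-- Index-time setting: keep every 1024th term in the in-memory term index
       instead of the default every 128th. -->
  <termIndexInterval>1024</termIndexInterval>
</indexDefaults>
```

A larger interval shrinks the in-memory term index at the cost of somewhat slower term lookups, and it only affects segments written after the change.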

Does MultiTerm highlighting work with the fastVectorHighlighter?

2011-06-08 Thread Burton-West, Tom
. Tom Burton-West query <str name="q">ocr:tink*</str> highlighting params: <str name="hl.highlightMultiTerm">true</str> <str name="hl.fragsize">200</str> <str name="hl.useFastVectorHighlighter">true</str> <str name="hl.snippets">200</str> <str name="hl.fragmentsBuilder">colored</str> <str name="hl.fragListBuilder">simple</str> <str name

RE: Does MultiTerm highlighting work with the fastVectorHighlighter?

2011-06-08 Thread Burton-West, Tom
using the fastVectorHighLighter as long as we don't do a MultiTerm query. For example see the query and results appended below (using the same hl parameters listed in the previous email) Tom <str name="q">ocr:tinkham</str> <lst name="highlighting">

RE: 400 MB Fields

2011-06-07 Thread Burton-West, Tom
of the text, so I would suspect even with the largest ramBufferSizeMB, you might run into problems. (This is with the 3.x branch. Trunk might not have this problem since it's much more memory efficient when indexing.) Tom Burton-West www.hathitrust.org/blogs

filter cache and negative filter query

2011-05-17 Thread Burton-West, Tom
query against the index for not history? Tom

Re: Custom filter development

2011-05-09 Thread Tom Hill
mapping to b - c too, and I want to have the possibility to remove a Token d - . How can I do this, when the next methods returns only one Token, not a collection? Buffer them internally. Look at SynonymFilter.java, it does exactly this. Tom Thanks! -- View this message in context: http

RE: CommonGrams indexing very slow!

2011-04-27 Thread Burton-West, Tom
settings? Tom All, We have created index with CommonGrams and the final size is around 370GB. Everything is working fine but now when we add more documents into index it takes forever (almost 12 hours)...seems to change all the segments file in a commit. The same commit used

RE: CommonGrams indexing very slow!

2011-04-27 Thread Burton-West, Tom
Hi Salman, We had a similar problem with the IndexMergeTool in Lucene contrib. I seem to remember having to hack the IndexMergeTool code so that it wouldn't create the CFF automatically. Let me know if you need it and I'll dig up the modified code. Tom. -Original Message- From

RE: TermsCompoment + Dist. Search + Large Index + HEAP SPACE

2011-04-26 Thread Burton-West, Tom
a lot of memory. http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/misc/src/java/org/apache/lucene/misc/HighFreqTerms.java?view=log Tom http://www.hathitrust.org/blogs/large-scale-search -Original Message- From: mdz-munich [mailto:sebastian.lu...@bsb-muenchen.de

RE: QUESTION: SOLR INDEX BIG FILE SIZES

2011-04-18 Thread Burton-West, Tom
<double name="maxMergeMB">200</double> </mergePolicy> In the flexible indexing branch/trunk there is a new merge policy and parameter that allows you to set the maximum size of the merged segment: https://issues.apache.org/jira/browse/LUCENE-854. Tom Burton-West http://www.hathitrust.org/blogs/large

RE: Understanding the DisMax tie parameter

2011-04-15 Thread Burton-West, Tom
Thanks everyone. I updated the wiki. If you have a chance please take a look and check to make sure I got it right on the wiki. http://wiki.apache.org/solr/DisMaxQParserPlugin#tie_.28Tie_breaker.29 Tom -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent

Understanding the DisMax tie parameter

2011-04-14 Thread Burton-West, Tom
is the sum of the sub scores. Typically a low value (ie: 0.1) is useful. Tom Burton-West

RE: ArrayIndexOutOfBoundsException with facet query

2011-04-11 Thread Burton-West, Tom
with regular searches against the index or with other facet queries. Only with this facet. Is TermInfoAndOrd only used for faceting? I'll go ahead and build the patch and let you know. Tom p.s. Here is the field definition: <field name="topicStr" type="string" indexed="true" stored="false" multiValued

RE: ArrayIndexOutOfBoundsException with facet query

2011-04-11 Thread Burton-West, Tom
some number to trigger the bug? I rebuilt lucene-core-3.1-SNAPSHOT.jar with your patch and it fixes the problem. Tom -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Monday, April 11, 2011 1:00 PM To: Burton-West, Tom Cc: solr-user

ArrayIndexOutOfBoundsException with facet query

2011-04-08 Thread Burton-West, Tom
,bigTerms=0,termInstances=1368694,uses=0} Apr 8, 2011 2:01:58 PM org.apache.solr.core.SolrCore execute Is this a known bug? Can anyone provide a clue as to how we can determine what the problem is? Tom Burton-West Appended Below is the exception stack trace: SEVERE: Exception during facet.field

Highlighting not working

2011-04-07 Thread Tom Mortimer
name="df">cv_text_en</str> <str name="df">cv_text_de</str> ... <str name="hl">on</str> <str name="hl.fl">cv_text</str> </lst> I've tried playing with other hl. parameters, but have had no luck so far. Any ideas? thanks, Tom

Re: Highlighting not working

2011-04-07 Thread Tom Mortimer
I guess what I'm asking is - can Solr highlight non-indexed fields? Tom On 7 April 2011 11:33, Tom Mortimer t...@flax.co.uk wrote: Hi, I'm having trouble getting highlighting to work for a large text field. This field can be in several languages, so I'm sending it to one of several fields

Re: Highlighting not working

2011-04-07 Thread Tom Mortimer
Problem solved. *bangs head on desk* T On 7 April 2011 11:33, Tom Mortimer t...@flax.co.uk wrote: Hi, I'm having trouble getting highlighting to work for a large text field. This field can be in several languages, so I'm sending it to one of several fields configured appropriately (e.g

copyField at search time / multi-language support

2011-03-28 Thread Tom Mortimer
subclass solr.SearchHandler? I know nothing about Solr internals at the moment... thanks, Tom

RE: Using Solr over Lucene effects performance?

2011-03-14 Thread Burton-West, Tom
if the index is large enough so disk I/O is a factor. Tom -Original Message- From: Glen Newton [mailto:glen.new...@gmail.com] Sent: Friday, March 11, 2011 5:28 PM To: solr-user@lucene.apache.org; yo...@lucidimagination.com Cc: sivaram Subject: Re: Using Solr over Lucene effects performance? I

RE: How to handle searches across traditional and simplifies Chinese?

2011-03-08 Thread Burton-West, Tom
This page discusses the reasons why it's not a simple one to one mapping http://www.kanji.org/cjk/c2c/c2cbasis.htm Tom -Original Message- I have documents that contain both simplified and traditional Chinese characters. Is there any way to search across them? For example, if someone

Solr indexing socket timeout errors

2011-01-07 Thread Burton-West, Tom
we might look to determine the cause? Tom Tom Burton-West Jan 7, 2011 2:31:07 AM org.apache.solr.common.SolrException log SEVERE: java.lang.RuntimeException: [was class java.net.SocketTimeoutException] Read timed out at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException

RE: Memory use during merges (OOM)

2010-12-18 Thread Burton-West, Tom
and SolrIndexConfig trying to better understand how solrconfig.xml gets instantiated and how it affects the readers and writers. Tom From: Robert Muir [rcm...@gmail.com] On Thu, Dec 16, 2010 at 4:03 PM, Burton-West, Tom tburt...@umich.edu wrote: Your

RE: Memory use during merges (OOM)

2010-12-16 Thread Burton-West, Tom
will be in proportion to the net size of the merge (mergeFactor + how big each merged segment is), how many merges you allow concurrently, and whether you do false or true deletions Does an optimize do something differently? Tom

RE: Memory use during merges (OOM)

2010-12-16 Thread Burton-West, Tom
. Is the use during merging similar to the use during searching? i.e. Some process has to look up data for a particular term as opposed to having to iterate through all the terms? (Haven't yet dug into the merging/indexing code). Tom -Original Message- From: Robert Muir

Memory use during merges (OOM)

2010-12-15 Thread Burton-West, Tom
in terms of the number or size of segments? Our largest segments prior to the failed merge attempt were between 5GB and 30GB. The memory allocated to the Solr/tomcat JVM is 10GB. Tom Burton-West - Changes to indexing configuration

access to environment variables in solrconfig.xml and/or schema.xml?

2010-12-13 Thread Burton-West, Tom
everything have to be stuffed into a java system property? Tom Burton-West

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-10 Thread Tom Hill
or unique terms? You might check your faceting algorithms, and see if you could use enum, instead of fc for some of them. Check your statistics page, what's your insanity count? Tom On Fri, Dec 10, 2010 at 12:17 PM, John Russell jjruss...@gmail.com wrote: I have been load testing solr 1.4.1

Re: singular/plurals

2010-12-10 Thread Tom Hill
Check out this page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Look, in particular, for stemming. On Fri, Dec 10, 2010 at 7:58 PM, Jack O jack_...@yahoo.com wrote: Hello, Need one more help: What do I have to do so that search will work for singulars and plurals ? I

Re: command line parameters for solr

2010-12-10 Thread Tom Hill
java -jar start.jar --help More docs here http://docs.codehaus.org/display/JETTY/A+look+at+the+start.jar+mechanism Personally, I usually limit access to localhost by using whatever firewall the machine uses. Tom On Fri, Dec 10, 2010 at 7:55 PM, Jack O jack_...@yahoo.com wrote: Hello

Re: Delete by query or Id very slow

2010-12-09 Thread Tom Hill
failed, if any do, if you delete with a list, but you are not using unsuccessful now anyway. Tom On Thu, Dec 9, 2010 at 7:55 AM, Ravi Kiran ravi.bhas...@gmail.com wrote: Thank you Tom for responding. On an average the docs are around 25-35 KB. The code is as follows, Kindly let me know if you see

Re: Triggering a reload of replicated configuration files

2010-12-09 Thread Tom Hill
? Tom

Re: How badly does NTFS file fragmentation impact search performance? 1.1X? 10X? 100X?

2010-12-08 Thread Tom Hill
If you can benchmark before and after, please post the results when you are done! Things like your index's size, and the amount of RAM in your computer will help make it meaningful. If all of your index can be cached, I don't think fragmentation is going to matter much, once you get warmed up. Tom

Re: Delete by query or Id very slow

2010-12-08 Thread Tom Hill
), and they deleted quickly (17 milliseconds). Maybe if you post your delete code? Are you doing anything else (like commit/optimize?) Tom On Wed, Dec 8, 2010 at 12:55 PM, Ravi Kiran ravi.bhas...@gmail.com wrote: Hello, I am using solr 1.4.1 when I delete by query or Id from solrj

Re: only index synonyms

2010-12-07 Thread Tom Hill
#solr.KeepWordFilterFactory I'd put the synonym filter first in your configuration for the field, then the keep words filter factory. Tom On Tue, Dec 7, 2010 at 12:06 PM, lee carroll lee.a.carr...@googlemail.com wrote: ok thanks for your response To summarise the solution then: To only index synonyms
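
The ordering described above (synonym filter first, then the keep-words filter) could be sketched as follows. The field type name and the two file names are hypothetical; the filter factories are the stock Solr ones named in the thread.

```xml
<!-- schema.xml: index only terms that survive the keep-words list -->
<fieldType name="synonyms_only" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- 1. map incoming words to their synonyms (synonyms.txt is an assumed file name) -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <!-- 2. drop every token that is not listed in keepwords.txt (assumed file name) -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

In this arrangement keepwords.txt would list the synonym targets, so that only the mapped terms reach the index.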

Re: customer ping response

2010-12-07 Thread Tom Hill
. But it's trivial to do. So, I wouldn't recommend it, but it was fun to play around with. :) It's probably easier to fix the load balancer, which is almost certainly just looking for any string you specify. Just change what it's expecting. They are built so you can configure this. Tom On Tue, Dec

Re: complex boolean filtering in fq queries

2010-12-07 Thread Tom Hill
is will be (city:San) (Francisco) Probably not what you want. 3) Will complex boolean filters like this substantially slow down query performance? That's not very complex, and the filter may be cached. Probably won't be a problem. Tom Thanks

Re: Index version on slave nodes

2010-12-07 Thread Tom Hill
happens if you configure your slave as a master, also? Does that get the behavior you want? Tom On Tue, Dec 7, 2010 at 8:16 AM, Markus Jelsma markus.jel...@openindex.io wrote: Yes, i read that too in the replication request handler's source comments. But i would find it convenient if it would just

Re: only index synonyms

2010-12-06 Thread Tom Hill
said, use the => syntax. You've already got it. Add the lines pretty => scenic text => words to synonyms.txt, and it will do what you want. Tom On 7 Dec 2010 01:28, Erick Erickson erickerick...@gmail.com wrote: See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Re: Restrict access to localhost

2010-12-03 Thread Tom
If you are using another app to create the index, I think you can remove the update servlet mapping in the web.xml. -- View this message in context: http://lucene.472066.n3.nabble.com/Restrict-access-to-localhost-tp2004475p2014129.html Sent from the Solr - User mailing list archive at

RE: ramBufferSizeMB not reflected in segment sizes in index

2010-12-02 Thread Burton-West, Tom
, 2010 5:40:33 PM IW 0 [Wed Dec 01 17:40:33 EST 2010; http-8091-Processor12]: flushedFiles=[_5h.frq, _5h.tis, _5h.prx, _5h.nrm, _5h.fnm, _5h.tii] Tom -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wednesday, December 01, 2010 3:43 PM To: solr

Solr 3x segments file and deleting index

2010-12-01 Thread Burton-West, Tom
and then restart Solr. Is this a feature or a bug? What is the rationale? Tom Tom Burton-West

ramBufferSizeMB not reflected in segment sizes in index

2010-12-01 Thread Burton-West, Tom
Tom Burton-West

RE: ramBufferSizeMB not reflected in segment sizes in index

2010-12-01 Thread Burton-West, Tom
on the production indexer. If it doesn't I'll turn it on and post here. Tom -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wednesday, December 01, 2010 2:43 PM To: solr-user@lucene.apache.org Subject: Re: ramBufferSizeMB not reflected in segment sizes

RE: Doubt about index size

2010-11-12 Thread Burton-West, Tom
of them are marked as deleted. numDocs is the actual number of undeleted documents. If you run an optimize, the index will be rewritten, the index size will go down, and numDocs will equal maxDocs. Tom Burton-West -Original Message- From: Claudio Devecchi [mailto:cdevec...@gmail.com] Sent

RE: Doubt about index size

2010-11-12 Thread Burton-West, Tom
An optimize takes lots of cpu and I/O since it has to rewrite your indexes, so only do it when necessary. You can just use curl to send an optimize message to Solr when you are ready. See: http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_parameters_as_part_of_the_URL Tom
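
As a sketch of the optimize message mentioned above: the body posted to the update handler is a one-element XML document. The handler path /update is an assumption about a default setup; adjust it to your installation.

```xml
<!-- Body of an HTTP POST to the Solr update handler (e.g. .../solr/update),
     sent with Content-Type: text/xml. Rewrites the index and expunges deleted documents. -->
<optimize/>
```

With curl this body would typically be passed via --data-binary together with a text/xml Content-Type header; the wiki page linked above also describes passing the command as a URL parameter.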

Using ICUTokenizerFilter or StandardAnalyzer with UAX#29 support from Solr

2010-11-01 Thread Burton-West, Tom
of writing the appropriate Solr filter factories? Are there any tricky gotchas in writing such a filter? If so, should I open a JIRA issue or two JIRA issues so the filter factories can be contributed to the Solr code base? Tom

RE: Using ICUTokenizerFilter or StandardAnalyzer with UAX#29 support from Solr

2010-11-01 Thread Burton-West, Tom
me to it :) Tom -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, November 01, 2010 12:49 PM To: solr-user@lucene.apache.org Subject: Re: Using ICUTokenizerFilter or StandardAnalyzer with UAX#29 support from Solr On Mon, Nov 1, 2010 at 12:24 PM, Burton-West

filter query from external list of Solr unique IDs

2010-10-15 Thread Burton-West, Tom
Hoss or someone else point me to more detailed information on what might be involved in the two ideas listed above? Is somehow keeping an up-to-date map of unique Solr ids to internal Lucene ids needed to implement this or is that a separate issue? Tom Burton-West http://www.hathitrust.org/blogs

RE: filter query from external list of Solr unique IDs

2010-10-15 Thread Burton-West, Tom
after we optimize an index and before we mount it in production. In our workflow, we update the index and optimize it before we release it and once it is released to production there is no indexing/merging taking place on the production index (so the internal Lucene ids don't change.) Tom

RE: filter query from external list of Solr unique IDs

2010-10-15 Thread Burton-West, Tom
Thanks Yonik, Is this something you might have time to throw together, or an outline of what needs to be thrown together? Is this something that should be asked on the developer's list or discussed in SOLR 1715 or does it make the most sense to keep the discussion in this thread? Tom

RE: Experience with large merge factors

2010-10-06 Thread Burton-West, Tom
to look at the code when I get back to make sure. We aren't using term vectors now, but we plan to add them as well as a number of fields based on MARC (cataloging) metadata in the future. Tom

Experience with large merge factors

2010-10-05 Thread Burton-West, Tom
optimum mergeFactor somewhere between 0 (noMerge merge policy) and 1,000. (We are also planning to raise the ramBufferSizeMB significantly). What experience do others have using a large mergeFactor? Tom

Estimating memory use for Solr caches

2010-10-01 Thread Burton-West, Tom
assume these are Java ints but the number depends on the number of hits. Is there a good way to estimate (or measure:) the size of this in memory? Tom Burton-West

RE: bi-grams for common terms - any analyzers do that?

2010-09-27 Thread Burton-West, Tom
as a phrase query. Tom Burton-West

RE: bi-grams for common terms - any analyzers do that?

2010-09-27 Thread Burton-West, Tom
necessarily using the Boolean OR operator? i.e. if <solrQueryParser defaultOperator="AND"/> and autoGeneratePhraseQueries = off then IndexReader -> index reader -> index AND reader Tom

Re: Need help with spellcheck city name

2010-09-27 Thread Tom Hill
Maybe process the city name as a single token? On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett savannah_becket...@yahoo.com wrote: Hi, I have city name as a text field, and I want to do spellcheck on it. I use setting in http://wiki.apache.org/solr/SpellCheckComponent If I setup city

RE: bi-grams for common terms - any analyzers do that?

2010-09-23 Thread Burton-West, Tom
Lucene-2458 is working on a better fix. Tom Burton-West http://www.hathitrust.org/blogs/large-scale-search <fieldType name="CommonGramTest" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class

Re: Delete Dynamic Fields

2010-09-22 Thread Tom Hill
Delete all docs with the dynamic fields, and then optimize. On Wed, Sep 22, 2010 at 1:58 PM, Moiz Bhukhiya moiz.bhukh...@gmail.com wrote: Hi All: I had used dynamic fields for some of my fields and then later decided to make it static. I removed that dynamic field from the schema but I still
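
The two steps above could be expressed as update messages along these lines. The field name price_dyn is hypothetical and stands in for one concrete field created by the removed dynamic field pattern; the range query [* TO *] is meant to match any document that has a value in that field, which assumes the field was indexed.

```xml
<!-- Message 1: delete every document that has a value in the (hypothetical) field price_dyn -->
<delete><query>price_dyn:[* TO *]</query></delete>

<!-- Message 2, sent as a separate POST: make the deletes visible -->
<commit/>

<!-- Message 3, sent as a separate POST: rewrite the index so the deleted documents
     and their field data are physically removed -->
<optimize/>
```

Each element is a separate request body for the update handler; they are shown together here only for brevity.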

Re: Searching solr with a two word query

2010-09-20 Thread Tom Hill
wouldn't use either of the last two. Tom p.s. Not sure what is going on with the last lines of your debug output for the query. Is that really what shows up after presentation ID? I see Euro, hash mark, zero, semi-colon, and H with stroke <str name="parsedquery_toString">all_text:open +all_text:excel

RE: Solr memory use, jmap and TermInfos/tii

2010-09-13 Thread Burton-West, Tom
provide you with our tii/tis data. I'll let you know as soon as I hear anything. Tom -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Sunday, September 12, 2010 10:48 AM To: solr-user@lucene.apache.org; simon.willna...@gmail.com Subject: Re: Solr memory use, jmap

RE: Solr and jvm Garbage Collection tuning

2010-09-13 Thread Burton-West, Tom
termIndexInterval with Solr 1.4.1 on our test server. Tom -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] .What are your current GC settings? Also, I guess I'd look at ways you can reduce the heap size needed. Caching, field type choices, faceting choices

RE: Solr memory use, jmap and TermInfos/tii

2010-09-11 Thread Burton-West, Tom
prefiltering prior to sending documents to Solr for indexing. However, since we now have over 400 languages, we will have to be conservative in our filtering since we would rather index dirty OCR than risk not indexing legitimate content. Tom

Solr memory use, jmap and TermInfos/tii

2010-09-10 Thread Burton-West, Tom
600,000 full-text books in each shard). In interpreting the jmap output, can we assume that the listings for utf8 character arrays ([C), java.lang.String, long int arrays ([J), and int arrays ([i) are all part of the data structures involved in representing the tii file in memory? Tom Burton

Solr and jvm Garbage Collection tuning

2010-09-10 Thread Burton-West, Tom
Solr is waiting on GC? If we could get the time for each GC to take under a second, with the trade-off being that GC would occur much more frequently, that would help us avoid the occasional query taking more than 30 seconds at the cost of a larger number of queries taking at least a second. Tom

RE: analysis tool vs. reality

2010-08-13 Thread Burton-West, Tom
+1 I just had occasion to debug something where the interaction between the queryparser and the analyzer produced *interesting* results. Having a separate jsp that includes the whole chain (i.e. analyzer/tokenizer/filter and qp) would be great! Tom -Original Message- From: Michael

RE: Improve Query Time For Large Index

2010-08-12 Thread Burton-West, Tom
/explain will indicate whether the parsed query is a PhraseQuery. Tom -Original Message- From: Peter Karich [mailto:peat...@yahoo.de] Sent: Thursday, August 12, 2010 5:36 AM To: solr-user@lucene.apache.org Subject: Re: Improve Query Time For Large Index Hi Tom, I tried again

RE: Improve Query Time For Large Index

2010-08-11 Thread Burton-West, Tom
class="solr.CommonGramsQueryFilterFactory" words="new400common.txt"/> </analyzer> </fieldType> Tom -Original Message- From: Peter Karich [mailto:peat...@yahoo.de] Sent: Tuesday, August 10, 2010 3:32 PM To: solr-user@lucene.apache.org Subject: Re: Improve Query Time For Large Index Hi Tom, my

RE: Improve Query Time For Large Index

2010-08-10 Thread Burton-West, Tom
/slow-queries-and-common-words-part-2) Tom Burton-West -Original Message- From: Peter Karich [mailto:peat...@yahoo.de] Sent: Tuesday, August 10, 2010 9:54 AM To: solr-user@lucene.apache.org Subject: Improve Query Time For Large Index Hi, I have 5 Million small documents/tweets (= ~3GB

RE: Quering the database

2010-08-02 Thread Fornoville, Tom
This question has come up several times over the past weeks. The cause is probably all your fields being of type string. This is only good for exact matches like id's etc. Try using text or another type that tokenizes. -Original Message- From: Hando420 [mailto:hando...@gmail.com] Sent:

RE: Good list of English words that get butchered by Porter Stemmer

2010-07-30 Thread Burton-West, Tom
stemmer page: http://snowball.tartarus.org/algorithms/english/stemmer.html Tom Burton-West http://www.hathitrust.org/blogs/large-scale-search -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Friday, July 30, 2010 4:42 PM To: solr-user@lucene.apache.org

RE: Total number of terms in an index?

2010-07-27 Thread Burton-West, Tom
dug in to the code so I don't actually know how the tii file gets loaded into a data structure in memory. If there is api access, it seems like this might be the quickest way to get the number of unique terms. (Of course you would have to do this for each segment). Tom -Original Message

RE: indexing best practices

2010-07-19 Thread Burton-West, Tom
the nomerge merge policy. I hope to have some results to report on our blog sometime in the next month or so. Tom Burton-West www.hathitrust.org/blogs -Original Message- From: kenf_nc [mailto:ken.fos...@realestate.com] Sent: Sunday, July 18, 2010 8:18 AM To: solr-user@lucene.apache.org Subject

RE: How to speed up solr search speed

2010-07-15 Thread Fornoville, Tom
Is there any reason why you have to limit each instance to only 1M documents? If you could put more documents in the same core I think it would dramatically improve your response times. -Original Message- From: marship [mailto:mars...@126.com] Sent: donderdag 15 juli 2010 6:23 To:

RE: CommonsHttpSolrServer add document hangs

2010-07-13 Thread Fornoville, Tom
If you're only adding documents you can also have a go with StreamingUpdateSolrServer instead of the CommonsHttpSolrServer. Couple that with the suggestion of master/slave so the searches don't interfere with the indexing and you should have a pretty responsive system. -Original Message-

RE: Locked Index files

2010-07-13 Thread Fornoville, Tom
Is the Solr process still running? Also what OS are you using? -Original Message- From: ZAROGKIKAS,GIORGOS [mailto:g.zarogki...@multirama.gr] Sent: dinsdag 13 juli 2010 10:47 To: solr-user@lucene.apache.org Subject: RE: Locked Index files I found it but I can not delete Any
