Re: How to form a boolean query such that it wont evaluate the right hand side if it isn't necessary

2018-02-07 Thread Walter Underwood
doesn’t exist because it isn’t useful. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 7, 2018, at 9:50 AM, bbarani <bbar...@gmail.com> wrote: > > > I am trying to figure out a way to form boolean (||) query in SOLR. > I

Re: Bi Gram token generation with fuzzy searches

2018-02-07 Thread Walter Underwood
I think you need the feature in SOLR-629 that adds fuzzy to edismax. https://issues.apache.org/jira/browse/SOLR-629 The patch on that issue is for Solr 4.x, but I believe someone is working on a new patch. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog

Re: 7.2.1 cluster dies within minutes after restart

2018-02-02 Thread Walter Underwood
Zookeeper 3.4.6 is not good? That was the version recommended by Solr docs when I installed 6.2.0. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 2, 2018, at 9:30 AM, Markus Jelsma <markus.jel...@openindex.io> wrote: > > Hel

Re: Request node status independently

2018-02-01 Thread Walter Underwood
do a search to see if a collection is ready. If a search for “q=*:*&rows=0” returns OK, then I’ll send traffic to that node. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 1, 2018, at 8:35 AM, Erick Erickson <erickerick...@gmail.com&
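
A minimal sketch of that readiness check in Java. It uses the JDK's java.net.http client rather than SolrJ, and it assumes the parameter stripped from the archived query was rows=0, which makes it a match-all query that returns no documents. The class name, the 2-second timeout, and the base URL shape are illustrative choices, not anything from the post.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical readiness probe: a match-all query that asks for zero rows.
// Assumes the stripped parameter in the post was "rows=0".
final class ReadinessProbe {
    private ReadinessProbe() {}

    // Builds e.g. http://host:8983/solr/<collection>/select?q=*%3A*&rows=0
    static URI probeUri(String solrBase, String collection) {
        String q = URLEncoder.encode("*:*", StandardCharsets.UTF_8); // ':' becomes %3A
        return URI.create(solrBase + "/" + collection + "/select?q=" + q + "&rows=0");
    }

    // True only for HTTP 200; any other status or an I/O failure means
    // "do not route traffic to this node yet".
    static boolean isReady(HttpClient client, URI probe) throws InterruptedException {
        HttpRequest req = HttpRequest.newBuilder(probe)
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        try {
            HttpResponse<Void> resp = client.send(req, HttpResponse.BodyHandlers.discarding());
            return resp.statusCode() == 200;
        } catch (IOException e) {
            return false;
        }
    }
}
```

For example, call ReadinessProbe.isReady(HttpClient.newHttpClient(), ReadinessProbe.probeUri("http://localhost:8983/solr", "products")) before adding a node to the load balancer; "products" is a placeholder collection name.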

Re: Title Search scoring issues with multivalued field & norm

2018-01-31 Thread Walter Underwood
a translation to “plus/minus” before indexing or querying. Query completion made a huge difference, taking our clickthrough rate from 0.45 to 0.55. Later, we added fuzzy search to handle misspellings. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog

Re: Title Search scoring issues with multivalued field & norm

2018-01-31 Thread Walter Underwood
a popularity score as a boost. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 31, 2018, at 4:38 AM, Sravan Kumar <sra...@caavo.com> wrote: > > Hi, > We are using solr for our movie title search. > > > As it is

Re: SolrClient#updateByQuery?

2018-01-26 Thread Walter Underwood
Use a filter query to filter out all the documents marked deleted. Don’t use “expunge deletes”; it does more than you want because it forces a merge. Just commit after sending the delete. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan

Re: Guideline on when a field absolutely needs to be stored?

2018-01-17 Thread Walter Underwood
There is a nice table for all of the field options. https://lucene.apache.org/solr/guide/7_2/field-properties-by-use-case.html wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 17, 2018, at 11:23 PM, Clemens Wyss DEV <clemens...@mysign.ch&

Re: Deliver static html content via solr

2018-01-04 Thread Walter Underwood
to fetch blobs by ID and don’t want to use a filesystem, use a database designed for that. That was the original focus of MySQL, for example. Solr is not a database. Solr is not a repository. A design using Solr for primary storage of data is a broken design. wunder Walter Underwood wun

Re: Always use leader for searching queries

2018-01-03 Thread Walter Underwood
Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 3, 2018, at 8:58 AM, Erick Erickson <erickerick...@gmail.com> wrote: > > [I probably not need to do this because I have only one shard but I did > anyway count was different.] > > Th

Re: SolrJ with Async Http Client

2018-01-03 Thread Walter Underwood
HTTPClient is non-blocking. Send the request, then the client gets control back. It only blocks when you do the read. So one thread can send multiple requests then check for each response. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Ja
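
The send-now, read-later pattern can be sketched with the JDK 11+ java.net.http.HttpClient, used here as a stand-in for the Apache HttpClient that SolrJ is built on. sendAsync returns a CompletableFuture immediately, so one thread issues every request before blocking on any response. The local stub server (com.sun.net.httpserver, bundled with the JDK) exists only so the sketch runs without a Solr node; AsyncFanOut is a hypothetical name.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class AsyncFanOut {
    private AsyncFanOut() {}

    // Phase 1: fire every request; sendAsync returns at once with a future.
    // Phase 2: join each future in order; only this step waits on the network.
    static List<String> fetchAll(HttpClient client, List<URI> uris) {
        List<CompletableFuture<HttpResponse<String>>> pending = new ArrayList<>();
        for (URI uri : uris) {
            HttpRequest req = HttpRequest.newBuilder(uri).GET().build();
            pending.add(client.sendAsync(req, HttpResponse.BodyHandlers.ofString()));
        }
        List<String> bodies = new ArrayList<>();
        for (CompletableFuture<HttpResponse<String>> f : pending) {
            bodies.add(f.join().body());
        }
        return bodies;
    }

    // Local stub that echoes the request path, so the sketch runs without Solr.
    static HttpServer startStub() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = exchange.getRequestURI().getPath().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```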

Re: Scaling issue with Solr

2017-12-27 Thread Walter Underwood
Solr to keep up with ES in log search features. Likewise, I would not expect ES to keep up with Solr for product and text search features. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 27, 2017, at 1:33 PM, Erick Erickson <erickerick...@gma

Re: solrcloud through aws elb

2017-12-26 Thread Walter Underwood
sounds like a terrible idea. Use HTTP. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 26, 2017, at 1:05 PM, Rick Leir <rl...@leirtech.com> wrote: > > Per, > This is more of a question for the Drupal folks. But in passing,

Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter

2017-12-21 Thread Walter Underwood
makes the query much larger and much slower. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 21, 2017, at 6:28 AM, Markus Jelsma <markus.jel...@openindex.io> wrote: > > Hello Steve, > > Well, that is an interesting approach to

Re: OOM spreads to other replica's/HA when OOM

2017-12-19 Thread Walter Underwood
affic with that. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: legacy replication

2017-12-15 Thread Walter Underwood
is faster, we’re handling double the query volume with 3X the docs. Sorry for the rant, but it has not been a good fall semester for our students (customers). Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 15, 2017, at 9:46 AM, Erick Erickson <eric

Re: Alternatives to tika for extracting text out of PDFs

2017-12-07 Thread Walter Underwood
hamburger back into a cow. The PDF standard has improved a lot, but then you get an OCR’ed PDF. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 7, 2017, at 5:29 PM, Erick Erickson <erickerick...@gmail.com> wrote: > > I'm goin

Re: Logging in Solrcloud

2017-12-05 Thread Walter Underwood
s=0=2=true=jack+and+jill+are+maneuvering+a+2800+kg+boat+near+a+dock.+initially+the+boat%27s+position+is+m+and+its+speed+is+1.9+m%2Fs.+as+the+boa In your case, “gettingstarted_shard1_replica_n2” should mean that is an intra-cluster request. Also, “distrib=false” means it is for a single core. wunder W

Java profiler?

2017-12-05 Thread Walter Underwood
Anybody have a favorite profiler to use with Solr? I’ve been asked to look at why our queries are slow at a detailed level. Personally, I think they are slow because they are so long, up to 40 terms. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Logging in Solrcloud

2017-12-05 Thread Walter Underwood
with a local nginx server. That will allow us to limit concurrent connections. It will also give us a log of just the client requests. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 5, 2017, at 4:25 AM, Matzdorf, Stefan, Springer SBM

Re: Multiple cores versus a "source" field.

2017-12-04 Thread Walter Underwood
Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 4, 2017, at 7:17 PM, Phil Scadden <p.scad...@gns.cri.nz> wrote: > > Thanks Eric. I have already followed the solrj indexing very closely - I have > to do a lot of manipulation at indexi

Re: Solr JVM best practices

2017-12-02 Thread Walter Underwood
flag was because something was invoking a full GC to get accurate memory usage. That was annoying. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 2, 2017, at 8:18 AM, Dominique Bejean <dominique.bej...@eolya.fr> > wrote: &g

Re: Solr JVM best practices

2017-12-02 Thread Walter Underwood
We use an 8G heap and G1 with Shawn Heisey’s settings. Java 8, update 131. This has been solid in production with a 32 node Solr Cloud cluster. We do not do faceting. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 2, 2017, at 7:43

Re: Skewed IDF in multi lingual index, again

2017-11-30 Thread Walter Underwood
Expanding the query to use both the tagged and untagged term might work. I’m not sure the effect would be a lot different than boosting the preferred language. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 30, 2017, at 8:35 AM, Markus Jel

Re: Skewed IDF in multi lingual index, again

2017-11-30 Thread Walter Underwood
. If the entire document is in one language, might as well use a filter query for that language. The tags would work for multiple languages in one document. Maybe make the untagged term a synonym. For cross-language terms like “LaserJet”, the untagged one would have worse idf. wunder Walter

Re: OutOfMemoryError in 6.5.1

2017-11-29 Thread Walter Underwood
. Connections are just blocks of data in the client and OS. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 29, 2017, at 3:41 PM, Toke Eskildsen <t...@kb.dk> wrote: > > Walter Underwood <wun...@wunderwood.org> wrote: >> I kn

Re: OutOfMemoryError in 6.5.1

2017-11-29 Thread Walter Underwood
. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 29, 2017, at 8:38 AM, Toke Eskildsen <t...@kb.dk> wrote: > > Walter Underwood <wun...@wunderwood.org> wrote: >> I set this in jetty.xml, but it

Re: OutOfMemoryError in 6.5.1

2017-11-28 Thread Walter Underwood
I’m pretty sure these OOMs are caused by uncontrolled thread creation, up to 4000 threads. That requires an additional 4 GB (1 MB per thread). It is as if Solr doesn’t use thread pools at all. I set this in jetty.xml, but it still created 4000 threads. wunder Walter Underwood wun
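
For contrast, a bounded pool in plain Java looks like this. It is a general java.util.concurrent sketch, not Solr's or Jetty's actual thread pool configuration: the thread count is capped, the backlog has a fixed size, and anything beyond both is rejected rather than given a new thread (and a new stack).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of a bounded request pool: at most maxThreads threads and a fixed-size
// backlog. Work beyond that is rejected (AbortPolicy throws
// RejectedExecutionException) instead of spawning another thread.
final class BoundedPool {
    private BoundedPool() {}

    static ThreadPoolExecutor create(int maxThreads, int backlog) {
        return new ThreadPoolExecutor(
                maxThreads,                        // core size
                maxThreads,                        // hard upper bound on threads
                60, TimeUnit.SECONDS,              // idle keep-alive (not applied to core threads by default)
                new ArrayBlockingQueue<>(backlog), // bounded queue, never grows
                new ThreadPoolExecutor.AbortPolicy());
    }
}
```

Tasks that arrive when every thread is busy and the backlog is full get a RejectedExecutionException, which a server would typically turn into an error response instead of consuming more memory.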

Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Walter Underwood
than the disease. We’ll run another load benchmark with thread max at something realistic, like 200. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 21, 2017, at 8:17 AM, Walter Underwood <wun...@wunderwood.org> wrote: > > All our

Re: OutOfMemoryError in 6.5.1

2017-11-21 Thread Walter Underwood
--module=http I’m still confused why we are hitting OOM in 6.5.1 but weren’t in 6.3.0. Our load benchmarks use prod logs. We added suggesters, but those use analyzing infix, so they are search indexes, not in-memory. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my

OutOfMemoryError in 6.5.1

2017-11-20 Thread Walter Underwood
he process goes to the bad place, then we need to wait until someone is paged and kills it manually. Luckily, it usually drops out of the live nodes for each collection and doesn’t take user traffic. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Do i need to reindex after changing similarity setting

2017-11-20 Thread Walter Underwood
Similarity is query time. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 20, 2017, at 4:57 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote: > > Hi, > > I want to switch to Classic similarity instead of BM25 (default i

Re: External file field

2017-11-17 Thread Walter Underwood
Thanks. I found this, which is much more clear than the manual. http://www.openjems.com/solr-external-file-fields/ The Solr manual should include the info about how to declare the field. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 17, 2

External file field

2017-11-17 Thread Walter Underwood
Do I need to define a field with <field> when I use an external file field? I see the <fieldType> to define it, but the docs don’t say how to define the field. The docs say that the file uses the fieldname as part of the filename, but the directive defines a type name, not a field name. Right? wunder Walter

Re: Anyone have any comments on current solr monitoring favorites?

2017-11-06 Thread Walter Underwood
(orders) and other stuff that is currently in Graphite. We’ll almost certainly move all that to InfluxDB and Grafana. The Solr metrics were overloading the Graphite database, so we’re the first service that is trying InfluxDB. wunder Walter Underwood wun...@wunderwood.org http

Re: Anyone have any comments on current solr monitoring favorites?

2017-11-06 Thread Walter Underwood
Look back down the string to my post. We use Grafana. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 6, 2017, at 11:23 AM, Petersen, Robert (Contr) > <robert.peters...@ftr.com> wrote: > > Interesting! Finally a Grafana use

Re: Anyone have any comments on current solr monitoring favorites?

2017-11-02 Thread Walter Underwood
HTTP response codes. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 2, 2017, at 9:30 AM, Petersen, Robert (Contr) > <robert.peters...@ftr.com> wrote: > > OK I'm probably going to open a can of worms here... lol > > >

Re: E-Commerce Search: tf-idf, tie-break and boolean model

2017-10-20 Thread Walter Underwood
nt 1.0 seems to perform better. > But not sure why. > > I want try to add some relevant fields (tags, categories) in order to the > have more chances to match the correct results. > > Best regards, > Vincenzo > > On Tue, Oct 17, 2017 at 11:38 PM, Walter Underwood <wun.

Re: Deploy Solr to production: best practices

2017-10-19 Thread Walter Underwood
eG1GC \ -XX:+ParallelRefProcEnabled \ -XX:G1HeapRegionSize=8m \ -XX:MaxGCPauseMillis=200 \ -XX:+UseLargePages \ -XX:+AggressiveOpts \ " wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 18, 2017, at 10:32 PM, maximka19 <moldabeko...@gmail.com>

Re: Jetty maxThreads

2017-10-18 Thread Walter Underwood
Linux/x64 (64-bit): 1024 KB; OS X (64-bit): 1024 KB; Oracle Solaris/i386 (32-bit): 320 KB; Oracle Solaris/x64 (64-bit): 1024 KB wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 18, 2017, at 1:44 PM, Walter Underwood <wun...@wunderwood.org&

Re: Jetty maxThreads

2017-10-18 Thread Walter Underwood
With an 8GB heap, I’d like to keep thread stack memory to 2GB or under, which means a maxThreads of 1000. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 18, 2017, at 1:41 PM, Walter Underwood <wun...@wunderwood.org> wrote: &

Jetty maxThreads

2017-10-18 Thread Walter Underwood
Jetty maxThreads is set to 10,000, which seems way too big. The comment suggests 5X the number of CPUs. We have 36 CPUs, which would mean 180 threads, which seems more reasonable. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
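
The arithmetic behind these numbers, as a small sketch. It assumes the 1024 KB default stack size for 64-bit Linux listed above and treats stack memory as a worst case (every thread alive at once). Thread stacks live outside the Java heap, so they add to the process footprint on top of -Xmx.

```java
// Back-of-the-envelope thread-stack arithmetic for Jetty maxThreads.
final class ThreadBudget {
    private ThreadBudget() {}

    // 64-bit Linux/OS X default thread stack size (-Xss), per the list above.
    static final long DEFAULT_STACK_KB = 1024;

    // The config comment's rule of thumb: 5 threads per CPU.
    static int suggestedMaxThreads(int cpus) {
        return 5 * cpus;
    }

    // Worst-case stack reservation, in MB, if every thread is alive at once.
    static long stackMegabytes(long threads, long stackKb) {
        return threads * stackKb / 1024;
    }
}
```

With 36 CPUs the rule of thumb gives 180 threads, roughly 180 MB of stack; the configured 10,000 threads could reserve about 10 GB, more than an 8 GB heap. To size for the current machine, pass Runtime.getRuntime().availableProcessors().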

Re: E-Commerce Search: tf-idf, tie-break and boolean model

2017-10-17 Thread Walter Underwood
wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 16, 2017, at 1:53 AM, Emir Arnautović <emir.arnauto...@sematext.com> > wrote: > > Hi Vincenzo, > Unless you have really specific ranking requirements, I would not suggest you

Re: Need help with Slow Query Logging

2017-10-17 Thread Walter Underwood
I would not do this in Solr. Post process the log file to split them out. That allows you to change the definition of “slow” later, reprocess older files, etc. Do log analysis with log analysis tools. Don’t try to push that too far up the chain into the production server. wunder Walter
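
A sketch of that post-processing step in Java. It assumes each request line in the Solr log carries a QTime=<milliseconds> token, as Solr's request logging typically does. The threshold is a parameter, so redefining "slow" later just means rerunning over the old files.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Offline filter over log lines; keeps lines whose QTime is at or above a threshold.
final class SlowQueryFilter {
    private SlowQueryFilter() {}

    private static final Pattern QTIME = Pattern.compile("QTime=(\\d+)");

    static List<String> slowLines(List<String> lines, long thresholdMs) {
        return lines.stream()
                .filter(line -> {
                    Matcher m = QTIME.matcher(line);
                    // Lines without a QTime token (non-request lines) are skipped.
                    return m.find() && Long.parseLong(m.group(1)) >= thresholdMs;
                })
                .collect(Collectors.toList());
    }
}
```

For example, SlowQueryFilter.slowLines(Files.readAllLines(Path.of("solr.log")), 1000), where solr.log is a placeholder path.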

Re: Seeing very low ingestion performance for a single non-cloud Solr core

2017-09-21 Thread Walter Underwood
million/minute. We are indexing bigger documents, but seeing 1 million/minute to a cluster with four shards. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 21, 2017, at 1:18 AM, Emir Arnautović <emir.arnauto...@sematext.com> > wr

Re: Replicates not recovering after rolling restart

2017-09-20 Thread Walter Underwood
> On Sep 20, 2017, at 6:15 PM, Bill Oconnor <bocon...@plos.org> wrote: > > I restart using the standard "sudo service solr start/stop" You might look into what that actually does. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Replicates not recovering after rolling restart

2017-09-20 Thread Walter Underwood
1578578283947098112 needs 61 bits. Is it being parsed into a 32 bit target? That doesn’t explain where it came from, of course. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 20, 2017, at 3:35 PM, Erick Erickson <erickerick...@gma
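
A quick check of that bit count, and of what goes wrong in a 32-bit target. This is plain Java arithmetic; it says nothing about where the value came from.

```java
// Bit-width helpers for diagnosing values that overflow 32-bit storage.
final class BitWidth {
    private BitWidth() {}

    // Number of bits needed to represent a non-negative value.
    static int bitsNeeded(long v) {
        return 64 - Long.numberOfLeadingZeros(v);
    }

    // True if the value survives a round trip through a 32-bit int.
    static boolean fitsInInt(long v) {
        return v == (int) v;
    }
}
```

BitWidth.bitsNeeded(1578578283947098112L) is 61, so the value fits a signed 64-bit long but not an int. Integer.parseInt rejects the string outright, while a narrowing cast such as (int) v silently keeps only the low 32 bits, which is the quieter way such a value could be mangled.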

Re: Replicates not recovering after rolling restart

2017-09-20 Thread Walter Underwood
start -cloud -h `hostname`' done Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 20, 2017, at 1:42 PM, Bill Oconnor <bocon...@plos.org> wrote: > > Hello, > > > Background: > > > We have been successfully using Solr for o

Re: Using SOLR J 5.5.4 with SOLR 6.5

2017-09-19 Thread Walter Underwood
As I understand it, any node in the cluster will direct the document to the leader for the appropriate shard. Works for us. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 19, 2017, at 9:59 AM, David Hastings <hastings.recurs...@gma

Re: Using SOLR J 5.5.4 with SOLR 6.5

2017-09-19 Thread Walter Underwood
Yes, good old HTTP. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 19, 2017, at 9:54 AM, David Hastings <hastings.recurs...@gmail.com> > wrote: > > Do you use HttpSolrClient then? > > On Tue, Sep 19, 2017 at 12:26 P

Re: Using SOLR J 5.5.4 with SOLR 6.5

2017-09-19 Thread Walter Underwood
Cloud cluster get the right docs to the right shards. That runs at 1 million docs/minute, so it isn’t worth doing anything fancier. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 19, 2017, at 9:05 AM, David Hastings <hastings.recurs...@gma

Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-19 Thread Walter Underwood
. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 19, 2017, at 12:18 AM, Toke Eskildsen <t...@kb.dk> wrote: > > On Mon, 2017-09-18 at 20:47 -0700, shamik wrote: >> I did bring down the heap size to 8gb, changed to G1 and

Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-18 Thread Walter Underwood
29G on a 30G machine is still a bad config. That leaves no space for the OS, file buffers, or any other processes. Try with 8G. Also, give us some information about the number of docs, size of the indexes, and the kinds of search features you are using. wunder Walter Underwood wun

Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-18 Thread Walter Underwood
Millis=200 \ -XX:+UseLargePages \ -XX:+AggressiveOpts \ " wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 18, 2017, at 7:24 AM, Shamik Bandopadhyay <sham...@gmail.com> wrote: > > Hi, > > I recently upgraded

Re: if exists in an fq

2017-09-13 Thread Walter Underwood
How about doing that logic at index time? Make a new field, then copy into it with that logic using an update request processor. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 12, 2017, at 2:05 PM, Peter Kirk <p...@alpha-solutions.dk&

Re: Solr list operator

2017-09-12 Thread Walter Underwood
don’t know anything about the ColdFusion API. I last looked at ColdFusion in 1997. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 12, 2017, at 6:21 AM, Nick Way <n...@southeastpublishing.com> wrote: > > Thank you very mu

Re: Latest stable SOLR version

2017-09-11 Thread Walter Underwood
We have been running 6.5.1 in production since May. I would not run anything before that. The new metrics code caused performance problems. That was fixed in 6.5.0. There was a memory leak talking to Zookeeper. That was fixed in 6.5.1. Solr 6.6.1 should be released very soon. wunder Walter

Re: Solr Commit Thread Blocked because of excessive number of merging threads

2017-09-07 Thread Walter Underwood
. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 7, 2017, at 5:02 PM, Erick Erickson <erickerick...@gmail.com> wrote: > > Skimming and to add to what Shawn said about ramBufferSizeMB. > > It's totally wasted space pretty

Re: Solr list operator

2017-09-06 Thread Walter Underwood
Use a multivalued field. Search for listOfIds:1. Or search for listOfIds:33. This is one of the simplest things that Solr can do. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 6, 2017, at 6:07 AM, Susheel Kumar <susheel2...@gmail.com&

Re: unordered autocomplete search

2017-09-04 Thread Walter Underwood
This should probably be a feature of the analyzing infix suggester. Right now, the fuzzy suggester is broken with the file dictionary, so we can’t use fuzzy suggestions at all. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 4, 2017, at 4

Re: Performance Test

2017-09-04 Thread Walter Underwood
That is what I do. Use production logs. I have a JMeter script that sets a constant request rate. Before each load benchmark, I reload the collection to clear the caches, then run 2000 warming queries from the logs. After that, I start the benchmark. wunder Walter Underwood wun
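
The post uses JMeter; as a rough Java stand-in, the constant-rate idea looks like this. A scheduler dispatches one logged query per period and hands it to a separate worker pool, so a slow response does not delay the next dispatch (an open-loop load generator). ConstantRateReplay and the send callback are hypothetical. Reloading the collection beforehand (for example with the Collections API RELOAD action) and choosing the warming queries are left to the caller.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class ConstantRateReplay {
    private ConstantRateReplay() {}

    // Dispatches queries.get(0), queries.get(1), ... one per period. The timer
    // thread only hands work off, so a slow send does not push back later ticks.
    // Returns once every query has been sent.
    static void replay(List<String> queries, long periodMillis, int workerThreads,
                       Consumer<String> send) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        ExecutorService workers = Executors.newFixedThreadPool(workerThreads);
        AtomicInteger next = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(queries.size());
        ScheduledFuture<?> tick = timer.scheduleAtFixedRate(() -> {
            int i = next.getAndIncrement();
            if (i < queries.size()) {
                String q = queries.get(i);
                workers.execute(() -> {
                    try {
                        send.accept(q);
                    } finally {
                        done.countDown();
                    }
                });
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        done.await();
        tick.cancel(false);
        timer.shutdown();
        workers.shutdown();
    }
}
```

Call it once with the 2000 warming queries, then again with the benchmark set; the callback would issue the HTTP request and record its latency.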

Re: query with wild card with AND taking lot of time

2017-09-01 Thread Walter Underwood
params… wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 1, 2017, at 2:01 PM, Erick Erickson <erickerick...@gmail.com> wrote: > > Shawn: > > See: https://issues.apache.org/jira/browse/SOLR-7219 > > Try fq=filter(

Re: Index relational database

2017-08-31 Thread Walter Underwood
CPU is available, etc. We have one query that extracts 9 million documents from MySQL in about 20 minutes. We have another query on a different MySQL database that takes 90 minutes to get 7 million documents. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog

Re: Index relational database

2017-08-30 Thread Walter Underwood
Think about making a denormalized view, with all the fields needed in one table. That view gets sent to Solr. Each row is a Solr document. It could be implemented as a view or as SQL, but that is a useful mental model for people starting from a relational background. wunder Walter Underwood
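
The same mental model in code: a minimal sketch that flattens two hypothetical tables (books and authors) into one map per book, which is what a SQL JOIN view would produce. Each map corresponds to one Solr document; the table, record, and field names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class Denormalizer {
    private Denormalizer() {}

    record Book(int id, String title, int authorId) {}
    record Author(int id, String name) {}

    // One output map per book row, with the author's name copied in.
    // A missing author yields a null value, like a LEFT JOIN.
    static List<Map<String, Object>> flatten(List<Book> books, List<Author> authors) {
        Map<Integer, String> authorNames = authors.stream()
                .collect(Collectors.toMap(Author::id, Author::name)); // ids must be unique
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Book b : books) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", b.id());
            doc.put("title", b.title());
            doc.put("author_name", authorNames.get(b.authorId()));
            docs.add(doc);
        }
        return docs;
    }
}
```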

Re: Solr memory leak

2017-08-28 Thread Walter Underwood
That would be a really good reason for a 6.7. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 28, 2017, at 8:48 AM, Markus Jelsma <markus.jel...@openindex.io> wrote: > > It is, unfortunately, not co

Re: autoSoftCommit doesn't work as expected / documented

2017-08-24 Thread Walter Underwood
. Those are designed to handle that. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 24, 2017, at 1:05 PM, Angel Todorov <attodo...@gmail.com> wrote: > > Hello, > > So I can never have soft auto commit after each update ? Thi

Re: Solr 6.6.0 - High CPU during indexing

2017-08-18 Thread Walter Underwood
I see a server with 100Gb of memory and processes (java and jsvc) using 203Gb of virtual memory. Hmm. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 18, 2017, at 12:05 PM, Joe Obernberger <joseph.obernber...@gmail.com> > wrote: &g

Re: Get results in multiple orders (multiple boosts)

2017-08-18 Thread Walter Underwood
Why do you want to do this in Solr? This would be pretty easy in SQL. If you want to sort, use a relational database. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 18, 2017, at 2:52 AM, Luca Dall'Osto <tenacious...@yahoo.it.INVALID>

Stack Overflow in FuzzyLookupFactory

2017-08-16 Thread Walter Underwood
Just got a stack overflow in the Lucene automata code. Is there a way to save out the FSA for a bug report? This is in 6.5.1, so it may be related to https://issues.apache.org/jira/browse/SOLR-9458 wunder Walter Underwood wun...@wunderwo

Re: Optimizing Dataimport from Oracle; cursor sharing; changing oracle session parameters

2017-08-15 Thread Walter Underwood
This might be a hack, but the CSV importer is really fast. Run the query in your favorite command line and export to CSV, then load it. You can even make batches. Maybe use ranges of the ID, then delete by query for that range. wunder Walter Underwood wun...@wunderwood.org http
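
Cutting the export into ID ranges can be sketched as below. It assumes a numeric id in both the database and the Solr schema (a string id would compare lexicographically and break the ranges). Each Range yields the SQL predicate for the export and a Lucene range query for delete-by-query over the same span; the post leaves the exact ordering of load and delete open.

```java
import java.util.ArrayList;
import java.util.List;

final class IdRangeBatches {
    private IdRangeBatches() {}

    // Both ends inclusive, matching SQL BETWEEN and Lucene [a TO b].
    record Range(long from, long to) {
        String sqlWhere() { return "id BETWEEN " + from + " AND " + to; }
        String solrDeleteQuery() { return "id:[" + from + " TO " + to + "]"; }
    }

    // Splits [minId, maxId] into consecutive ranges of at most batchSize ids.
    static List<Range> split(long minId, long maxId, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Range> out = new ArrayList<>();
        for (long start = minId; start <= maxId; start += batchSize) {
            out.add(new Range(start, Math.min(start + batchSize - 1, maxId)));
        }
        return out;
    }
}
```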

Re: Solr 6 and IDF

2017-08-08 Thread Walter Underwood
of how common those skills are. And for tf, a document tagged with both “new york” and “new york city” is not twice as much about New York. Same for the movie “New York, New York”. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 8, 2017, at 2:18

Re: MongoDb vs Solr

2017-08-05 Thread Walter Underwood
durability, but Solr is generally not considered to be durable across crashes or “kill -9”. https://en.wikipedia.org/wiki/ACID Also, there is no explicit schema migration support. Schema changes usually require a full reload from the repository. wunder Walter Underwood wun...@wunderwood.org

Re: MongoDb vs Solr

2017-08-05 Thread Walter Underwood
”. To me, that means “not a database”. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 5, 2017, at 4:59 AM, Dave <hastings.recurs...@gmail.com> wrote: > > And to add to the conversation, 7 year old blog posts are not a reason to &

Re: MongoDb vs Solr

2017-08-04 Thread Walter Underwood
. Straightforward, but not easy to do it fast. The “Inside MarkLogic Server” paper does a good job of explaining the guts. Now, back to our regularly scheduled Solr presentations. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 4, 2017, at 8:13 PM, Da

Re: MongoDb vs Solr

2017-08-04 Thread Walter Underwood
Solr is NOT a database. If you need a database, don’t choose Solr. If you need both a database and search, choose MarkLogic. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 4, 2017, at 4:16 PM, Francesco Viscomi <fvisc...@gmail.com&

Metrics in 6.5.1 names and stuff

2017-08-03 Thread Walter Underwood
I’m trying to get what I want out of the metrics reporting in Solr. I want the counts and percentiles for each request handler in each collection. If I have “/srp”, “/suggest”, and “/seo”, I want three sets of metrics. I’m getting a lot of weird stuff. For counts for /srp in an eight node

Re: mixed index with commongrams

2017-08-03 Thread Walter Underwood
How long are your GC pauses? Those affect all queries, so they make the 99th percentile slow with queries that should be fast. The G1 collector has helped our 99th percentile. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 3, 2017, at 8:48
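
Why a few long pauses dominate the 99th percentile but not the median can be seen with a nearest-rank percentile, sketched here in Java. The method is a standard definition, not Solr's own metrics code.

```java
import java.util.Arrays;

final class Percentile {
    private Percentile() {}

    // Nearest-rank percentile: the smallest sample such that at least p percent
    // of all samples are less than or equal to it. p must be in (0, 100].
    static long nearestRank(long[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }
}
```

With 100 latencies where 98 are 10 ms and 2 are 500 ms (say, requests that overlapped a GC pause), the median and the 95th percentile are both 10 ms while the 99th percentile is 500 ms.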

Re: Limiting the number of queries/updates to Solr

2017-08-02 Thread Walter Underwood
Nagle at Ford Aerospace. I recommend his note “On Packet Switches with Infinite Storage” (1985) for the full story. It is only eight pages long, but packed with goodness. https://tools.ietf.org/html/rfc970 wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
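
A sketch of bounding in-flight work instead of buffering it, in the spirit of the note cited above. This is a generic java.util.concurrent.Semaphore limiter placed in front of Solr, not a Solr feature; ConcurrencyLimiter is a hypothetical name.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Load shedding: allow at most maxConcurrent requests in flight and reject the
// rest immediately rather than letting an unbounded queue grow.
final class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs work if a permit is free; otherwise returns Optional.empty() at once
    // (a caller could map that to an HTTP 503 response).
    <T> Optional<T> tryRun(Supplier<T> work) {
        if (!permits.tryAcquire()) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(work.get());
        } finally {
            permits.release();
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```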

Re: Move index directory to another partition

2017-08-01 Thread Walter Underwood
Way back in the 1.x days, replication was done with shell scripts and rsync, right? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 1, 2017, at 2:45 PM, Shawn Heisey <apa...@elyograg.org> wrote: > > On 7/31/2017 12:28 PM, Ma

Re: Optimize stalls at the same point

2017-07-25 Thread Walter Underwood
G1HeapRegionSize=8m \ -XX:MaxGCPauseMillis=200 \ -XX:+UseLargePages \ -XX:+AggressiveOpts \ “ Last week, I benchmarked the 4.x config handling 15,000 requests/minute with a 95th percentile response time of 30 ms, using production logs. wunder Walter Underwood wun...@wunderwood.org http://observer

Re: Optimize stalls at the same point

2017-07-25 Thread Walter Underwood
Are you sure you need a 100GB heap? The stall could be a major GC. We run with an 8GB heap. We also run with Xmx equal to Xms, because growing memory to the max was really time-consuming after startup. What version of Java? What GC options? wunder Walter Underwood wun...@wunderwood.org http

Re: Need guidance solrcloud shardings with date interval

2017-07-25 Thread Walter Underwood
the shards are created to keep load and disk usage distributed. If you want search to keep working after a failure, you will also need to create and delete additional shards as replicas. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 22, 2

Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Walter Underwood
to force RDBMS sharding onto Solr. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 20, 2017, at 8:09 AM, rehman kahloon > <mrehman_kahl...@yahoo.com.INVALID> wrote:

Re: Getting IO Exception while Indexing

2017-07-20 Thread Walter Underwood
If Apache is returning 400, then it really is a bad request. Debug the request and fix it. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 19, 2017, at 11:27 PM, mesenthil1 > <senthilkumar.arumu...@viacomcontractor.com> wr

Re: Need guidance solrcloud shardings with date interval

2017-07-20 Thread Walter Underwood
Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 20, 2017, at 7:57 AM, Erick Erickson <erickerick...@gmail.com> wrote: > > Use the "implicit" router (being renamed "manual". that takes the > value of a particular fie

Re: Getting IO Exception while Indexing

2017-07-19 Thread Walter Underwood
A 400 would not be a failure to connect. A 400 means that the client is sending a bad request. Look at the Solr logs. Most likely, the document is invalid. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 19, 2017, at 7:54 AM, Susheel Ku

Re: Returning unique values for suggestion

2017-07-19 Thread Walter Underwood
ht": 5285, "payload": "" }, { "term": "Chemistry", "weight": 4548, "payload": "" }, { "term": "Chemistry", "weight": 3002, "payload": "" }, { "term": "Intro

Re: Create too many zookeeper connections when recreate CloudSolrServer instance

2017-07-18 Thread Walter Underwood
robustness by being too clever with the client. Hacking the client is not a last choice, it is a bad choice. For queries, there is not much benefit in running the cloud-aware client. A regular load balancer works just about as well. We use the Amazon load balancers. wunder Walter Underwood wun

Re: Create too many zookeeper connections when recreate CloudSolrServer instance

2017-07-17 Thread Walter Underwood
If your Zookeeper cluster is rebooting frequently, you have much, much worse problems than client connections. Is Zookeeper unstable in your installation? If so, fix that. Stop hacking the client. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog

Re: HTTP ERROR 504 - Optimize

2017-07-13 Thread Walter Underwood
Optimize can take a long time. Why are you doing an optimize? It doesn’t really optimize the index; it only forces merges and deletions. Solr does that automatically in the background. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 13, 2

Re: How to "chain" import handlers: import from DB and from file system

2017-07-10 Thread Walter Underwood
times with three different parser packages on two engines. Never on Solr, though. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 10, 2017, at 12:40 PM, Allison, Timothy B. <talli...@mitre.org> wrote: > >> 4. Write a

Re: How to "chain" import handlers: import from DB and from file system

2017-07-09 Thread Walter Underwood
4. Write an external program that fetches the file, fetches the metadata, combines them, and sends them to Solr. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 9, 2017, at 3:03 PM, Giovanni De Stefano <giova...@servisoft.be> wrote: &
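
Option 4 can be sketched with only the JDK: merge the metadata with the file's text into one document, serialize it as JSON, and build a POST to the collection's /update handler, which accepts a JSON array of documents. The "content" field name, the small JSON escaper, and the URL are assumptions. Text extraction (for PDFs and the like) and commits are out of scope.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;

final class DocBuilder {
    private DocBuilder() {}

    // Metadata fields first, then the file's text under a hypothetical "content" field.
    static Map<String, String> combine(Map<String, String> metadata, String fileText) {
        Map<String, String> doc = new LinkedHashMap<>(metadata);
        doc.put("content", fileText);
        return doc;
    }

    // Serializes a flat map of non-null strings as a JSON object.
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    // POST a one-element JSON array of documents to the /update handler.
    static HttpRequest updateRequest(URI updateUri, String jsonDoc) {
        return HttpRequest.newBuilder(updateUri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("[" + jsonDoc + "]"))
                .build();
    }
}
```

For instance, read the file with Files.readString(path), call combine, then send updateRequest(uri, toJson(doc)) with an HttpClient.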

Re: Max document per shard ( included deleted documents )

2017-07-07 Thread Walter Underwood
The deleted records will be automatically cleaned up in the background. You don’t have to do anything. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 7, 2017, at 1:25 PM, calamita.agost...@libero.it wrote: > > > Sorry , I kn

Re: Solr crashing / slowing down the performance

2017-07-03 Thread Walter Underwood
Solr machine we deploy, even in test and dev, has 15 GB of RAM, SSD disks, and has an 8 GB Java heap. In prod, we run with enough RAM that the entire index can live in RAM file buffers. We don’t do a lot of faceting or other memory-intensive queries. We mostly just search. wunder Walter

Re: Not highlighting "and" and "or"?

2017-06-30 Thread Walter Underwood
I would agree with removing the stopword filter from the example configs. It is not a “best practice” or even a recommended practice. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 29, 2017, at 8:01 PM, Rick Leir <rl...@leirtech.com&

Re: Not highlighting "and" and "or"?

2017-06-29 Thread Walter Underwood
My blog post has a list of movie titles. I forgot to list the TV series “Once and Again”. Some bands that are not searchable with stopwords: * The Who * Was (not Was) * The The wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 29, 2017, at 2

Re: Not highlighting "and" and "or"?

2017-06-29 Thread Walter Underwood
engines on 16-bit machines. Neither disks nor RAM were big enough to hold the posting lists for common words. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 29, 2017, at 1:46 PM, Rick Leir <rl...@leirtech.com> wrote: > >

Re: Not highlighting "and" and "or"?

2017-06-29 Thread Walter Underwood
Setting lowercaseOperators=false for the request handler defaults fixes this. Probably also fixes some relevance anomalies. Thanks! wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 29, 2017, at 6:38 AM, Shawn Heisey <apa...@elyograg.org&

Re: Not highlighting "and" and "or"?

2017-06-29 Thread Walter Underwood
Nope. Haven’t used stopwords for the last 20 years. I wonder if lowercaseOperators is true. The docs don’t give the default value for that in edismax. https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html wunder Walter Underwood wun...@wunderwood.org http

Re: Not highlighting "and" and "or"?

2017-06-28 Thread Walter Underwood
jectNames:once | (bookTitle_text:once)^4.0) +((concept_ai_concepts_names_default:again)^2.0 | (question:again)^2.0 | subjectNames:again | (bookTitle_text:again)^4.0)) ((bookTitle_text:\"once and again\")^8.0 | (question:\"once and again\")^4.0 | (concept_ai_concepts_names_defau
