Re: CLUSTERSTATUS API and Error loading specified collection / config in Solr 5.3.2.

2018-03-12 Thread Shalin Shekhar Mangar
I think you are running into race conditions in the API which have been fixed. See SOLR-8804 and SOLR-10720. The first is available in 5.5.1 but the latter fix will be released in the upcoming 7.3 release. The best workaround for your version is to just retry a few times until the API succeeds.
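Shalin's suggested workaround (retry until the API succeeds) can be sketched in Python as follows. This is a minimal illustration, not code from the thread. The helper name `with_retries`, the exponential backoff, the retry count and the base URL `http://localhost:8983/solr` are all assumptions. Only the Collections API `CLUSTERSTATUS` action comes from the thread.

```python
import json
import time
import urllib.request


def with_retries(call, attempts=5, delay_s=0.5):
    """Invoke call() until it succeeds, trying at most `attempts` times in total.

    Hypothetical helper: sleeps delay_s, 2*delay_s, 4*delay_s, ... between tries
    and re-raises the last error once every attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:  # any failure (HTTP 500, bad state) triggers a retry
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay_s * (2 ** attempt))
    raise last_error


def cluster_status(base_url="http://localhost:8983/solr"):
    """Fetch CLUSTERSTATUS from the Collections API (base_url is an assumed default)."""
    url = f"{base_url}/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Usage (needs a running SolrCloud node):
# status = with_retries(cluster_status)
```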

Re: Resend: Authorization on 6.6.0

2018-03-12 Thread Shawn Heisey
On 3/12/2018 8:39 PM, Terry Steichen wrote: I'm increasingly of the view that Solr's authentication/authorization mechanism doesn't work correctly in a _standalone_ mode.  It was present in the cloud mode for quite a few versions back, but as of 6.0.0 (or so) it was supposed to be available in

Re: Why are cursor mark queries recommended over regular start, rows combination?

2018-03-12 Thread Shawn Heisey
On 3/12/2018 6:18 PM, S G wrote: > We have use-cases where some queries will return about 100k to 500k records. > As per https://lucene.apache.org/solr/guide/7_2/pagination-of-results.html, > it seems that using start=x, rows=y is a bad combination performance wise. > > 1) However, it is not clear

Re: Including a filtered-field in the default-field

2018-03-12 Thread Erick Erickson
bq: Looks like the "qf=all phonetic" would take the place of my existing "df=all" parameter. In fact, it may call into question whether you even want an "all" field or just list all the fields you _would_ have copied into "all" in the "qf" parameter. Having a single field to search is certainly

Re: Resend: Authorization on 6.6.0

2018-03-12 Thread Shawn Heisey
On 3/12/2018 4:07 PM, Terry Steichen wrote: > I'm using 6.6.0 with security.json active, having the content shown > below.  I am running standalone mode, have two solr cores defined: > email1, and email2.  Since the 'blockUnknown' is set to false, everyone > should have access to any unprotected

Re: Why are cursor mark queries recommended over regular start, rows combination?

2018-03-12 Thread Erick Erickson
<1> consider start=100&rows=10. In the absence of cursorMark, Solr has to sort the top 110 documents in order to throw away the first 100, since the last document scored could be in the top 110 and there's no way to know that ahead of time. For 110 that's not very expensive, but when the list is in the
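To make the cursorMark alternative concrete, here is a minimal Python sketch of deep paging with `cursorMark`. Assumptions: `fetch` is a hypothetical function that sends the parameters to a Solr request handler and returns the parsed JSON response, and `id` is the collection's uniqueKey (cursorMark requires the sort to include the uniqueKey field as a tiebreaker). The loop starts at `cursorMark=*` and stops when Solr returns the same `nextCursorMark` it was given.

```python
from typing import Callable, Dict, Iterator


def iterate_with_cursor(fetch: Callable[[Dict[str, str]], dict],
                        query: str,
                        rows: int = 1000,
                        sort: str = "id asc") -> Iterator[dict]:
    """Yield every matching document, one page at a time, using cursorMark.

    `fetch` is a hypothetical callable that performs the HTTP request and
    returns Solr's JSON response as a dict. The sort must include the
    uniqueKey field (assumed here to be `id`).
    """
    cursor = "*"  # "*" asks Solr for the first page
    while True:
        params = {"q": query, "rows": str(rows), "sort": sort, "cursorMark": cursor}
        resp = fetch(params)
        for doc in resp["response"]["docs"]:
            yield doc
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            break
        cursor = next_cursor
```

Each request only has to collect roughly `rows` documents after the cursor, however deep the loop has gone, instead of the `start + rows` documents that plain start/rows paging must sort.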

Why are cursor mark queries recommended over regular start, rows combination?

2018-03-12 Thread S G
Hi, We have use-cases where some queries will return about 100k to 500k records. As per https://lucene.apache.org/solr/guide/7_2/pagination-of-results.html, it seems that using start=x, rows=y is a bad combination performance wise. 1) However, it is not clear to me why the alternative:

Re: Solr search engine configuration

2018-03-12 Thread Shawn Heisey
On 3/12/2018 4:15 PM, PeterKerk wrote: > I trimmed stemdict_nl.txt for testing to just this: > > aachenaach > aachener aachener According to the example here:

RE: Solr search engine configuration

2018-03-12 Thread Markus Jelsma
Hello Peter, StemmerOverride wants \t separated fields, that is probably the cause of the AIooBE you get. Regarding schema definitions, each factory JavaDoc [1] has a proper example listed. I recommend putting a decompounder before a stemmer, and have an accent (or ICU) folder as one of the
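Since StemmerOverrideFilter expects one `word<TAB>stem` pair per line, a quick check like the one below can flag entries separated by spaces before Solr tries to load the file. This is a hypothetical helper, not part of Solr; treating blank lines and lines starting with `#` as ignorable is an assumption about how the dictionary is read.

```python
from typing import Iterable, List, Tuple


def find_bad_override_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for lines that are not exactly 'word<TAB>stem'.

    Blank lines and '#' comment lines are skipped (assumed to be ignored by Solr).
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 2 or not all(f.strip() for f in fields):
            bad.append((number, line))
    return bad


# Usage:
# with open("stemdict_nl.txt", encoding="utf-8") as fh:
#     print(find_bad_override_lines(fh))
```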

Re: Including a filtered-field in the default-field

2018-03-12 Thread Christopher Schultz
Erick, (Sorry... hit send inadvertently before completion...) On 3/12/18 2:50 PM, Erick Erickson wrote: > Something like: > > solr/collection/query?q=chris shultz&defType=edismax&qf=all^10 > phonetic Interesting. Looks like the "qf=all phonetic" would

Re: Including a filtered-field in the default-field

2018-03-12 Thread Christopher Schultz
Erick, On 3/12/18 2:50 PM, Erick Erickson wrote: > Something like: > > solr/collection/query?q=chris shultz&defType=edismax&qf=all^10 > phonetic Interesting. Looks like the "qf=all phonetic" would take the place of my existing "df=all" parameter. > The

RE: Solr search engine configuration

2018-03-12 Thread PeterKerk
@Erick: thank you for clarifying! @Markus: I feel like I'm not (or at least should not be :-)) the first person to run into these challenges. "You can solve this by adding manual rules to StemmerOverrideFilter, but due to the compound nature of words, you would need to add it for all the mills"

Resend: Authorization on 6.6.0

2018-03-12 Thread Terry Steichen
I'm resending the information below because the original message got the security.json stuff garbled. I'm using 6.6.0 with security.json active, having the content shown below.  I am running standalone mode, have two

Authorization in Solr 6.6.0 Not Working Properly

2018-03-12 Thread Terry Steichen
I'm using 6.6.0 with security.json active, having the content shown below.  I am running standalone mode, have two solr cores defined: email1, and email2.  Since the 'blockUnknown' is set to false, everyone should have access to any unprotected resource.  As you can see, I have three users

Re: CLUSTERSTATUS API and Error loading specified collection / config in Solr 5.3.2.

2018-03-12 Thread Hendrik Haddorp
Hi, are your collections using stateFormat 1 or 2? In version 1 all state was stored in one file while in version 2 each collection has its own state.json. I assume that in the old version it could happen that the common file still contains state for a collection that was deleted. So I would

Re: CDCR performance issues

2018-03-12 Thread Tom Peters
I'm also having issues with replicas in the target data center. A replica will go from recovering to down. And when one of my replicas goes down in the target data center, CDCR will no longer send updates from the source to the target. > On Mar 12, 2018, at 9:24 AM, Tom Peters

Re: Including a filtered-field in the default-field

2018-03-12 Thread Erick Erickson
Something like: solr/collection/query?q=chris shultz&defType=edismax&qf=all^10 phonetic The point of edismax is to take whatever the input is and distribute it among one or more fields defined by the "qf" parameter. In this case, it'll look for "chris" and "shultz" in both the "all" and "phonetic"
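Erick's example passes three parameters (`q`, `defType`, `qf`). When the URL is built programmatically, the spaces in the query and in the `qf` field list must be URL-encoded. A small Python sketch, assuming a local Solr at `http://localhost:8983` and a collection literally named `collection` as in the example:

```python
from urllib.parse import urlencode


def edismax_query_url(base: str, user_query: str, qf: str) -> str:
    """Build a /query URL that spreads `user_query` over the fields in `qf`.

    `qf` is a space-separated field list with optional ^boosts,
    e.g. "all^10 phonetic".
    """
    params = {"q": user_query, "defType": "edismax", "qf": qf}
    return f"{base}?{urlencode(params)}"


url = edismax_query_url("http://localhost:8983/solr/collection/query",
                        "chris shultz", "all^10 phonetic")
# Spaces become '+' and '^' becomes '%5E' in the encoded URL.
```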

Re: Some performance questions....

2018-03-12 Thread BlackIce
I don't have any production logs and this all sounds too complicated. So, I'll just throw the system together in the way that makes the most sense for now, collect some logs and then do some testing further down the road. For now, just get the sucker up and running. Thanks all On Mon, Mar 12,

Re: Including a filtered-field in the default-field

2018-03-12 Thread Christopher Schultz
Erick, On 3/12/18 1:36 PM, Erick Erickson wrote: > Did you try edismax? Err no, and I must admit that it's a lot to take in. Did you have a particular suggestion for how to use it? Thanks, -chris > On Mon, Mar 12, 2018 at 10:20 AM,

Re: Some performance questions....

2018-03-12 Thread Deepak Goel
I am not sure if I understand your question *"How do I test this?"* You have to run benchmark tests of the transactions (queries) that are most representative of your system (requirements). You can use a performance testing tool like JMeter (along with PerfMon configured for utilisation

SpellCheck Reload

2018-03-12 Thread Sadiki Latty
Greetings list, I had a question regarding the spellcheck.reload parameter. I am using the IndexBasedSpellChecker, which creates its dictionary based on content from a field. I built the spell check (in error) with a field that has stemming and other filters associated with it. Regarding the

Re: Including a filtered-field in the default-field

2018-03-12 Thread Erick Erickson
Did you try edismax? On Mon, Mar 12, 2018 at 10:20 AM, Christopher Schultz wrote: > All, > > I have a Solr index containing application user information (username, > first/last, etc.). I have created an "all"

Re: Defining a phonetic analyzer and searcher via the schema API

2018-03-12 Thread Erick Erickson
Chris: LGTM, except maybe ;). You'll want to look closely at your admin UI/Analysis page for the field (or fieldType) once it's defined. Uncheck the "verbose" box when you look the first time, it'll be less confusing. That'll show you _exactly_ what the results are and whether they match

Re: Some performance questions....

2018-03-12 Thread BlackIce
So I'm thinking of the following scenarios: a single instance with drives in RAID 0, RAID 10 and RAID 5, and then 3 VMs and 4 Solr instances, each with its own HD. How do I test this? Greetz On Mar 12, 2018 1:16 PM, "BlackIce" wrote: > OK, so we've gone nowhere, since

Re: Some performance questions....

2018-03-12 Thread Walter Underwood
Benchmark with production logs. Replay them at a constant request rate. Measure the response time and look at the median and 90th or 95th percentile. Do not use the average response time, because that will be thrown off by outliers. It is best to run a few thousand warming queries before
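Walter's advice to report the median and 90th/95th percentile rather than the mean can be illustrated with a few lines of Python. This sketch uses the nearest-rank percentile definition (one of several in use) and a hypothetical `warmup` count of leading samples to discard; collecting the latencies from a real log replay is left to the benchmark tool.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float], warmup: int = 0) -> Dict[str, float]:
    """Summarize response times, skipping the first `warmup` samples."""
    measured = list(latencies_ms[warmup:])
    return {
        "median": statistics.median(measured),
        "p90": percentile(measured, 90),
        "p95": percentile(measured, 95),
        "mean": statistics.mean(measured),
    }
```

With 18 responses at 10 ms, one at 20 ms and one 2000 ms outlier, the median and p90 are 10 ms and p95 is 20 ms, while the mean jumps to 110 ms.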

Including a filtered-field in the default-field

2018-03-12 Thread Christopher Schultz
All, I have a Solr index containing application user information (username, first/last, etc.). I have created an "all" field for the purpose of using it as a default. It contains most but not all fields. I recently added phonetic searching for the

Re: Some performance questions....

2018-03-12 Thread BlackIce
OK, so we've gone nowhere, since I've already lost lots of time... A few days more or less won't make a difference. I'd be willing to benchmark if someone tells me how to. Greetz On Mar 12, 2018 7:17 AM, "Deepak Goel" wrote: > Now you are mixing your original question

Re: Defining a phonetic analyzer and searcher via the schema API

2018-03-12 Thread Christopher Schultz
Erick, On 3/12/18 1:00 PM, Erick Erickson wrote: > bq: which you aren't supposed to edit directly. > > Well, kind of. Here's why it's "discouraged": > https://lucene.apache.org/solr/guide/6_6/schema-api.html. > > But as long as you don't

Re: Defining a phonetic analyzer and searcher via the schema API

2018-03-12 Thread Walter Underwood
People can discourage that, but we only use hand-edited schema and solrconfig files. Those are checked into version control. I wrote some Python to load them into Zookeeper and reload the cluster. This allows us to use the same configs in dev, test, and prod. We can actually test things before
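Walter's script itself isn't shown. As a rough illustration of the same workflow (upload a version-controlled config set to ZooKeeper, then reload the collection), a Python sketch might build a `bin/solr zk upconfig` command and a Collections API `RELOAD` request. The paths, ZooKeeper address, config name and collection name are placeholders, not values from the thread.

```python
import subprocess
import urllib.request
from typing import List
from urllib.parse import urlencode


def upconfig_argv(solr_bin: str, zk_host: str, config_name: str, conf_dir: str) -> List[str]:
    """Command line that uploads conf_dir to ZooKeeper as config set config_name."""
    return [solr_bin, "zk", "upconfig", "-z", zk_host, "-n", config_name, "-d", conf_dir]


def reload_url(solr_base: str, collection: str) -> str:
    """Collections API call that reloads every replica of `collection`."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{solr_base}/admin/collections?{query}"


def deploy(solr_bin: str, zk_host: str, config_name: str, conf_dir: str,
           solr_base: str, collection: str) -> None:
    """Upload the config set, then reload the collection that uses it."""
    subprocess.run(upconfig_argv(solr_bin, zk_host, config_name, conf_dir), check=True)
    with urllib.request.urlopen(reload_url(solr_base, collection), timeout=60) as resp:
        resp.read()  # urlopen raises HTTPError on a non-2xx response
```

Keeping the command and URL builders separate from `deploy` makes it easy to point the same checked-in configs at dev, test or prod by changing only the host arguments.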

Re: Defining a phonetic analyzer and searcher via the schema API

2018-03-12 Thread Erick Erickson
bq: which you aren't supposed to edit directly. Well, kind of. Here's why it's "discouraged": https://lucene.apache.org/solr/guide/6_6/schema-api.html. But as long as you don't mix-and-match hand-editing with using the schema API you can hand edit it freely. You're then in charge of pushing it

Defining a phonetic analyzer and searcher via the schema API

2018-03-12 Thread Christopher Schultz
All, I'd like to add a new synthesized field that uses a phonetic analyzer such as Beider-Morse. I'm using Solr 7.2. When I request the current schema via the schema API, I get a list of existing fields, dynamic fields, and analyzers, none of
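For Christopher's question, the Schema API accepts JSON commands such as `add-field-type`, `add-field` and `add-copy-field` POSTed to `/solr/<collection>/schema`. The sketch below only builds that payload. The type name `text_phonetic_bm`, the source field `last_name` and the Beider-Morse settings are illustrative assumptions. The copy-field source should be an original field rather than the `all` field, because copyField does not chain from one copy destination to another.

```python
import json


def phonetic_schema_commands(field: str = "phonetic",
                             source: str = "last_name",
                             type_name: str = "text_phonetic_bm") -> dict:
    """Schema API payload: a Beider-Morse field type, a field using it, and a copyField.

    All names and filter settings are illustrative, not taken from the thread.
    """
    return {
        "add-field-type": {
            "name": type_name,
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": [
                    {"class": "solr.BeiderMorseFilterFactory",
                     "nameType": "GENERIC", "ruleType": "APPROX",
                     "concat": "true", "languageSet": "auto"},
                ],
            },
        },
        # multiValued lets several source fields be copied in later
        "add-field": {"name": field, "type": type_name,
                      "indexed": True, "stored": False, "multiValued": True},
        "add-copy-field": {"source": source, "dest": field},
    }


body = json.dumps(phonetic_schema_commands(), indent=2)
# POST `body` with Content-Type: application/json to /solr/<collection>/schema
```

Checking the new field on the admin UI Analysis page afterwards shows exactly which phonetic tokens Beider-Morse produces for a given name.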

Re: Solr search engine configuration

2018-03-12 Thread Erick Erickson
Peter: bq: I don't have a requestHandler named "/select". Right, that was just an example of a request handler; your "/scoresearch" handler _does_ have edismax as your default "defType", so assuming you're using that one it makes no difference at all whether you specify &defType=edismax on the URL or

Re: Some performance questions....

2018-03-12 Thread Shawn Heisey
On 3/12/2018 3:22 AM, Deepak Goel wrote: A single OS and JVM does not scale linearly for higher loads. If you have separate OS and Java, the load is distributed across multiple instances (with each instance only required to support a smaller load and hence would scale nicely) I had found this

Re: LTR Model size

2018-03-12 Thread Roopa Rao
What would be the best way to patch this to Solr 6.6 without having to do a full upgrade Thanks, Roopa On Fri, Mar 9, 2018 at 4:55 PM, Erick Erickson wrote: > Spoonerk: > > Please follow the instructions here: >

Re: CDCR performance issues

2018-03-12 Thread Tom Peters
Anyone have any thoughts on the questions I raised? I have another question related to CDCR: Sometimes we have to reindex a large chunk of our index (1M+ documents). What's the best way to handle this if the normal CDCR process won't be able to keep up? Manually trigger a bootstrap again? Or is

Re: The Impact of the Number of Collections on Indexing Performance in Solr 6.0

2018-03-12 Thread spoonerk
I have tried emailing to unsubscribe. I have tried disrupting threads hoping to anger the admin into getting me out of the spam list. All I get is arrogant emails about headers On Mar 12, 2018 1:15 AM, "苗海泉" wrote: > Thanks Erick and Shawn , Thank you for your

Re: Some performance questions....

2018-03-12 Thread Deepak Goel
Now you are mixing your original question about performance with reliability On 12 Mar 2018 02:29, "BlackIce" wrote: > Second to this, wouldn't 4 Solr instances each with its own HD be fault > tolerant? vs. one Solr instance with 4 HDs in RAID 0? Plus to this comes > the

Re: Solr SynonymGraphFilterFactory error on import

2018-03-12 Thread damian.pawski
"/You probably want to call solr.FlattenGraphFilterFactory after the call to WordDelimiterGraphFilterFactory. I put it at the end/ " That solved my issue. Thank you

Re: Altering the query if query contains all stopwods

2018-03-12 Thread Ryan Yacyshyn
Alright, thanks. Yeah, the SuggestStopFilterFactory gets closer but isn't what I'm looking for in this case! Ryan On Sat, 10 Mar 2018 at 06:12 Rick Leir wrote: > Tav, Ryan > Now you have me wondering, should it be returning *:* or some general > landing page. > > Suppose

Re: Some performance questions....

2018-03-12 Thread Deepak Goel
We need benchmarks or data to support the claim. A single OS and JVM does not scale linearly for higher loads. If you have separate OS and Java, the load is distributed across multiple instances (with each instance only required to support a smaller load and hence would scale nicely) I had

RE: Solr search engine configuration

2018-03-12 Thread Markus Jelsma
Hi, Glad to hear you removed the gramming, but Kraaij-Pohlmann isn't going to solve all problems either, for example molens => molen, but molen => mool, and many more like that. You can solve this by adding manual rules to StemmerOverrideFilter, but due to the compound nature of words, you

Re: The Impact of the Number of Collections on Indexing Performance in Solr 6.0

2018-03-12 Thread 苗海泉
Thanks Erick and Shawn , Thank you for your patience. I said that the above phenomenon was caused by the IO, cpu, memory, and network io. The swap was turned off and the machine's memory was sufficient. When the speed of indexing is declining, QTime is found to take 3 seconds to 4 seconds to

Re: Some performance questions....

2018-03-12 Thread Shawn Heisey
On 3/11/2018 7:39 PM, Deepak Goel wrote: I doubt this. It would be great if someone can substantiate this with hard facts This seems to be in response to my claim that virtualization always has overhead.  I don't see how this statement can be at all controversial. Virtualization isn't free,