Re: Issue with CDCR bootstrapping in Solr 7.1

2017-11-30 Thread Amrit Sarkar
Tom, > (and take care not to restart the leader node otherwise it will replicate > from one of the replicas which is missing the index). How is this possible? OK, I will look more into it. I would appreciate it if someone else also chimed in if they have a similar issue. Amrit Sarkar Search Engineer

Re: Do i need to reindex after changing similarity setting

2017-11-30 Thread Nawab Zada Asad Iqbal
This JIRA also sheds some light. There is a discussion of encoding norms during indexing. The contributor eventually comments that "norms" encoded by different similarities are compatible with each other. On Thu, Nov 30, 2017 at 5:12 PM, Nawab Zada Asad Iqbal wrote: > Hi Walter, >

Re: Do i need to reindex after changing similarity setting

2017-11-30 Thread Nawab Zada Asad Iqbal
Hi Walter, I read the following line in the reference docs; what does "as long as the global similarity allows it" mean: "A field type may optionally specify a <similarity/> that will be used when scoring documents that refer to fields with this type, as long as the "global" similarity for the collection

Re: Issue with CDCR bootstrapping in Solr 7.1

2017-11-30 Thread Tom Peters
Hi Amrit, I tried issuing hard commits to the various nodes in the target cluster and it does not appear to cause the follower replicas to receive the initial index. The only way I can get the replicas to see the original index is by restarting those nodes (and take care not to restart the

Fwd: solr-security-proxy

2017-11-30 Thread Rick Leir
Hi all I have just been looking at solr-security-proxy, which seems to be a great little app to put in front of Solr (link below). But would it make more sense to use a whitelist of Solr parameters instead of a blacklist? Thanks Rick https://github.com/dergachev/solr-security-proxy
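
The whitelist idea Rick raises can be sketched in a few lines. This is only an illustration in Python, not code from solr-security-proxy; the parameter set ALLOWED_PARAMS and the function name filter_query_string are assumptions made up for the example.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical allow-list; the right set depends on what the front end really needs.
ALLOWED_PARAMS = {"q", "fq", "start", "rows", "sort", "fl", "wt"}


def filter_query_string(query_string):
    """Keep only allow-listed parameters; anything not named above is dropped."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    kept = [(name, value) for name, value in pairs if name in ALLOWED_PARAMS]
    return urlencode(kept)


if __name__ == "__main__":
    # Parameters outside the allow-list (here qt and stream.body) never reach Solr.
    print(filter_query_string("q=solr&rows=10&qt=/update&stream.body=<delete/>"))
```

With a whitelist, a parameter that nobody thought to block is rejected by default, whereas a blacklist lets it through until someone adds it to the list.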

Re: Compile problems with anonymous SimpleCollector in custom request handler

2017-11-30 Thread Tod Olson
Shawn, Thanks for the response! Yes, that was it, an older version unexpectedly in the classpath. And for the benefit of anyone who searches the list archive with a similar debugging need, it's pretty easy to print out the classpath from ant's build.xml:

Re: Issue with CDCR bootstrapping in Solr 7.1

2017-11-30 Thread Amrit Sarkar
Tom, This is very useful: > I found a way to get the follower replicas to receive the documents from > the leader in the target data center, I have to restart the solr instance > running on that server. Not sure if this information helps at all. You have to issue a hard commit on the target after the

Re: Skewed IDF in multi lingual index, again

2017-11-30 Thread Walter Underwood
Expanding the query to use both the tagged and untagged term might work. I’m not sure the effect would be a lot different than boosting the preferred language. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 30, 2017, at 8:35 AM, Markus Jelsma

Re: Issue with CDCR bootstrapping in Solr 7.1

2017-11-30 Thread Tom Peters
Hi Amrit, Starting with more documents doesn't appear to have made a difference. This time I tried with >1000 docs. Here are the steps I took: 1. Deleted the collection on both the source and target DCs. 2. Recreated the collections. 3. Indexed >1000 documents on source data center, hard

RE: Skewed IDF in multi lingual index, again

2017-11-30 Thread Markus Jelsma
This is unfortunately not what we want. Some customers use filters to restrict language, but some customers don't. They want to be able to find documents in all languages, so we use user preference to get their local language on top. Except for very relevant documents in foreign languages,

Re: Skewed IDF in multi lingual index, again

2017-11-30 Thread Walter Underwood
I’ve occasionally considered using Unicode language tags (U+E001 and friends) on each term. That would make a term specific to a language, so we would get [en]LaserJet, [fr]LaserJet, [de]LaserJet, and so on. But that is a pretty big hammer, because it restricts matches to the same language. If

Re: Issue with CDCR bootstrapping in Solr 7.1

2017-11-30 Thread Amrit Sarkar
Hi Tom, I see what you are saying and I too think this is a bug, but I will confirm it against the code. Bootstrapping should happen on all the nodes of the target. Meanwhile, can you index more than 100 documents in the source and do the exact same experiment again? Followers will not copy the

Skewed IDF in multi lingual index, again

2017-11-30 Thread Markus Jelsma
Hello, We already discussed this problem five years ago [1]. In short: documents in foreign languages are scored higher for some terms. It was solved back then by using docCount instead of maxDoc when calculating idf; it worked really well! But, probably due to index changes, the problem is
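
The effect of switching from maxDoc to docCount can be seen with a small calculation. This is a sketch under stated assumptions: the idf formula is a textbook form rather than necessarily the exact one in the similarity Markus uses, the counts are invented, and it assumes a minority-language field that only a small share of documents populate.

```python
import math


def idf(doc_freq, n_docs):
    """Textbook-style idf: the rarer a term is relative to n_docs, the larger the value."""
    return 1.0 + math.log((n_docs + 1) / (doc_freq + 1))


# Hypothetical numbers, chosen only for illustration.
MAX_DOC = 1_000_000  # every document in the index, in any language
DOC_COUNT = 10_000   # documents that actually have the minority-language field
DOC_FREQ = 100       # documents whose minority-language field contains the term

idf_with_max_doc = idf(DOC_FREQ, MAX_DOC)
idf_with_doc_count = idf(DOC_FREQ, DOC_COUNT)

if __name__ == "__main__":
    print("idf using maxDoc:   %.2f" % idf_with_max_doc)
    print("idf using docCount: %.2f" % idf_with_doc_count)
```

Measured against maxDoc the term looks far rarer than it is within its own language, which inflates its idf; measuring against docCount removes that inflation.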

Re: does the payload_check query parser have support for simple query parser operators?

2017-11-30 Thread John Anonymous
Ok, thanks. Do you know if there are any plans to support special syntax in the future? On Thu, Nov 30, 2017 at 5:04 AM, Erik Hatcher wrote: > No it doesn’t. The payload parsers currently just simple tokenize with > no special syntax supported. > > Erik > > > On Nov

RE: [EXTERNAL] - Re: Basic SolrCloud help

2017-11-30 Thread Steve Pruitt
Thanks Shawn, it all mainly made sense. I took the hint and looked at both solr.in.cmd and solr.in.sh. Clearly setting ZK_HOST is a first step. I am sure this is explained somewhere, but I overlooked it. From here, once I have Solr installed, I can run the Control Script to upload a config

Issue with CDCR bootstrapping in Solr 7.1

2017-11-30 Thread Tom Peters
I'm running into an issue with the initial CDCR bootstrapping of an existing index. In short, after turning on CDCR, only the leader replica in the target data center has the documents replicated; they do not exist in any of the follower replicas in the target data center. All

RE: Solr Wildcard Search

2017-11-30 Thread Allison, Timothy B.
A slightly more refined answer... In my experience with the systems I've worked with, Porter and other stemmers can be useful as a "fallback field" with a really low boost, but you should be really careful if you're only searching on one field. Cannot recommend Doug Turnbull and John

RE: Solr Wildcard Search

2017-11-30 Thread Allison, Timothy B.
At the very least the English possessive filter, which you have. Great! Depending on what your query log analysis finds -- perhaps users are pretty much only searching on nouns? -- you might consider EnglishMinimalStemFilterFactory. I wouldn't say that porter was or wasn't chosen

Re: check softCommit , autocommit and hard commit count

2017-11-30 Thread Shawn Heisey
On 11/30/2017 4:36 AM, Puppy Linux Distros wrote: I am trying to calculate the total number of softCommit, autocommit and hard commit from the solr logs. Can you please check whether the below commands are correct? Let me know how to find the total softcommit, hardcommit and autocommit from

RE: Solr Wildcard Search

2017-11-30 Thread Georgy Nevsky
I understand the reason for stemming. Thank you. What do you suggest using for stemming instead of "Porter"? I guess it wasn't chosen intentionally. In the best we trust Georgy Nevsky -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Thursday, November 30, 2017

RE: Solr Wildcard Search

2017-11-30 Thread Allison, Timothy B.
The initial question wasn't about a phrasal search, but I largely agree that different query parsers handle the analysis chain differently for multiterms. Yes, Porter is crazily aggressive. USE WITH CAUTION! As has been pointed out, use the Solr admin window and the "debug" in the query option to see

Re: Solr Wildcard Search

2017-11-30 Thread Atita Arora
As Rick raised the most important aspect here, that the phrase is broken into multiple terms ORed together, I believe that if the use case requires wildcard search on phrases, we would need to store the entire phrase as a single term in the index, which is probably not happening right now

RE: Solr Wildcard Search

2017-11-30 Thread Georgy Nevsky
I wish to understand whether I can do something to get the term "shipping" in the result when I search for "shipp*". Here is the field definition: Is anything else important? Most configuration parameters are the Apache Solr 7.1.0 defaults. In

Re: Solr Wildcard Search

2017-11-30 Thread Rick Leir
George, When you get those results it could be due to stemming. Wildcard processing expands your term to multiple terms, OR'd together. It also takes you down a different analysis pathway, as many analysis components do not work with multiple terms. Look into the SolrAdmin console, and use the
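
Why "ship*" matches while "shipp*" does not can be shown with a toy model. The sketch below is an assumption-laden illustration: toy_stem is a crude stand-in and not the real Porter algorithm, and it assumes the wildcard term is compared to the indexed (stemmed) terms without being stemmed itself.

```python
def toy_stem(word):
    """Very rough stand-in for a Porter-style stemmer (NOT the real algorithm)."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Undo consonant doubling, e.g. "shipp" -> "ship".
        if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
            word = word[:-1]
    return word


def index_terms(text):
    """Index-time analysis: split on whitespace, then stem every token."""
    return {toy_stem(token) for token in text.split()}


def prefix_matches(prefix, terms):
    """Query time: the wildcard prefix is lowercased but not stemmed."""
    return sorted(term for term in terms if term.startswith(prefix.lower()))


terms = index_terms("Shipping Weight Ship From Shipping Calculator")

if __name__ == "__main__":
    print(prefix_matches("ship", terms))   # the stemmed term "ship" is found
    print(prefix_matches("shipp", terms))  # no indexed term starts with "shipp"
```

In this model "shipping" is stored as "ship", so the index contains no term beginning with "shipp" for the wildcard to expand to.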

spellcheck.q issue

2017-11-30 Thread Georgy Nevsky
I have an issue with the spellcheck.q parameter. I think it is a bug. If I search without specifying the spellcheck.q parameter, I get spellcheck suggestions. Query: /select?q=text_en-us:baring=en-us=on Result: 1 11 17 bearing

Solr Wildcard Search

2017-11-30 Thread Georgy Nevsky
Can somebody help me understand how Solr wildcard search works? If I search for the term “ship*”, I get many strings in the result, like “Shipping Weight”, “Ship From”, “Shipping Calculator”, etc. But if I search for “shipp*”, I don’t get any results. In the best we trust Georgy

Dedupe documents inside of each group

2017-11-30 Thread Diego Ceccarelli (BLOOMBERG/ QUEEN VIC)
Hello, I have a use case where I need to dedupe documents in each group based on a particular field: example: doc1 = { field_a=1 field_b=2 } doc2 = { field_a=1 field_b=2 } doc3 = { field_a=1 field_b=3 } doc4 = { field_a=2 field_b=3 } doc5 = { field_a=2 field_b=3 } and I want to run "Group
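
The intended result can be sketched client-side in Python. The message is cut off before it says which field is used for grouping, so this sketch assumes grouping on field_a and deduplicating on field_b; the function name and the "id" key are also made up for the example, and this is not a Solr feature being demonstrated.

```python
docs = [
    {"id": "doc1", "field_a": 1, "field_b": 2},
    {"id": "doc2", "field_a": 1, "field_b": 2},
    {"id": "doc3", "field_a": 1, "field_b": 3},
    {"id": "doc4", "field_a": 2, "field_b": 3},
    {"id": "doc5", "field_a": 2, "field_b": 3},
]


def group_and_dedupe(documents, group_field, dedupe_field):
    """Group documents, keeping only the first one per dedupe value in each group."""
    groups = {}
    seen = set()
    for doc in documents:
        key = (doc[group_field], doc[dedupe_field])
        group = groups.setdefault(doc[group_field], [])
        if key not in seen:
            seen.add(key)
            group.append(doc["id"])
    return groups


if __name__ == "__main__":
    # Assumed roles: group on field_a, dedupe on field_b.
    print(group_and_dedupe(docs, "field_a", "field_b"))
```

Under those assumptions doc2 and doc5 are dropped as duplicates of doc1 and doc4 within their groups.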

check softCommit , autocommit and hard commit count

2017-11-30 Thread Puppy Linux Distros
Hi, I am trying to calculate the total number of softCommit, autocommit and hard commit from the solr logs. Can you please check whether the below commands are correct? Let me know how to find the total softcommit, hardcommit and autocommit from the logs. *1. totalcommit=`cat $solrlogfile |

Re: does the payload_check query parser have support for simple query parser operators?

2017-11-30 Thread Erik Hatcher
No it doesn’t. The payload parsers currently just simple tokenize with no special syntax supported. Erik > On Nov 30, 2017, at 02:41, John Anonymous wrote: > > I would like to use wildcards and fuzzy search with the payload_check query > parser. Are these supported? >

Re: Huge Query execution time for multiple ORs

2017-11-30 Thread Faraz Fallahi
Uff... I see... thx for the explanation :) On 30.11.2017 at 3:13 PM, "Emir Arnautović" < emir.arnauto...@sematext.com> wrote: > Hi Faraz, > It is a bit worse than that - it also needs to calculate score, so for > each matching doc of one query part it has to check if it appears in > results of

Re: Huge Query execution time for multiple ORs

2017-11-30 Thread Emir Arnautović
Hi Faraz, It is a bit worse than that - it also needs to calculate score, so for each matching doc of one query part it has to check if it appears in results of other query parts. If you use term query parser, you avoid calculating score - all docs will have score 1. Solr is based on Lucene,
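
One way to apply this advice is to replace a long chain of ORs with a single terms query. The sketch below assumes that "term query parser" refers to Solr's {!terms} parser; the field name id and the helper terms_filter are hypothetical.

```python
def terms_filter(field, values):
    """Build a {!terms} local-params query from a list of exact values."""
    cleaned = [str(value) for value in values]
    if any("," in value for value in cleaned):
        # The default separator is a comma, so such values need another separator.
        raise ValueError("values containing commas need a different separator")
    return "{!terms f=%s}%s" % (field, ",".join(cleaned))


if __name__ == "__main__":
    # Replaces something like: id:101 OR id:102 OR id:103
    print(terms_filter("id", [101, 102, 103]))
```

The resulting string is typically sent as an fq parameter, since the matches are not scored against each other.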