Different Edismax Behavior with user params vs Solr config params on Solr 8.

2021-01-19 Thread Kerwin
Hi,
I am upgrading from Solr 6.5.1 to Solr 8.6.1 and have noticed a change in
the Edismax parser behavior which is affecting our search results. When user
operators are present in the search query, Solr 6 took the mm parameter from
the user query string, defaulting to 0% if it was not present. Solr 8
instead takes the mm parameter specified in the Solr request handler
config, which is 100% (see config below).

Hence the user query "samsung OR nokia" works in Solr 6 but no longer works
in Solr 8 when documents do not contain both strings. Could you suggest why
there is this difference and how to resolve it?

A simple example config is as follows:


   
 
  {!edismax
  qf=manu
  mm=100% v=$qq}
  



The user query is:
http://localhost:8983/solr/search?qq=spring OR boot.

I would appreciate any pointers that could resolve this. One solution is to
create two search handlers and check beforehand whether the user query
contains operators, but I'd prefer this as a last resort.
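
A minimal client-side sketch of the last-resort workaround mentioned above:
check the user query for explicit operators and route it to one of two
search handlers. The handler names (search_relaxed, search_strict), the
operator pattern, and the function names are hypothetical and are not part
of the original config; this does not explain the Solr 6 vs Solr 8
difference, it only illustrates the routing idea.

```python
import re
from urllib.parse import urlencode

# Explicit operators we look for (an assumption; extend as needed):
# upper-case AND / OR / NOT as whole words, or a leading + / - on a term.
_OPERATOR_RE = re.compile(r"(?:^|\s)(?:AND|OR|NOT)(?:\s|$)|(?:^|\s)[+\-]\S")


def has_user_operators(query: str) -> bool:
    """Return True if the raw user query contains an explicit operator."""
    return bool(_OPERATOR_RE.search(query))


def build_search_url(base_url: str, user_query: str) -> str:
    """Pick a (hypothetical) handler depending on whether operators are present."""
    handler = "search_relaxed" if has_user_operators(user_query) else "search_strict"
    return f"{base_url}/{handler}?{urlencode({'qq': user_query})}"


if __name__ == "__main__":
    print(build_search_url("http://localhost:8983/solr", "samsung OR nokia"))
    print(build_search_url("http://localhost:8983/solr", "samsung nokia"))
```

Here search_strict would keep mm=100% and search_relaxed would use a lower
mm; both handlers would still have to be defined in solrconfig.xml.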


Re: Effects of shards and replicas on performance

2021-01-19 Thread Shawn Heisey

On 1/19/2021 4:19 PM, ufuk yılmaz wrote:

> Let's say I had only 1 replica for each collection but I split it into 6
> shards, 1 for every node.
> Or I had 2 shards (1 shard is too big for a single node I think) but I had 3
> replicas, 3x2=6, 1 on every node.
>
> How would it affect the performance?


It all depends on how many queries you're expecting to occur at the same 
time -- your query rate.


More replicas will generally make your system capable of handling a 
higher query load than fewer replicas, as long as the replicas are 
running on different physical hardware.


With a low query load, more shards CAN make things faster because it 
throws more system capacity at the problem -- assuming the different 
shards are on different physical hardware.  But as the number of queries 
increases, the systems get busier, and that advantage disappears.


Don't assign your heap size as a ratio of total memory size.  Your heap 
should be as big as it needs to be, and no bigger, leaving as much 
memory as possible for disk caching.  I can't say for sure, but with 20 
indexes the size you're talking about, 50 GB of memory per node is 
probably nowhere near enough.


Thanks,
Shawn
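
As a rough check on the memory point above, here is a back-of-the-envelope
sketch using the figures from the original question (6 nodes, about 20
collections, 2 shards of about 100 GB each, 2 replicas, 50 GB of RAM per
node with a 24 GB heap). It assumes that "2 replicas" means two full copies
of every shard and that cores are spread evenly over the nodes; neither is
confirmed in the thread, and the function name is made up for illustration.

```python
def index_gb_per_node(collections: int, shards: int, copies_per_shard: int,
                      shard_gb: float, nodes: int) -> float:
    """Index data each node holds on disk, assuming an even spread of cores."""
    total_gb = collections * shards * copies_per_shard * shard_gb
    return total_gb / nodes


# Figures quoted in the question; the interpretation of "2 replicas" is assumed.
per_node_gb = index_gb_per_node(collections=20, shards=2, copies_per_shard=2,
                                shard_gb=100, nodes=6)

# Memory left for the OS disk cache once the Solr heap is taken out.
disk_cache_gb = 50 - 24

if __name__ == "__main__":
    print(f"index data per node: {per_node_gb:.0f} GB")
    print(f"RAM left for disk cache: {disk_cache_gb} GB")
    print(f"fraction of index that fits in cache: {disk_cache_gb / per_node_gb:.1%}")
```

Under these assumptions only a small fraction of each node's index can sit
in the OS disk cache at once, which is the concern raised above.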


Effects of shards and replicas on performance

2021-01-19 Thread ufuk yılmaz
I’ve been trying to learn all I can about Solr for a year now, and I still
scratch my head when it comes to the effects of shards and replicas on
performance.

- info about my setup 
We have a SolrCloud setup with 6 nodes.
Each collection has 2 shards and 2 replicas. 1 shard’s size is about 100GB.
Each collection has around 400M documents.
We have ~20 collections like this and they are increasing in number.
Each node has 50GB memory, 24GB of it given to Solr heap.

We do a lot of faceting and streaming expressions.
--


Replicas: I don’t need much fault tolerance; if a node goes down my queries
can stop, so a minimum of 2 replicas is enough for me if adding more replicas
won’t help with performance. Are more replicas just a waste of disk space then?
On the other hand, if more nodes had replicas of the same collection, could
they execute those queries so the workload would be split over more nodes?


Shards: If a shard is huge for a single node, splitting it helps, but when it’s
small, splitting it just causes more distributed work to be done, right? So 2
shards may be a sweet spot for me, or would I get better performance if I had
smaller, more numerous shards?


Let’s say I had only 1 replica for each collection but I split it into 6
shards, 1 for every node.
Or I had 2 shards (1 shard is too big for a single node I think) but I had 3 
replicas, 3x2=6, 1 on every node.

How would it affect the performance?

Also, we do a lot of multi-collection search (solr/col1,col2,col3.../select),
so 1 query sometimes goes to 10 different collections. In that case, even if the
result set is very small, it takes a lot of time for the query to complete.

Many thanks if you have read this far!

--uyilmaz
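
A small sketch of how the two layouts in the question compare in terms of
cores per node and per-query fan-out. It relies on the general SolrCloud
behavior that a distributed query is sent to one replica of every shard and
the partial results are merged; the Layout class and its method names are
made up for illustration, and the sketch says nothing about actual query
latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    shards: int
    copies_per_shard: int  # "replicas" in the question

    def cores(self) -> int:
        """Total cores (shard copies) the collection places on the cluster."""
        return self.shards * self.copies_per_shard

    def subrequests(self, collections: int = 1) -> int:
        """Shard-level requests one query fans out to (one replica per shard)."""
        return self.shards * collections


six_shards = Layout(shards=6, copies_per_shard=1)
two_by_three = Layout(shards=2, copies_per_shard=3)
current = Layout(shards=2, copies_per_shard=2)

if __name__ == "__main__":
    for name, layout in [("6 shards x 1", six_shards), ("2 shards x 3", two_by_three)]:
        print(name, "cores:", layout.cores(), "sub-requests/query:", layout.subrequests())
    # The multi-collection case from the question: 10 collections in one query.
    print("current layout, 10 collections:", current.subrequests(collections=10))
```

Both alternative layouts put one core on each of the 6 nodes. They differ in
how many shard requests a single query has to wait for, and in how many
nodes can answer for a given shard.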


Re: Exact matching without using new fields

2021-01-19 Thread gnandre
Thanks for replying, Dave.

I am afraid I am looking for a non-index-time, i.e. query-time, solution.

Actually, in my case I expect both documents from your example to be
returned. I am just trying to avoid returning documents that match only
tokenized versions of the provided search query when it is enclosed in
double quotes to indicate that exact matching is expected.

e.g.
search query -> "information retrieval"

This should match documents like following:
doc 1: "information retrieval"
doc 2: "Advanced information retrieval with Solr"

but should NOT match documents like
doc 3: "informed retrieval"
doc 4: "information extraction"  (considering 'extraction' was a specified
synonym of 'retrieval' )
doc 5: "INFORMATION RETRIEVAL"

etc

I am also OK with these documents showing up as long as they show up at the
bottom. Also, a query-time solution is a must.
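
To make the requirement above concrete, here is a minimal sketch that
reorders an already-fetched page of results on the client so that documents
containing the literal, case-sensitive phrase come first. This is not a Solr
feature and not a solution proposed in this thread; the function name, the
"title" field, and the document dicts are assumptions, and it only reorders
the documents it is given.

```python
import re
from typing import Dict, List


def rerank_exact_first(docs: List[Dict], phrase: str, field: str = "title") -> List[Dict]:
    """Stable partition: docs containing the literal phrase first, the rest after."""
    # Case-sensitive, whole-word match of the phrase exactly as typed.
    pattern = re.compile(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)")
    exact = [d for d in docs if pattern.search(d.get(field, ""))]
    rest = [d for d in docs if not pattern.search(d.get(field, ""))]
    return exact + rest


docs = [
    {"id": 5, "title": "INFORMATION RETRIEVAL"},
    {"id": 3, "title": "informed retrieval"},
    {"id": 1, "title": "information retrieval"},
    {"id": 4, "title": "information extraction"},
    {"id": 2, "title": "Advanced information retrieval with Solr"},
]

if __name__ == "__main__":
    for doc in rerank_exact_first(docs, "information retrieval"):
        print(doc["id"], doc["title"])
```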

On Tue, Jan 19, 2021 at 12:22 PM David R wrote:

> We had the same requirement. Just to echo back your requirements, I
> understand your case to be this. Given these 2 doc titles:
>
> doc 1: "information retrieval"
> doc 2: "Advanced information retrieval with Solr"
>
> You want a phrase search for "information retrieval" to find both
> documents, but an EXACT phrase search for "information retrieval" to find
> doc #1 only.
>
> If that's true, and case-sensitive search isn't a requirement, I indexed
> this in the token stream, with adjacent positions of course.
>
> START information retrieval END
> START advanced information retrieval with solr END
>
> And with our custom query parser, when an EXACT operator is found, I
> tokenize the query to match the first case. Otherwise pass it through.
>
> Needs custom analyzers on the query and index sides to generate the
> correct token sequences.
>
> It's worked out well for our case.
>
> Dave
>
>
>
> 
> From: gnandre 
> Sent: Tuesday, January 19, 2021 4:07 PM
> To: solr-user@lucene.apache.org 
> Subject: Exact matching without using new fields
>
> Hi,
>
> I am aware that to do exact matching (only whatever is provided inside
> double quotes should be matched) in Solr, we can copy existing fields with
> the help of copyFields into new fields that have very minimal tokenization
> or no tokenization (e.g. using KeywordTokenizer or using string field type)
>
> However this solution is expensive in terms of index size because it might
> almost double the size of the existing index.
>
> Is there any inexpensive way of achieving exact matches from the query
> side. e.g. boost the original tokens more at query time compared to their
> tokens?
>


Re: Leading wildcard searches very slow

2021-01-19 Thread matthew sporleder
https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html
?
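
For readers unfamiliar with the idea behind the filter linked above, here is
a minimal in-memory sketch of the reversed-token trick: index a reversed
copy of each token so that a leading-wildcard query (*olr) becomes a cheap
prefix lookup on the reversed form. This is only an illustration of the
principle, not Solr's implementation; the marker character and all function
names are assumptions.

```python
import bisect
from typing import Iterable, List

MARKER = "\u0001"  # keeps reversed tokens apart from normal ones (assumed value)


def reversed_form(token: str) -> str:
    """Marker followed by the token spelled backwards."""
    return MARKER + token[::-1]


def build_index(tokens: Iterable[str]) -> List[str]:
    """Sorted list of reversed tokens, standing in for a term dictionary."""
    return sorted({reversed_form(t) for t in tokens})


def leading_wildcard(index: List[str], suffix: str) -> List[str]:
    """Answer '*suffix' as a prefix scan over the reversed terms."""
    prefix = reversed_form(suffix)
    start = bisect.bisect_left(index, prefix)
    matches = []
    for entry in index[start:]:
        if not entry.startswith(prefix):
            break  # sorted order: no further entries can share the prefix
        matches.append(entry[len(MARKER):][::-1])
    return matches


index = build_index(["solr", "lucene", "resolver", "colr"])

if __name__ == "__main__":
    print(leading_wildcard(index, "olr"))
```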

On Tue, Jan 19, 2021 at 4:01 AM mosheB wrote:
>
> Hi, is there any sophisticated way [using the schema] to block brutal regex
> queries?
>
>
> Thanks
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Exact matching without using new fields

2021-01-19 Thread David R
We had the same requirement. Just to echo back your requirements, I understand 
your case to be this. Given these 2 doc titles:

doc 1: "information retrieval"
doc 2: "Advanced information retrieval with Solr"

You want a phrase search for "information retrieval" to find both documents, 
but an EXACT phrase search for "information retrieval" to find doc #1 only.

If that's true, and case-sensitive search isn't a requirement, here is what I 
indexed into the token stream, with adjacent positions of course:

START information retrieval END
START advanced information retrieval with solr END

With our custom query parser, when an EXACT operator is found, I tokenize 
the query to match the first case (including START and END). Otherwise I 
pass it through.

This needs custom analyzers on the query and index sides to generate the 
correct token sequences.

It's worked out well for our case.

Dave
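
A minimal sketch of the sentinel-token idea described above, simulated in
plain Python rather than with Solr analyzers. The whitespace tokenizer, the
lower-casing, and all function names are assumptions for illustration; the
real setup uses custom index-time and query-time analyzers plus a custom
query parser, none of which is shown in the thread.

```python
from typing import List

START, END = "START", "END"


def index_tokens(title: str) -> List[str]:
    """Index-side analysis: lower-case, split on whitespace, wrap in sentinels."""
    return [START] + title.lower().split() + [END]


def query_tokens(phrase: str, exact: bool) -> List[str]:
    """Query-side analysis: add the sentinels only when EXACT is requested."""
    tokens = phrase.lower().split()
    return [START] + tokens + [END] if exact else tokens


def phrase_match(doc_tokens: List[str], q_tokens: List[str]) -> bool:
    """True if q_tokens occur at adjacent positions in doc_tokens."""
    n = len(q_tokens)
    return any(doc_tokens[i:i + n] == q_tokens
               for i in range(len(doc_tokens) - n + 1))


doc1 = index_tokens("information retrieval")
doc2 = index_tokens("Advanced information retrieval with Solr")

if __name__ == "__main__":
    for exact in (False, True):
        q = query_tokens("information retrieval", exact=exact)
        print("exact" if exact else "phrase",
              phrase_match(doc1, q), phrase_match(doc2, q))
```

A plain phrase query matches both titles, while the EXACT form matches only
the title that consists of the phrase alone, because the sentinels must be
adjacent to the first and last query terms.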




From: gnandre 
Sent: Tuesday, January 19, 2021 4:07 PM
To: solr-user@lucene.apache.org 
Subject: Exact matching without using new fields

Hi,

I am aware that to do exact matching (only whatever is provided inside
double quotes should be matched) in Solr, we can copy existing fields with
the help of copyFields into new fields that have very minimal tokenization
or no tokenization (e.g. using KeywordTokenizer or using string field type)

However this solution is expensive in terms of index size because it might
almost double the size of the existing index.

Is there any inexpensive way of achieving exact matches from the query
side. e.g. boost the original tokens more at query time compared to their
tokens?


Exact matching without using new fields

2021-01-19 Thread gnandre
Hi,

I am aware that to do exact matching (only whatever is provided inside
double quotes should be matched) in Solr, we can copy existing fields with
the help of copyFields into new fields that have very minimal tokenization
or no tokenization (e.g. using KeywordTokenizer or using string field type)

However this solution is expensive in terms of index size because it might
almost double the size of the existing index.

Is there any inexpensive way of achieving exact matches from the query
side, e.g. boosting the original tokens more at query time compared to
their tokenized forms?


Re: Solr Slack Workspace

2021-01-19 Thread Charlie Hull

Relevance Slack is open to anyone working on search & relevance. #solr is only 
one of the channels; there are lots more! Hope to see you there.

Cheers

Charlie
https://opensourceconnections.com/slack


On 16/01/2021 02:18, matthew sporleder wrote:

> IRC has kind of died off,
> https://lucene.apache.org/solr/community.html has a slack mentioned,
> I'm on https://opensourceconnections.com/slack after taking their solr
> training class and assume it's mostly open to solr community.
>
> On Fri, Jan 15, 2021 at 8:10 PM Justin Sweeney wrote:
>
>> Hi all,
>>
>> I did some googling and didn't find anything, but is there a Slack
>> workspace for Solr? I think this could be useful to expand interaction
>> within the community of Solr users and connect people solving similar
>> problems.
>>
>> I'd be happy to get this setup if it does not exist already.
>>
>> Justin



--
Charlie Hull - Managing Consultant at OpenSource Connections Limited
Founding member of The Search Network
and co-author of Searching the Enterprise

tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828


Re: Leading wildcard searches very slow

2021-01-19 Thread mosheB
Hi, is there any sophisticated way [using the schema] to block brutal regex
queries?


Thanks



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html