Re: Json Faceting Performance Issues on solr v8.7.0

2021-02-05 Thread Michael Gibney
Ah! that's significant. The latency is likely due to building the
OrdinalMap (which maps segment ords to global ords) ... "dvhash" (assuming
the relevant fields are not multivalued) will very likely work; "dvhash"
doesn't map to global ords, so doesn't need to build the OrdinalMap (which
gets built the first time it's needed per-field per-searcher).

If "dvhash" doesn't work for some reason (multivalued fields, needs to work
over broader domains, etc.?) you could probably achieve a decent result by
configuring a static warming query (newSearcher) to issue a request that
facets on the relevant fields. That will delay the opening of each new
searcher, but will ensure that user requests don't block.
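
For reference, a static warming query of that kind might look roughly like the fragment below, placed in the <query> section of solrconfig.xml. This is an untested sketch, not a verified 8.7.0 config: the facet label warm_resultId is made up, and it assumes QuerySenderListener passes json.facet through like any other request parameter. It uses "method":"dv" rather than "dvhash" on purpose, since the point is to force the OrdinalMap to be built while the new searcher is still warming.

```xml
<!-- Sketch only: issue a facet on resultId each time a new searcher opens -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <!-- warm_resultId is a hypothetical label; method dv uses global ords -->
      <str name="json.facet">{warm_resultId:{type:terms,field:resultId,limit:1,method:dv}}</str>
    </lst>
  </arr>
</listener>
```

Add a second <lst> entry for processId if that field is faceted on as well.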

SOLR-15008 _was_ actually pretty similar, with the added wrinkle of
involving distributed (multi-shard) requests (and iirc "dvhash" wouldn't
have worked in that case?)


Re: Json Faceting Performance Issues on solr v8.7.0

2021-02-05 Thread mmb1234
> Does this happen on a warm searcher (are subsequent requests with no
intervening updates _ever_ fast?)?

Subsequent response times are very fast if the searcher remains open. As a
control test, I faceted on the same field that I used in the q param.

1. Start solr

2. Execute q=resultId:x&rows=0
=>  500ms

3. Execute q=resultId:x&rows=0
=> 40,000ms

4. Execute q=resultId:x&rows=0
=>  150ms

5. Execute q=processId:x&rows=0
=>   2,500ms

6. Execute q=processId:x&rows=0
=> 200ms


curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=processId:-xxx-xxx-xxx-x&rows=0' -d '
json.facet={
  categories:{
    "type": "terms",
    "field" : "processId",
    "limit" : 1
  }
}'
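
The cold-versus-warm comparison above could be scripted roughly as follows. This is only a sketch: the function names facet_param and time_facet are made up for illustration, and it relies on curl's -w '%{time_total}' option to print the elapsed seconds of each request. The endpoint is the one used in this thread.

```shell
#!/bin/sh
# Endpoint copied from the thread; adjust for your own core.
SOLR_URL='http://localhost:8983/solr/TestCollection_shard1_replica_t3/query'

# facet_param FIELD: print a terms-facet definition for the given field.
facet_param() {
  printf '{"categories":{"type":"terms","field":"%s","limit":1}}' "$1"
}

# time_facet QUERY FIELD: send one facet request, discard the response body,
# and print the total request time in seconds (needs a running Solr).
time_facet() {
  curl -s -o /dev/null -w '%{time_total}\n' "$SOLR_URL" \
    --data-urlencode "q=$1" \
    --data-urlencode 'rows=0' \
    --data-urlencode "json.facet=$(facet_param "$2")"
}

# Usage: the first call hits a cold searcher, the second should reuse it
# as long as no commit opens a new searcher in between.
#   time_facet 'resultId:x' resultId
#   time_facet 'resultId:x' resultId
```

Running the pair of calls right after a commit, and again without one, shows whether the latency is tied to a new searcher.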



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Json Faceting Performance Issues on solr v8.7.0

2021-02-05 Thread Michael Gibney
Apologies, I missed deducing from the request url that you're already
talking strictly about single-shard requests (so everything I was
suggesting about shards.preference etc. is not applicable). "dvhash" is
still worth a try though, esp. with `numFound` being 943 (out of 185
million!). Does this happen on a warm searcher (are subsequent requests
with no intervening updates _ever_ fast?)?


Re: Json Faceting Performance Issues on solr v8.7.0

2021-02-05 Thread mmb1234
Ok. I'll try that. Meanwhile, a query on resultId alone responds in under a
second, but the immediately following faceting query takes 40+ secs. The core
has 185 million docs and a 63GB index.

curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=resultId:x&cache=false&rows=0'
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":558,
    "params":{
      "q":"resultId:x",
      "cache":"false",
      "rows":"0"}},
  "response":{"numFound":943,"start":0,"numFoundExact":true,"docs":[]
  }}


curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=resultId:x&cache=false&rows=0' -d '
json.facet={
  categories:{
    "type": "terms",
    "field" : "resultId",
    "limit" : 1
  }
}'
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":43834,
    "params":{
      "q":"resultId:x",
      "json.facet":"{\ncategories:{\n  \"type\": \"terms\",\n  \"field\" : \"resultId\",\n  \"limit\" : 1\n}\n}",
      "cache":"false",
      "rows":"0"}},
  "response":{"numFound":943,"start":0,"numFoundExact":true,"docs":[]
  },
  "facets":{
    "count":943,
    "categories":{
      "buckets":[{
          "val":"x",
          "count":943}]}}}





Re: Json Faceting Performance Issues on solr v8.7.0

2021-02-05 Thread Michael Gibney
`resultId` sounds like it might be a relatively high-cardinality field
(lots of unique values)? What's your number of shards, and replicas per
shard? SOLR-15008 (note: not a bug) describes a situation that may be
fundamentally similar to yours (though it's impossible to say for sure
without more information):
https://issues.apache.org/jira/browse/SOLR-15008?focusedCommentId=17236213#comment-17236213

In particular, the explanation and troubleshooting advice on the linked
comment might be relevant?

"dvhash" is _not_ mentioned on SOLR-15008, but if the `processId` main
query significantly reduces the domain -- or more specifically, if
`resultId` is high-cardinality overall, but the cardinality of `resultId`
values _associated with a particular query_ is low -- you might consider
trying `"method":"dvhash"` (which should bypass OrdinalMap creation and
array allocation, if either or both of those contribute to the latency
you're seeing).
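
As a minimal sketch of that suggestion, the request could look something like the script below. The endpoint and field names are taken from this thread; the helper names facet_body and run_dvhash_facet are made up for illustration, and the only change to the facet itself is the added "method" key.

```shell
#!/bin/sh
# Endpoint copied from the thread; adjust for your own core.
SOLR_URL='http://localhost:8983/solr/TestCollection_shard1_replica_t3/query'

# facet_body FIELD METHOD: print a terms-facet definition using the given method.
facet_body() {
  printf '{"categories":{"type":"terms","field":"%s","limit":1,"method":"%s"}}' "$1" "$2"
}

# run_dvhash_facet QUERY FIELD: POST the facet request with method dvhash
# (needs a running Solr; not invoked automatically by this script).
run_dvhash_facet() {
  curl -s "$SOLR_URL" \
    --data-urlencode "q=$1" \
    --data-urlencode 'rows=0' \
    --data-urlencode "json.facet=$(facet_body "$2" dvhash)"
}

# Print the facet body that would be sent, without contacting Solr.
facet_body resultId dvhash
echo
```

With Solr running, `run_dvhash_facet 'processId:-xxx-xxx-xxx-x' resultId` would issue the same request as in the original report, plus the method hint.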

Michael


Json Faceting Performance Issues on solr v8.7.0

2021-02-05 Thread mmb1234
Hello,

I am seeing very slow responses from JSON faceting against a single core
(though the core is a shard leader in a collection).

Fields processId and resultId are non-multivalued, indexed, docValues
string fields (not text).

Soft commit = 5 sec (openSearcher=true) and hard commit = 10 sec, because new
docs are constantly being indexed: 95% new and 5% overwritten
(overwrite=true; no atomic updates). Caches are not considered useful due to
the commit frequency.

Solr is v8.7.0 on openjdk11.

Is there any way to improve json facet QTime?

## query only
curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=processId:-xxx-xxx-xxx-x&cache=false&rows=0'
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":552,
    "params":{
      "q":"processId:-xxx-xxx-xxx-x",
      "cache":"false",
      "rows":"0"}},
  "response":{"numFound":231311,"start":0,"numFoundExact":true,"docs":[]
  }}

## json facet takes 46secs
curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=processId:-xxx-xxx-xxx-x&rows=0' -d '
json.facet={
  categories:{
    "type": "terms",
    "field" : "resultId",
    "limit" : 1
  }
}'
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":46972,
    "params":{
      "q":"processId:-xxx-xxx-xxx-x",
      "json.facet":"{categories:{  \"type\": \"terms\",  \"field\" : \"resultId\",  \"limit\" : 1}}",
      "rows":"0"}},
  "response":{"numFound":231311,"start":0,"numFoundExact":true,"docs":[]
  },
  "facets":{
    "count":231311,
    "categories":{
      "buckets":[{
          "val":"x",
          "count":943}]}}}


## visualvm CPU sampling almost all time spent in lucene:

org.apache.lucene.util.PriorityQueue.downHeap()  23,009 ms
org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict.next()  13,268 ms


