[jira] [Updated] (SOLR-16712) Simplify PerReplicaStates (PRS) logic in DocCollection, replace PrsSupplier with actual PerReplicaStates param

Patson Luk (Jira) Mon, 20 Mar 2023 15:40:06 -0700


     [ 
https://issues.apache.org/jira/browse/SOLR-16712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Patson Luk updated SOLR-16712:
------------------------------
    Description: 
The current implementation of PRS passes an extra param to the `DocCollection`, 
the `PrsSupplier`, which fetches the PRS states from ZK when `get` is called. 
The implementation of this supplier, `LazyPrsSupplier`, only fetches the 
states on the first call.

 

While this flow works properly, it might introduce some unnecessary 
complexity:
 # PRS entries are fetched from ZK either during or after `DocCollection` 
construction. This is somewhat inconsistent with the existing non-PRS 
`DocCollection` design, in which `DocCollection` is simply an immutable 
container that does not fetch data after its instantiation.
 # The lazy fetching introduces some uncertainty as to when exactly the 
fetching happens (and whether any ZooKeeper IO exception arises).
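
The timing concern in the two points above can be illustrated with a minimal 
sketch. It is not the actual Solr code: `CountingStateSource`, `LazyStates` and 
`LazyCollection` are hypothetical stand-ins for the ZK read, `LazyPrsSupplier` 
and `DocCollection`, and the state is reduced to a plain string.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a ZooKeeper read; counts how often it is hit. */
final class CountingStateSource {
  final AtomicInteger reads = new AtomicInteger();

  String readStates() {
    reads.incrementAndGet();
    return "core_node1:active";
  }
}

/** Simplified memoizing supplier: fetches on the first get(), then caches. */
final class LazyStates implements Supplier<String> {
  private final Supplier<String> loader;
  private String cached;

  LazyStates(Supplier<String> loader) {
    this.loader = loader;
  }

  @Override
  public synchronized String get() {
    if (cached == null) {
      cached = loader.get(); // the I/O, and any failure, happens here
    }
    return cached;
  }
}

/** Hypothetical container that holds the supplier rather than the states. */
final class LazyCollection {
  private final Supplier<String> states;

  LazyCollection(Supplier<String> states) {
    this.states = states;
  }

  String states() {
    return states.get(); // may trigger the fetch long after construction
  }
}

final class LazyFetchDemo {
  public static void main(String[] args) {
    CountingStateSource zk = new CountingStateSource();
    LazyCollection coll = new LazyCollection(new LazyStates(zk::readStates));
    System.out.println("reads after construction: " + zk.reads.get()); // 0
    coll.states();
    coll.states();
    System.out.println("reads after two accesses: " + zk.reads.get()); // 1
  }
}
```

In this sketch the container is fully constructed before any read has taken 
place, so whichever caller first touches the states is the one that pays for 
the read and sees any failure.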

 

My guess is that the lazy loading was introduced in 
https://issues.apache.org/jira/browse/SOLR-16580 to avoid fetching the PRS 
states multiple times in the ctor of `DocCollection`. However, if we fetch 
the `PerReplicaStates` only once on update, before calling the `DocCollection` 
ctor, and pass the `PerReplicaStates` object to the `DocCollection` instead, 
we can probably achieve a similar result with less uncertainty after 
`DocCollection` construction.
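
A rough sketch of the proposed shape, under the same caveat that none of this 
is the real Solr API: `ReplicaStatesSnapshot`, `SnapshotFetcher` and 
`EagerCollection` are hypothetical stand-ins for `PerReplicaStates`, the ZK 
read and `DocCollection`, and the replica names and states are made up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the per-replica states read from ZooKeeper. */
final class ReplicaStatesSnapshot {
  final Map<String, String> stateByReplica;

  ReplicaStatesSnapshot(Map<String, String> stateByReplica) {
    // Defensive copy so the snapshot cannot change after it is handed over.
    this.stateByReplica = Collections.unmodifiableMap(new HashMap<>(stateByReplica));
  }
}

/** Hypothetical stand-in for the ZooKeeper read; counts calls. */
final class SnapshotFetcher {
  int fetchCount = 0;

  ReplicaStatesSnapshot fetch() {
    fetchCount++;
    Map<String, String> states = new HashMap<>();
    states.put("core_node1", "active");
    states.put("core_node2", "down");
    return new ReplicaStatesSnapshot(states);
  }
}

/** Immutable container: receives already-fetched states, never performs I/O. */
final class EagerCollection {
  private final String name;
  private final ReplicaStatesSnapshot states;

  EagerCollection(String name, ReplicaStatesSnapshot states) {
    this.name = name;
    this.states = states;
  }

  String name() {
    return name;
  }

  String stateOf(String replica) {
    return states.stateByReplica.get(replica);
  }
}

final class EagerFetchDemo {
  public static void main(String[] args) {
    SnapshotFetcher fetcher = new SnapshotFetcher();
    // Single fetch before construction; any I/O error would surface here.
    ReplicaStatesSnapshot snapshot = fetcher.fetch();
    EagerCollection coll = new EagerCollection("demo", snapshot);
    System.out.println(coll.stateOf("core_node1")); // active
    System.out.println(coll.stateOf("core_node2")); // down
    System.out.println("fetches: " + fetcher.fetchCount); // 1
  }
}
```

Here the only read happens at a known point in the update path, and the 
container behaves the same way as a non-PRS one once it is built.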

 

There is another branch that experimented with making DocCollection, Slice and 
Replica immutable as well for PRS-enabled collections 
[https://github.com/cowpaths/fullstory-solr/pull/84], but that is beyond the 
scope of this Jira ticket.

 

  was:
The current implementation of PRS requires an extra param to the DocCollection, 
the `PrsSupplier`, when `get` is called, would fetch the PRS states from ZK. 
The implementation of such supplier `LazyPrsSupplier` would only fetch the 
state on first call.

 

While this flow does work properly, this flow might introduce some unnecessary 
complexity:
 # PRS entry fetching from ZK is done either during or after the 
`DocCollection` construction, this could be a bit inconsistent with existing 
non PRS `DocCollection` design which `DocCollection` is simply a immutable 
container that does not fetch data after its instantiation
 # The lazy fetching could introduce some uncertainties as to when exactly the 
fetching happens (and if any Zookeeper IO exceptions arises)

 

My guess was that the lazy loading was introduced in 
https://issues.apache.org/jira/browse/SOLR-16580 as to avoid fetching the PRS 
states multiple times in the ctor of `DocCollection`, however, if we only fetch 
the `PerReplicaStates` once on update before calling the `DocCollection` ctor, 
and pass the `PerReplicaStates` object to the `DocCollection` instead, it can 
probably achieve similar result but with reduced uncertainty after 
`DocCollection` construction.

 

 


> Simplify PerReplicaStates (PRS) logic in DocCollection, replace PrsSupplier 
> with actual PerReplicaStates param
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-16712
>                 URL: https://issues.apache.org/jira/browse/SOLR-16712
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: main (10.0), 9.1.1
>            Reporter: Patson Luk
>            Priority: Major
>
> The current implementation of PRS passes an extra param to the 
> `DocCollection`, the `PrsSupplier`, which fetches the PRS states from ZK 
> when `get` is called. The implementation of this supplier, 
> `LazyPrsSupplier`, only fetches the states on the first call.
>  
> While this flow works properly, it might introduce some unnecessary 
> complexity:
>  # PRS entries are fetched from ZK either during or after `DocCollection` 
> construction. This is somewhat inconsistent with the existing non-PRS 
> `DocCollection` design, in which `DocCollection` is simply an immutable 
> container that does not fetch data after its instantiation.
>  # The lazy fetching introduces some uncertainty as to when exactly the 
> fetching happens (and whether any ZooKeeper IO exception arises).
>  
> My guess is that the lazy loading was introduced in 
> https://issues.apache.org/jira/browse/SOLR-16580 to avoid fetching the PRS 
> states multiple times in the ctor of `DocCollection`. However, if we fetch 
> the `PerReplicaStates` only once on update, before calling the 
> `DocCollection` ctor, and pass the `PerReplicaStates` object to the 
> `DocCollection` instead, we can probably achieve a similar result with less 
> uncertainty after `DocCollection` construction.
>  
> There is another branch that experimented with making DocCollection, Slice 
> and Replica immutable as well for PRS-enabled collections 
> [https://github.com/cowpaths/fullstory-solr/pull/84], but that is beyond the 
> scope of this Jira ticket.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
