[jira] [Commented] (SOLR-13439) Make collection properties easier and safer to use in code

JIRA Mon, 27 May 2019 16:30:21 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849246#comment-16849246
 ]


Tomás Fernández Löbbe commented on SOLR-13439:
----------------------------------------------

bq. what I think I see is a map of path (String) to HashSet<Watcher>, but the 
watchers added to that set are instances of ServerCnxn.java (which implements 
watcher) which appears to also be the object representing the connection to the 
client, of which (I think) there will only be one per client

So the server sends a single notification and the zk client is the one that 
multiplies for each zk watcher/callback?

bq. Once the watch fires, the anonymous wrapper should not be referenced by zk 
code and become held by a lambda submitted to the executor, which in turn is 
discarded once the call to process() finishes. Once the watch has fired and the 
cache has expired there are no references held and it becomes GC eligible.
But what if the watch doesn’t fire, that was what I meant y “and collection 
properties changes are infrequent”. 

bq. Zookeeper keeps a set of watches so adding the same watch object to it 
repeatedly is not a big deal at that level
But regardless of the wrapper, we are adding a new watch every time

{code:java}
new PropsWatcher(collection).refreshAndWatch(false);
{code}

bq. Each watch implies an invocation of onStateChanged by the thread spawned 
for each notification which is something of a waste, though that notification 
would be a trivial return false, that still gets evaluated by the for loop over 
all watches. 
Why? The {{PropsWatcher.process(WatchedEvent event)}} method would just return 
early, just as if the collection wasn’t watched. {{refreshAndWatch}} would not 
be called at all.

bq. From a usability and api design standpoint I really don't like setting a 
watch you plan to ignore just for the caching side effect of the watch's 
existence. It's just not straightforward. If you want to get notified when 
something happens, set a watch, if you want caching the api should be clear 
that that's what you're requesting (clarity improvements below). 

The API in my proposal is more explicit about the caching than the 
expiring-cache suggestion I think. Something like:

{code:java}
getCollectionProperties(final String collection, long cacheForMs) {
…
}
{code}
or even:

{code:java}
getCollectionPropertiesAndCache(final String collection, long cacheForMs) {
…
}

getCollectionProperties(final String collection) {
    return getCollectionPropertiesAndCache(collection, 0)
}
{code}

bq. It's unclear to me how I avoid adding a watch on every request if I want to 
check a collection property during a request. 
We’d have to keep the used {{PropsWatcher}} in some collection, we can’t just 
drop the instance as it’s being done now. They can be removed together with the 
removal of the zk watch and the cached properties (once the zk watch fires)

bq. I don't know a way of distinguishing my watch from a watch set by someone 
else (who might care about the notification via onStateChanged)
I don’t think we have the need to distinguish which one is who's, we just need 
to know which ones are only waiting for expiration 
({{!collectionPropsWatches.contains(collectionName)}}) and which ones are also 
in watched collections (thus, the expiration time doesn’t matter for them)

bq. Even if I simply trust in the side effects due to the existence of someone 
else's watch it then becomes possible
That’s not what I proposed. Can you elaborate on why do you think we are doing 
this in my proposal?

bq. The existing getCollectionProperties() method remain a simple single shot 
call that will pull from the cache or get from zk as it does now and not effect 
caching in any way.
+1

bq. Retain a set of PropsWatcher objects such that we can re-use them and thus 
avoid creating duplicate watches
I don’t love this idea, but I agree it’s unlikely to become a real issue in 
practice. Same thing would happen in my proposal if properties never change.

bq. My logic for expiration based caching be moved to a new method 
getAndCacheCollectionProperties()
I personally like my proposal better (:P), but I’m fine to go this route if we 
can’t agree (after you look at my latest comments, to make sure both strategies 
are fully understood by the both of us)



> Make collection properties easier and safer to use in code
> ----------------------------------------------------------
>
>                 Key: SOLR-13439
>                 URL: https://issues.apache.org/jira/browse/SOLR-13439
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: master (9.0)
>            Reporter: Gus Heck
>            Assignee: Gus Heck
>            Priority: Major
>         Attachments: SOLR-13439.patch, SOLR-13439.patch
>
>
> (breaking this out from SOLR-13420, please read there for further background)
> Before this patch the api is quite confusing (IMHO):
>  # any code that wanted to know what the properties for a collection are 
> could call zkStateReader.getCollectionProperties(collection) but this was a 
> dangerous and trappy API because that was a query to zookeeper every time. If 
> a naive user auto-completed that in their IDE without investigating, heavy 
> use of zookeeper would ensue.
>  # To "do it right" for any code that might get called on a per-doc or per 
> request basis one had to cause caching by registering a watcher. At which 
> point the getCollectionProperties(collection) magically becomes safe to use, 
> but the watcher pattern probably looks famillar induces a user who hasn't 
> read the solr code closely to create their own cache and update it when their 
> watcher is notified. If the caching side effect of watches isn't understood 
> this will lead to many in-memory copies of collection properties maintained 
> in user code.
>  # This also creates a task to be scheduled on a thread (PropsNotification) 
> and induces an extra thread-scheduling lag before the changes can be observed 
> by user code.
>  # The code that cares about collection properties needs to have a lifecycle 
> tied to either a collection or something other object with an even more 
> ephemeral life cycle such as an URP. The user now also has to remember to 
> ensure the watch is unregistered, or there is a leak.
> After this patch
>  # Calls to getCollectionProperties(collection) are always safe to use in any 
> code anywhere. Caching and cleanup are automatic.
>  # Code that really actually wants to know if a collection property changes 
> so it can wake up and do something (autoscaling?) still has the option of 
> registering a watcher that will asynchronously send them a notification.
>  # Updates can be observed sooner via getCollectionProperties with no need to 
> wait for a thread to run. (vs a cache held in user code)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SOLR-13439) Make collection properties easier and safer to use in code

Reply via email to