[ 
https://issues.apache.org/jira/browse/SOLR-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-10515:
-----------------------------------------
    Attachment: SOLR-10515.patch

Thanks Andrzej. There were a few conflicts in the tests because of my last 
commit on SOLR-10738 so I fixed those in this patch. It should apply cleanly on 
the autoscaling branch.

A few comments:
# You need to add SuppressForbidden annotation on IdUtils, NodeAddedTrigger and 
NodeLostTrigger and their tests due to the use of System.currentTimeMillis 
otherwise precommit will complain
# Why TreeSet and TreeMap instead of HashSet and HashMap in the triggers? I 
tried to find where we rely on the order anywhere but couldn't.
# While replaying, ScheduledTrigger should check for hasPendingActions first 
otherwise it will discard the event for no reason
# TriggerEvent.getEventTime() now returns System.currentTimeMillis but the 
NodeAddedTriggerTest (and NodeLostTriggerTest) still compare with 
System.nanoTime
# OverseerTriggerThread uses while (!isClosed) but this usage of isClosed is 
without obtaining the update lock so the right value may not be visible. I'd 
rather keep it as while(true)
# The removal of triggerState should probably be moved from 
OverseerTriggerThread to ScheduledTriggers.remove after trigger is closed 
otherwise NoNodeException will be needlessly logged if the removed trigger was 
executing concurrently. Also I don't see the corresponding removal of the 
events path anywhere.
# When a trigger is replaced, we call restoreState(old.trigger) and then when 
the trigger is run for the first time, it restores (the same data) from ZK 
again. Not a big deal, just something I noticed. We can solve it setting replay 
to false if we want to.
# After we deque the event we assert that its id is same as the event we 
expected. Can we log if they don't match in addition to the assert?
# The use isClosed in ScheduledTrigger is not safe because the close() method 
will always be called by a different thread than the one executing the run() 
method. So either make isClosed volatile or an AtomicBoolean.

> Persist intermediate trigger state in ZK to continue tracking information 
> across overseer restarts
> --------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10515
>                 URL: https://issues.apache.org/jira/browse/SOLR-10515
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Andrzej Bialecki 
>              Labels: autoscaling
>             Fix For: master (7.0)
>
>         Attachments: SOLR-10515.patch, SOLR-10515.patch
>
>
> The current trigger design is simplistic and keeps all the intermediate state 
> in memory. But this presents two problems when the overseer itself fails:
> # We lose tracking state such as which node was added before the overseer 
> restarted
> # A nodeLost trigger can never really fire for the overseer node itself
> So we need a way, preferably in the trigger API itself to save intermediate 
> state or checkpoints so that it can seamlessly continue on overseer restarts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to