Re: [akka-user] upgrading production cluster (sharded) system

09goral . Tue, 20 Jan 2015 04:32:51 -0800

I guess you mean *blue/green*? :)

2015-01-20 11:38 GMT+01:00 Roland Kuhn <[email protected]>:


> Hi Peter,
>
> upgrading usually implies that some changes were made: bugs were fixed and
> features added. This usually also implies that some messages have changed in
> format or meaning (if only from “broken” to “works now”). Operating nodes
> of different understanding within the same cluster is a very risky
> proposition as it is very hard to get right—the new nodes must be fully
> capable of understanding the old ones and they must also not confuse the
> old ones with new language. Other issues arise when talking to a shared
> data store: if new nodes write new data, will the old nodes be able to deal
> with it? Will they silently and unknowingly corrupt new records?
>
> For these reasons the stories from the field I have heard have all pointed
> towards doing red/blue deployments, starting a new cluster next to the old
> one and shifting traffic and deployment size between them in order to
> switch over.
>
> Regards,
>
> Roland
>
> On 14 Jan 2015, at 18:44, Peter <[email protected]> wrote:
>
> Hi
>
> I wonder if anyone has experience or thoughts to share about upgrading
> production cluster systems?
>
> I would ideally like to
>
>    - upgrade the cluster without downtime/scheduled outage
>    - not mutate infrastructure, in other words, deploy a new set of nodes
>    with the new version
>    - do a staged upgrade, first just a single node taking as little
>    production traffic as possible - a canary in the coal mine
>
>
> A little bit more about my specific environment
>
>    - cluster runs as a single EC2 autoscale group
>    - no akka roles (looking into this as a way to gain independence
>    between functional areas within the application & facilitate independent
>    upgrades - something akin to micro services to use the buzzword du jour)
>    - I don't use akka persistence but each sharded actor is backed by my
>    own distributed persistence mechanism based on DynamoDB
>    - there is some tolerance for stale reads but there could be some
>    cases where it's not acceptable
>
> My understanding is that the number of cluster shards should be kept
> constant irrespective of number of cluster nodes, so that the shard
> resolution also remains stable irrespective of number of cluster nodes, as
> in the example in the documentation. It sounds like the bundled rebalancing
> strategy (LeastShardAllocationStrategy) should do the trick when adding the
> first node (canary). I'm wondering if there are any suggestions for doing
> the rest?
>
>
>    - start all of the remaining new cluster nodes
>       - at what point does rebalancing get kicked off? is there a
>       specific event that triggers a rebalance? is it possible to delay
>       until all the new nodes/X nodes have joined/Y time has passed to
>       minimize disruption (single rebalance vs rebalance for every node)
>    - wait for period X to ensure rebalancing is complete and all messages
>    buffered during rebalancing have been processed
>       - is it possible to determine this programmatically?
>    - stop all the old version nodes
>       - one by one with a period in between or all at once?
>       - at this point, messages in flight are lost, need to fall back to
>       clients to retry
>
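The point above about keeping the shard count fixed, and about a least-loaded allocation favouring the canary, can be illustrated with a small sketch. This is plain Java with no Akka dependency, not the library's implementation: the class `ShardSketch`, the constant `NUMBER_OF_SHARDS`, the value 100, the node names and both helper methods are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the names and the shard count are illustrative, not Akka API.
class ShardSketch {
    // Fixed up front and independent of how many nodes are in the cluster.
    static final int NUMBER_OF_SHARDS = 100;

    // An entity id always maps to the same shard, because the node count
    // is not an input to the calculation.
    static int shardId(String entityId) {
        return Math.floorMod(entityId.hashCode(), NUMBER_OF_SHARDS);
    }

    // Simplified "least shards first" choice: return the node that
    // currently hosts the fewest shards.
    static String leastLoadedNode(Map<String, Integer> shardsPerNode) {
        String best = null;
        for (Map.Entry<String, Integer> e : shardsPerNode.entrySet()) {
            if (best == null || e.getValue() < shardsPerNode.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> shardsPerNode = new HashMap<>();
        shardsPerNode.put("old-1", 50);
        shardsPerNode.put("old-2", 50);
        shardsPerNode.put("canary", 0); // freshly joined node, no shards yet

        // Same result no matter how many nodes are in the map.
        System.out.println(shardId("user-42"));
        // prints "canary"
        System.out.println(leastLoadedNode(shardsPerNode));
    }
}
```

Because `shardId` never looks at the node list, adding or removing nodes only changes where a shard lives, not which shard an entity belongs to. How the real LeastShardAllocationStrategy decides when and how many shards to move is not covered by this sketch.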
> It gets progressively more hand-wavy towards the end as I'm still thinking
> about the details, would love some input & feedback!
>
> Thanks
> Peter
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ:
> http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>
>
>
>
> *Dr. Roland Kuhn*
> *Akka Tech Lead*
> Typesafe <http://typesafe.com/> – Reactive apps on the JVM.
> twitter: @rolandkuhn
> <http://twitter.com/#!/rolandkuhn>
>
>



-- 
Regards,
Mateusz Górski

