I see. Valid points.
Whenever you break a production site, do you try to add a test which
simulates the parameters of the breakage?
It sounds to me like some sort of image versioning could still help here:
that way you can really "roll back" properly, i.e. actually boot a previous
version of the image.
For instance, VyOS (http://vyos.net/wiki/Upgrade) rolls out new versions
this way. I'm not sure exactly how they do it, but the bottom line is that
you can upgrade to the next release while keeping all previous versions and
their configuration, so you can roll back if you have to.
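
Roughly what I have in mind, as a minimal sketch (this assumes a btrfs
root; the snapshot path and version number are made up):

  # freeze the currently running release as a read-only subvolume
  btrfs subvolume snapshot -r / /.releases/root-1.4.2
  # upgrade as usual; if it goes wrong, boot the old tree again by adding
  # a GRUB entry whose kernel line carries rootflags=subvol=.releases/root-1.4.2

I have no idea whether VyOS does it exactly like this, but it gives you the
same "boot a previous version of the image" property.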

On 7 August 2016 at 14:18, Elazar Leibovich <elaz...@gmail.com> wrote:

> It's a radio antenna.
>
> It is of course tested beforehand to some extent, in a "staging" environment.
>
> But since the physical environment varies, and since antenna-related
> parameters sometimes change between releases (e.g., the duration of receive
> time), it is not easy to know you're not breaking something for someone by
> mistake.
>
> It could be, for example, the physical location of the antenna at the
> client's site that makes the difference.
>
>
> On Sat, Aug 6, 2016 at 2:27 AM, Amos Shapira <amos.shap...@gmail.com>
> wrote:
>
>> What kind of hardware is this that's connected to the servers, and what
>> does the software do that you can't test before installing on production
>> servers?
>>
>> On 6 August 2016 at 02:14, Elazar Leibovich <elaz...@gmail.com> wrote:
>>
>>> All real servers, with custom hardware attached, geographically
>>> distributed across the planet.
>>>
>>> Real people actually use the hardware attached to these computers, and
>>> it's not obvious how to test whether or not it has failed.
>>>
>>> The strategy, therefore, is to deploy randomly to a small percentage of
>>> the machines, wait to see whether you get complaints from the customers
>>> using those hardware devices, and, if everything goes well, update the
>>> rest of the servers.
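>>>
>>> As a rough sketch of that canary step (hosts.txt, the package name and
>>> the version are made up; parallel-ssh is just one way to fan it out):
>>>
>>>   # pick roughly 10% of the fleet at random as canaries
>>>   shuf -n "$(( $(wc -l < hosts.txt) / 10 ))" hosts.txt > canary.txt
>>>   parallel-ssh -h canary.txt -i 'sudo apt-get update && sudo apt-get install -y mypkg=1.4.3-1'
>>>   # wait for complaints; if none arrive, repeat with the full hosts.txt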
>>>
>>> The provisioning solution is Chef, but I'm open to changing it. As I
>>> said, I don't think it makes much of a difference.
>>>
>>> As for immutable server images, I'd do it with ZFS/btrfs snapshots
>>> (plus docker/machinectl/systemd-nspawn if you must have some sort of
>>> virtual environment), but either way it's probably a better idea than
>>> apt-get install pkg=oldversion. An immutable filesystem for execution is
>>> of course not enough, since you might have migrations for the mutable
>>> part, etc. In this particular case, I don't think it's a big deal.
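>>>
>>> To make the snapshot idea concrete, a very rough sketch (the subvolume
>>> paths, package name and version are all made up):
>>>
>>>   btrfs subvolume snapshot /srv/app-root /srv/app-root-1.4.3     # writable copy of the tree
>>>   systemd-nspawn -D /srv/app-root-1.4.3 apt-get install -y mypkg=1.4.3-1
>>>   systemd-nspawn -D /srv/app-root-1.4.3 /usr/bin/myapp           # run from the new tree
>>>   # "rolling back" is just running from the previous subvolume again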
>>>
>>> You see, not everything is a web startup with a customer-facing website ;-)
>>>
>>> Thanks,
>>> Appreciate you sharing your experience.
>>> I'm not disagreeing with your points, but in this particular case, where
>>> testing is expensive, not all of them seem valid.
>>>
>>> On Fri, Aug 5, 2016 at 3:15 PM, Amos Shapira <amos.shap...@gmail.com>
>>> wrote:
>>>
>>>> What provisioning tools do you use to manage these servers? Please tell
>>>> me you aren't doing all of this manually.
>>>> Also what's your environment? All hardware servers? Any virtualisation
>>>> involved? Cloud servers?
>>>>
>>>> Reading your question, it feels like you are setting yourself up to fail
>>>> instead of minimising the chance of failure altogether.
>>>>
>>>> What I suggest is that you test your package automatically in a test
>>>> environment (to me, Vagrant + RSpec/Serverspec would be the first
>>>> candidates to check), then roll out the package to the repository for the
>>>> servers to pick it up.
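>>>>
>>>> A minimal sketch of that gate, assuming a Serverspec suite generated by
>>>> serverspec-init (the final publishing step is whatever tool you use to
>>>> push .debs to your repository):
>>>>
>>>>   vagrant up      # throwaway VM that mirrors a production host
>>>>   rake spec       # run the Serverspec checks against it
>>>>   # only if the suite is green, publish the new .deb to the internal repo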
>>>>
>>>> As for "roll-back": with comprehensive automatic testing this concept is
>>>> becoming obsolete. There is no such thing as "roll-back", only
>>>> "roll-forward"; i.e., since testing and rolling out are small and
>>>> "cheap", it should be feasible to fix whatever problem was found instead
>>>> of having to revert the change altogether.
>>>>
>>>> If you are in a properly supported virtual environment, then I'd even go
>>>> for immutable server images (e.g. Packer building AMIs, or Docker
>>>> containers); then it's just a matter of firing up an instance of the new
>>>> image, both when testing and in production.
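>>>>
>>>> With Docker, for example, the whole flow is roughly this (the image and
>>>> container names are made up):
>>>>
>>>>   docker build -t myapp:1.4.3 .                  # bake the package into a new image
>>>>   docker run -d --name myapp-1.4.3 myapp:1.4.3
>>>>   # "rolling back" is just starting the previous image again:
>>>>   docker rm -f myapp-1.4.3 && docker run -d --name myapp-1.4.2 myapp:1.4.2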
>>>>
>>>> --Amos
>>>>
>>>> On 3 August 2016 at 16:55, Elazar Leibovich <elaz...@gmail.com> wrote:
>>>>
>>>>> How exactly you connect to the servers is not within the scope of this
>>>>> discussion, and I agree that Ansible is a sensible solution.
>>>>>
>>>>> But what you're proposing is to manually update the package on a small
>>>>> percentage of the machines.
>>>>>
>>>>> A manual solution is fine, but I would like to hear the experience of
>>>>> people who have actually done this on many servers.
>>>>>
>>>>> There are many other issues; for example, how do you roll back?
>>>>>
>>>>> apt-get remove exposes you to the risk that the uninstallation script
>>>>> is buggy. There are other solutions, e.g., btrfs snapshots of the root
>>>>> partition, but I'm curious to hear from someone experienced with them,
>>>>> to expose issues I hadn't even thought of.
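>>>>>
>>>>> For completeness, downgrading in place instead of removing would look
>>>>> roughly like this (the package name and versions are made up):
>>>>>
>>>>>   apt-get install mypkg=1.4.2-1   # pin back to the previous version
>>>>>   apt-mark hold mypkg             # keep apt from pulling the new one again
>>>>>
>>>>> though that still runs the package's own maintainer scripts, which is
>>>>> the same class of risk.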
>>>>>
>>>>> Another issue is: how do you select the servers you try it on?
>>>>>
>>>>> You suggested a static "beta" list, and I think it's better to select
>>>>> the candidates randomly on each update.
>>>>>
>>>>> Anyhow, how exactly you connect to the server is not the essence of
>>>>> the issue.
>>>>>
>>>>> On Wed, Aug 3, 2016 at 9:30 AM, Evgeniy Ginzburg <nad....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello.
>>>>>> I'm assuming that you have passwordless ssh to the servers in question
>>>>>> as root.
>>>>>> I also assume that you don't use central management/deployment
>>>>>> software (Ansible/Puppet/Chef).
>>>>>> In similar cases I usually use parallel-ssh (GNU parallel is another
>>>>>> alternative).
>>>>>> First stage: install the package manually on one server to see that the
>>>>>> configuration is OK, daemons restart, etc.
>>>>>> If this stage is OK, the second step is to create the list of servers
>>>>>> for the "complaint" stage and install the package on them through
>>>>>> parallel-ssh.
>>>>>> Instead of waiting for complaints, one can define metrics to check and
>>>>>> use some monitoring appliance for verification.
>>>>>> In case of failure, remove the package from the repository and
>>>>>> remove/reinstall again.
>>>>>> The third stage is a parallel-ssh install on all the servers.
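>>>>>>
>>>>>> For the verification step, a rough sketch of what I mean by checking a
>>>>>> metric instead of waiting for complaints (the service name and health
>>>>>> URL are made up):
>>>>>>
>>>>>>   parallel-ssh -h canary.txt -i 'systemctl is-active myapp && curl -fsS http://localhost:8080/health'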
>>>>>>
>>>>>> P.S. In the case of a few tens of servers I'd prefer to work with
>>>>>> Ansible or an alternative; it's worth it in most cases.
>>>>>>
>>>>>> Best Regards, Evgeniy.
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 2, 2016 at 8:50 PM, Elazar Leibovich <elaz...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a few (say, a few tens of) Debian machines, with a local
>>>>>>> repository defined.
>>>>>>>
>>>>>>> I build some home-made packages and push them to the local repository.
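>>>>>>>
>>>>>>> The build-and-push step looks roughly like this (reprepro is just an
>>>>>>> example of a repository tool; the distribution name and paths are
>>>>>>> made up):
>>>>>>>
>>>>>>>   dpkg-buildpackage -us -uc
>>>>>>>   reprepro -b /srv/apt includedeb jessie ../mypkg_1.4.3-1_amd64.deb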
>>>>>>>
>>>>>>> When I upgrade my package, I want to be sure the update won't cause a
>>>>>>> problem.
>>>>>>>
>>>>>>> So I want to install it on a small percentage of the machines and wait
>>>>>>> for complaints.
>>>>>>>
>>>>>>> If complaints arrive, roll back.
>>>>>>> Otherwise, keep upgrading the rest of the machines.
>>>>>>>
>>>>>>> I'd appreciate your advice and experience of similar situations; in
>>>>>>> particular, I'd appreciate it if someone who has actual real-life
>>>>>>> experience with this situation would share it.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> So long, and thanks for all the fish.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> <http://au.linkedin.com/in/gliderflyer>
>>>>
>>>
>>>
>>
>>
>> --
>> <http://au.linkedin.com/in/gliderflyer>
>>
>
>


-- 
<http://au.linkedin.com/in/gliderflyer>
_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
