> On 16 May 2019, at 12:16, aravind gosukonda <arabha...@gmail.com> wrote:
> 
> Hello William,
> 
> Thank you for the advice.
>> Hey there! 
>> 
>> Great to hear you want to use this in a container. I have a few things to 
>> advise here.
>> 
>> From reading this it looks like you want to have:
>> 
>> [ Container 1 ]    [ Container 2 ]    [ Container 3 ]
>>        |                  |                  |
>> [                   Shared Volume                   ]
>> 
>> So first off, this is *not* possible or supported. Every DS instance needs
>> its own volume, and they replicate to each other:
>> 
>> [ Container 1 ]    [ Container 2 ]    [ Container 3 ]
>>        |                  |                  |
>> [  Volume 1   ]    [  Volume 2   ]    [  Volume 3   ]
>> 
>> You probably also can't autoscale (easily) as a result of this. I'm still 
>> working
>> on ideas to address this ... 
>> 
>> But you can manually scale, if you script things properly.
> I have a separate persistent volume mounted to each container, as you 
> suggest. I use a statefulset, so the same volume is mounted across container 
> replacements.
> 
>> Every instance needs its own changelog, and that is related to its replica
>> ID. If you remove a replica there IS a clean-up process. Remember, 389 is
>> not designed as a purely stateless app, so you'll need to do some work to
>> manage this.
> I've set up each instance to have its own changelog, stored in the
> persistent volume. The scenario I had in mind was a container being deleted
> and recreated, for any reason. My assumption is it'll take a few minutes, or
> probably hours in the worst case. For all practical purposes, this will be
> like a reboot of a host running a DS instance. Should I have any checks to
> see if it's working, or leave it alone and let replication deal with the
> delay?

A simple way to think about this is to treat every 389 instance in a container
as a read-only replica; that simplifies your system a lot (RO instances have a
replica ID of 65535, I think). That way, on startup/shutdown you just re-init
the RO from an external hub or similar, and you don't care if you delete the
volume associated with the container.
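
For example, a small startup hook could ask the hub to push a total init to the
RO consumer by writing nsds5BeginReplicaRefresh on the agreement that points at
it. A rough python-ldap sketch - the URI, bind credentials and agreement DN are
placeholders for whatever your environment uses (copy the exact agreement DN
from the hub's own cn=config):

import time
import ldap

# Placeholders - point these at your hub and the agreement for the consumer.
HUB_URI = "ldap://hub.example.com:389"
BIND_DN = "cn=Directory Manager"
BIND_PW = "password"
AGMT_DN = ("cn=agmt-to-389-ds-0,cn=replica,"
           "cn=dc\\3Dexample\\2Cdc\\3Dcom,cn=mapping tree,cn=config")

conn = ldap.initialize(HUB_URI)
conn.simple_bind_s(BIND_DN, BIND_PW)

# Ask the hub to start a total (re)initialisation of this consumer.
conn.modify_s(AGMT_DN, [(ldap.MOD_REPLACE,
                         "nsds5BeginReplicaRefresh", [b"start"])])

# Optionally wait: the attribute is removed once the init has finished.
while True:
    _dn, attrs = conn.search_s(AGMT_DN, ldap.SCOPE_BASE,
                               attrlist=["nsds5BeginReplicaRefresh"])[0]
    attrs = {k.lower(): v for k, v in attrs.items()}
    if not attrs.get("nsds5beginreplicarefresh"):
        break
    time.sleep(5)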

If you plan to make your container instances writeable, you should probably not
autoscale - treat a container addition/removal the same as adding/removing a
host, requiring a RUV clean-up (CLEANALLRUV) and other maintenance tasks.
Consider each persistent volume, with its replica ID, db and changelog, as the
"instance"; the container just provides access to it.

So every time you scale out by adding another container, you need to add
another persistent volume with its own unique replica ID, db and changelog, and
then set up replication between them.
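
For the "replication between them" part, each agreement on the supplier/hub is
just an entry under the cn=replica entry for your suffix, so it can be
scripted. A rough python-ldap sketch of adding one - the host, port, bind DN,
credentials and suffix are placeholders, and dsconf/lib389 can do the same job:

import ldap
import ldap.modlist

# Placeholders - the supplier you add the agreement on, and the new consumer
# it should feed.
SUPPLIER_URI = "ldap://389-ds-0.example.com:389"
BIND_DN = "cn=Directory Manager"
BIND_PW = "password"
# cn=replica lives under the escaped suffix in cn=mapping tree,cn=config;
# copy the exact DN from the supplier's own configuration.
REPLICA_DN = "cn=replica,cn=dc\\3Dexample\\2Cdc\\3Dcom,cn=mapping tree,cn=config"

agmt = {
    "objectClass": [b"top", b"nsds5replicationagreement"],
    "cn": [b"agmt-to-389-ds-3"],
    "nsds5ReplicaRoot": [b"dc=example,dc=com"],
    "nsds5ReplicaHost": [b"389-ds-3.example.com"],
    "nsds5ReplicaPort": [b"389"],
    "nsds5ReplicaTransportInfo": [b"LDAP"],
    "nsds5ReplicaBindDN": [b"cn=replication manager,cn=config"],
    "nsds5ReplicaBindMethod": [b"SIMPLE"],
    "nsds5ReplicaCredentials": [b"replication-password"],
}

conn = ldap.initialize(SUPPLIER_URI)
conn.simple_bind_s(BIND_DN, BIND_PW)
conn.add_s("cn=agmt-to-389-ds-3," + REPLICA_DN, ldap.modlist.addModlist(agmt))

Removing a container then means deleting that agreement and cleaning the RUV of
the removed replica ID.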

Perhaps what could help me is a diagram of your planned infrastructure? 

> 
>> 
>> You'll just need to assert they exist statefully - Ansible can help here.
> Since I'm using persistent volumes, the replication agreements will be in
> place if it's a configured instance. It struck me while writing this reply
> that a container replacement, in my case, will be similar to a host reboot,
> as all the config/data is available in a persistent volume. In this case, do
> I need to treat container replacement differently?

To help with this, let's assume:

[ Container 1 ]
         |
[ Volume ID abcd ]

Now say you destroy container 1 and upgrade to a newer version. So long as all
your stateful data is in the volume (dse.ldif, db, changelog db), this is fine:

[ Container NEW! ]
         |
[ Volume ID abcd ]

It would act like container 1 did, with the same replica ID etc.
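
On your earlier question about checks: rather than poking the consumer itself,
a cheap health check is to ask the supplier (or hub) how its agreement to that
replica is going. Again a python-ldap sketch with placeholder DN/credentials:

import ldap

SUPPLIER_URI = "ldap://hub.example.com:389"   # placeholder
BIND_DN = "cn=Directory Manager"
BIND_PW = "password"
AGMT_DN = ("cn=agmt-to-389-ds-0,cn=replica,"
           "cn=dc\\3Dexample\\2Cdc\\3Dcom,cn=mapping tree,cn=config")

conn = ldap.initialize(SUPPLIER_URI)
conn.simple_bind_s(BIND_DN, BIND_PW)

# The agreement reports the outcome of its last update and whether an update
# is currently running.
_dn, attrs = conn.search_s(AGMT_DN, ldap.SCOPE_BASE, attrlist=[
    "nsds5replicaLastUpdateStatus",
    "nsds5replicaUpdateInProgress",
])[0]
for name, values in attrs.items():
    print(name, [v.decode() for v in values])

If the status there looks healthy, you can leave the recreated container alone
and let replication deal with the delay.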

> 
>> What do you mean by "re-init" here? From another replica? The answer is ...
>> "it depends".
> 
> 
>> So many things can go wrong. Every instance needs its own volume, and data
>> is shared via replication.
>> 
>> Right now, my effort for containerisation has been to help support running
>> 389 in Atomic Host or SUSE transactional server. Running in Kubernetes "out
>> of the box" is a stretch goal at the moment, but if you are willing to
>> tackle it, I'd fully help and support you to upstream some of that work.
>> 
>> 
>> Most likely, you'll need to roll your own image, and you'll need to do some
>> work in dscontainer (our Python init tool) to support adding/removing of
>> replicas, configuration of the replica ID, and the replication passwords.
> Since I started this project a while ago, I have been using a base image and
> installing 389 on top of it, with some modifications taken from
> https://github.com/dabelenda/container-389ds/blob/master/Dockerfile, which
> disable hostname checks, remove the startup via systemd, etc. I'm using
> Kubernetes secrets for storing passwords for the directory manager,
> replication manager, etc. For replica ID configuration, as I'm using a
> statefulset which spins up containers with names like 389-ds-0, 389-ds-1,
> 389-ds-2, I'm reading the hostname of the container and generating the
> replica ID. I haven't yet tried the dscontainer tool, which I see does some
> of the things that the linked Dockerfile does, and a lot more too.

It would be great to have some more testing of the dscontainer tool too, so
please see how that goes. You can use the latest with opensuse/tumbleweed:latest
as a Docker base image and just "zypper in 389-ds-base". If you want even NEWER
versions, you can look at the network:ldap repo - I'm happy to help provide
Dockerfile advice for these cases. These assume all your state is in /data, so
provided you have that, you can work as per the example above.
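
On deriving the replica ID from the statefulset hostname - that is roughly the
kind of logic dscontainer would need to grow. In the meantime, a small helper
like this keeps the IDs unique and in range (assuming your 389-ds-<ordinal>
naming scheme):

import os
import re

def replica_id_from_hostname(hostname=None):
    """Derive a replica ID from a statefulset pod name like 389-ds-2."""
    hostname = hostname or os.uname().nodename
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError("no statefulset ordinal in hostname %r" % hostname)
    # Writeable replica IDs must be 1-65534 (65535 is the read-only ID, as
    # above), so offset the ordinal by one: 389-ds-0 -> 1, 389-ds-1 -> 2, etc.
    rid = int(match.group(1)) + 1
    if not 1 <= rid <= 65534:
        raise ValueError("replica id %d out of range" % rid)
    return rid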

> 
>> 
>> At a guess, your POD architecture should be 1 HUB which receives all
>> incoming replication traffic, and then the HUB dynamically adds/removes
>> agreements to the consumers and manages them. The consumers are then behind
>> the haproxy instance that is part of kube.
>> 
>> Your writeable servers should probably still be outside of this system for 
>> the moment :) 
>> 
>> 
>> Does that help? I'm really happy to answer any questions, help with planning
>> and improve our container support upstream with you.
>> 
>> Thanks, 
>> 
>> —
>> Sincerely,
>> 
>> William Brown
>> 
>> Senior Software Engineer, 389 Directory Server
>> SUSE Labs
> 
> Thanks,
> Aravind

—
Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs
_______________________________________________
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
