Re: Do openshift can keep Track of absolutely all service activity, in a High Availability (many replicas) Scenario?

Jonathan Yu Wed, 12 Oct 2016 10:01:14 -0700

On Wed, Oct 12, 2016 at 8:42 AM, Ricardo Aguirre Reyes | BEEVA MX <
[email protected]> wrote:


> Hi,
>
> I have a question about OpenShift's ability to keep track of
> absolutely all service activity in a High Availability (many replicas)
> scenario.
>
>
> We are building a microservice that will communicate over TCP
> (sockets) with a mainframe.
> We will run several pods as replicas in order to achieve High
> Availability.
> We know that, configured this way, each transaction can be
> logged by the pod that answers it.
> Then we can store the log messages in Elasticsearch and, theoretically,
> still retrieve them for dead pods (is this true?); we can aggregate them
> based on application labels.
>

Yes, things can crash, and you can have situations where a pod is healthy
(according to its health checks
<http://kubernetes.io/docs/user-guide/production-pods/#liveness-and-readiness-probes-aka-health-checks>),
accepts a request for processing, and subsequently fails.


> Using multiple pods there will never be messages dropped on the floor
> because at least one pod will be up to answer.
>

It may be useful to think about the failure modes in the path between your
microservices and your mainframe service:

1. Top of rack switch failure
2. Cable failure
3. Power failure
4. Router pod failure
5. OpenShift node failure (application pod failure)

There are a lot of things to consider when building reliable,
high-performance distributed systems. This checklist is helpful:
https://monkey.org/~marius/checklist.pdf

Keep in mind that TCP has checksum and retransmission mechanisms (handling
line noise, dropped packets, transient network blips), but it does not
re-open a broken connection or retry requests automatically.
Therefore, your service will need to handle this itself.  And there's no
such thing as exactly-once delivery
<https://brooker.co.za/blog/2014/11/15/exactly-once.html>, so your requests
should be idempotent: a retried request must have the same effect as
sending it once.
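
To make the retry-plus-idempotency idea concrete, here is a minimal Python
sketch. Everything in it is an assumption for illustration, not something
from your setup: the "request_id|body" framing, the names send_once,
send_with_retry and DedupingReceiver, and the idea that the receiving side
can remember request IDs at all (a real mainframe interface may need a
different deduplication scheme, and the in-memory dict would have to be
durable storage).

```python
import socket
import time


def send_once(host, port, payload, timeout=5.0):
    """Open a fresh TCP connection, send one request, return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        conn.shutdown(socket.SHUT_WR)  # assumed framing: EOF ends the request
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)


def send_with_retry(transport, request_id, body, attempts=3, delay=0.5):
    """Retry over a new connection, reusing the SAME request ID each time."""
    payload = request_id.encode() + b"|" + body
    last_error = None
    for attempt in range(attempts):
        try:
            return transport(payload)
        except OSError as exc:  # covers resets, refusals and timeouts
            last_error = exc
            time.sleep(delay * (2 ** attempt))  # simple exponential backoff
    raise last_error


class DedupingReceiver:
    """Hypothetical receiver that applies each request ID at most once."""

    def __init__(self, apply):
        self._apply = apply
        self._results = {}  # request ID -> cached reply (in memory only)

    def handle(self, payload):
        request_id, _, body = payload.partition(b"|")
        if request_id not in self._results:
            self._results[request_id] = self._apply(body)
        return self._results[request_id]
```

With a real endpoint, the transport could be something like
lambda p: send_once("mainframe.example", 9000, p), where the host and port
are placeholders. The point is that a retry after a lost reply re-sends the
same ID, so the receiver returns the cached reply instead of applying the
transaction twice.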


> But we do not know what happens if, for example, a message was already
> assigned to pod1 and the pod then goes down before receiving the reply
> from the destination.
>

If you open a connection and initiate a request, but your process crashes
before the response is received, then the destination server will send a
reply but your operating system won't know how to handle it, since no
process is holding the socket open anymore.  This manifests as a TCP RST
(Reset) being sent to the mainframe.

This wouldn't account for the case where a pod receives a reply but crashes
before completing its processing - for that, you need something more
sophisticated.

> Will the OpenShift High Availability mechanism resend the last message
> to another available pod, since "it knows" that the first one is down?
>
> The problem is that my service cannot lose any message, and it must record
> every activity.
>

I think many people have had success using Apache Kafka
<https://kafka.apache.org/intro>, which is a distributed message queue
(more precisely a replicated commit log). It persists messages for a
configurable retention period, allowing your application to replay
messages in order to ensure that nothing gets dropped.
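
The replay idea can be sketched in a few lines of Python. To be clear, this
is not the Kafka API: ReplayLog and its methods are hypothetical names, and
the log here lives in memory, whereas Kafka persists to disk and replicates
across brokers. It only illustrates why committing an offset after
processing lets a replacement pod pick up where a crashed one stopped.

```python
class ReplayLog:
    """Toy append-only log with one committed offset per consumer."""

    def __init__(self):
        self._entries = []    # messages, in arrival order
        self._committed = {}  # consumer name -> next offset to read

    def append(self, message):
        """Store a message and return its offset."""
        self._entries.append(message)
        return len(self._entries) - 1

    def poll(self, consumer):
        """Return (offset, message) pairs not yet committed by consumer."""
        start = self._committed.get(consumer, 0)
        return list(enumerate(self._entries[start:], start))

    def commit(self, consumer, offset):
        """Record that everything up to and including offset is done."""
        self._committed[consumer] = offset + 1
```

If a pod crashes after processing a message but before calling commit, the
next poll for the same consumer name returns that message again. That gives
at-least-once delivery, which is why the processing itself still needs to
be idempotent.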

-- 
Jonathan Yu, P.Eng. / Software Engineer, OpenShift by Red Hat / Twitter
(@jawnsy) is the quickest way to my heart <https://twitter.com/jawnsy>

*“A master in the art of living draws no sharp distinction between his work
and his play; his labor and his leisure; his mind and his body; his
education and his recreation. He hardly knows which is which. He simply
pursues his vision of excellence through whatever he is doing, and leaves
others to determine whether he is working or playing. To himself, he always
appears to be doing both.”* — L. P. Jacks, Education through Recreation
(1932), p. 1

_______________________________________________
dev mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
