michaeljmarshall commented on pull request #8242:
URL: https://github.com/apache/pulsar/pull/8242#issuecomment-771935645


   @fransguelinckx - thanks for the added context. I am not familiar with 
OpenShift's container platform, so the context is helpful.
   
   It seems to me that the challenge here comes from competing paradigms 
between how OpenShift runs containers by default and how EKS (and other 
platforms) recommend running containers. OpenShift is prescriptive about 
running the user as part of the root group (so it can ensure containers have 
the right file permissions) and yet EKS recommends just the opposite (because 
it expects users to ensure file permissions are correct). Here are the Amazon 
docs that detail best practices for using a `PodSecurityPolicy` to prevent 
containers from running as the root user and the root group 
(https://aws.github.io/aws-eks-best-practices/security/docs/pods/#restrict-the-containers-that-can-run-as-privileged).
 Here is the brief part of the policy with an explicit comment about forbidding 
the root group.
   
   ```yaml
       fsGroup:
           rule: 'MustRunAs'
           ranges:
           # Forbid adding the root group.
           - min: 1
             max: 65535
   ```
   
   I'm not entirely sure how to best accommodate both paradigms. While 
OpenShift allows users to override a pod's UID/GID by specifying `runAsUser` 
and `runAsGroup`, they discourage it because it could affect other pods running 
in other namespaces. Here is the reference 
https://www.openshift.com/blog/a-guide-to-openshift-and-uids and here is the 
main quote:
   
   > When the application also needs to be executed under a specific custom 
UID, then the Pod definition needs to use a ServiceAccount with the RunAsAny 
privilege and the Pod needs to use the runAsUser to set the UID to the 
hardcoded UID. NOTE: Using a hardcoded UID is NOT recommended. Among issues 
with this approach is that it is prone to UID collisions with system UIDs or 
with UIDs of the same or different application running in a different Namespace 
expecting to use the same UID.
   
   If you're deploying your bookkeeper pods on dedicated hosts, I think that 
removes their main concern because other namespaces won't be deployed to the 
nodes. I'm not sure how you're deploying these pods though.
   
   One thing that might be beneficial to all is to remove the `VOLUME` 
instruction from pulsar's `Dockerfile` because it prevents file ownership 
updates to the `/pulsar` directories in derivative images. Docker does claim 
that specifying volumes in the `Dockerfile` is best practice 
[here](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#volume):
   
   > The VOLUME instruction should be used to expose any database storage area, 
configuration storage, or files/folders created by your docker container. You 
are strongly encouraged to use VOLUME for any mutable and/or user-serviceable 
parts of your image.
   
   and they mention the limitation of derivative images 
[here](https://docs.docker.com/engine/reference/builder/#notes-about-specifying-volumes):
   
   > Changing the volume from within the Dockerfile: If any build steps change 
the data within the volume after it has been declared, those changes will be 
discarded.
   
   All that said, I don't know how essential it is to use the `VOLUME` 
instruction. There are people who seem to think that it does not add enough 
value to a docker image, as seen in the top answer to this [stack 
overflow](https://stackoverflow.com/questions/44060341/is-it-a-docker-best-practice-to-use-volume-for-the-code).
   
   By removing the instruction, it'd make it easier for end users to make 
derivative images based on the official docker images instead of having to 
build their own image from source.
   
   It might also be relevant to note that kubernetes 1.20 now has a feature to 
manage the file permissions of volumes 
[here](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods).
 I haven't looked into this yet, but I'm wondering if it simplifies this 
problem by allowing the runtime to make sure the filesystem has the appropriate 
permissions set. 
   
   Personally, I would think there are more users who are looking for it to run 
as a non root user and group out of the box, which is why I submitted 
https://github.com/apache/pulsar/pull/8796.
   
   Another option is to produce two images to satisfy the different 
requirements.
   
   @sijie - what are your thoughts here? I think we'll need some input from the 
larger pulsar community.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to