GJL commented on issue #10746: [FLINK-15417] Remove the docker volume or mount 
when starting Mesos e…
URL: https://github.com/apache/flink/pull/10746#issuecomment-574229434
 
 
   When we opened FLINK-15377, we discovered that the Mesos logs cannot be 
deleted due to permission problems (logs written in the container have a 
different owner than the host's current user id). Since we also delete the 
job's output, which is written from within the container, I was wondering why 
the test is currently passing on Travis at all. It turns out that we also fail 
to remove the job's output (`${TEST_DATA_DIR}/out/wc_out_mesos`). However, 
because the cleanup code is executed as a `trap`, and we redirect `stderr` to 
`/dev/null` [1], the error is never visible on Travis. 
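   
   To illustrate how a failing cleanup can go unnoticed, here is a minimal 
sketch (not the actual test runner code). The `cleanup` function and the 
path it removes are hypothetical, and a missing path stands in for the 
permission error, since both make `rm` fail:
   
```shell
# Hypothetical path that rm cannot remove (it does not exist); this stands in
# for the job output that the host user is not allowed to delete.
undeletable="$(mktemp -d)/wc_out_mesos"

# Hypothetical stand-in for the real cleanup code.
cleanup() {
    rm -r "${undeletable}"
}

# Variant 1: the EXIT trap runs the cleanup with stderr sent to /dev/null,
# so the rm failure leaves no trace in the captured output.
silenced="$(
    (trap 'cleanup 2>/dev/null' EXIT; echo "test passed") 2>&1
)"

# Variant 2: the same trap without the redirect, so the rm error is captured.
visible="$(
    (trap 'cleanup' EXIT; echo "test passed") 2>&1
)"

echo "silenced: ${silenced}"
echo "visible:  ${visible}"
```
   
   In the first variant only the "test passed" line is captured, while the 
second variant additionally contains the error message from `rm`.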
   
   Your PR works around the permission problem by copying the data from the 
container. The issues I see with this approach are:
   - `wait_job_terminal_state_mesos` uses a new strategy to poll whether the 
job has terminated (compared to the standalone mode)
   - `copy_logs_from_container` will not work if the container dies 
unexpectedly, making it hard to debug a certain class of bugs (mesos exiting 
prematurely)
   
   The approach described 
[here](https://vsupalov.com/docker-shared-permissions/) doesn't work on OS X 
due to the docker daemon running on a hypervisor. However, on OS X we 
apparently do not suffer from the permission issue.
   
   A simple way out of this might be to just create the directories in advance 
with the right permissions. For example, if we ran
   
   ```
   mkdir "${TEST_DATA_DIR}/out/" # run on the host, not in the container
   ```
   
   prior to submitting the job, we would be able to delete the `/out` directory 
later from the host (unless nested directories are created within the 
container). Another option I see is to run `chmod -R ugo+rw` at the end of the 
test from within the container against all files/directories that we need to 
delete later from the host.
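   
   A rough sketch of the `chmod` option, under the assumption that it is 
wrapped in a small helper (the name `make_world_writable` is hypothetical). 
In the real test it would have to run inside the container, for example via 
`docker exec` with whatever container name the test uses. Here it is only 
demonstrated on a temporary directory:
   
```shell
# Hypothetical helper: grant read/write to user, group and others on
# everything below the given directory, so that a different user (such as
# the host user) can modify and delete the contents later.
make_world_writable() {
    chmod -R ugo+rw "$1"
}

# Local demonstration on a temporary directory that mimics the job output.
demo_dir="$(mktemp -d)"
mkdir -p "${demo_dir}/out/wc_out_mesos"
echo "hello" > "${demo_dir}/out/wc_out_mesos/part-0"

# Start from a state where group and others cannot write.
chmod -R go-w "${demo_dir}/out"

# In the real test this call would run inside the container (assumption),
# against the directories that the host has to delete afterwards.
make_world_writable "${demo_dir}/out"

ls -lR "${demo_dir}/out"
```
   
   Note that deleting a file requires write permission on its parent 
directory, which is why the recursive `chmod` has to cover the directories 
and not only the files.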
   
   Let me know what you think.
   
   [1] 
https://github.com/apache/flink/blob/6f6fb43ca2f8413e81a1b19e77c5cf3101b7e61d/flink-end-to-end-tests/test-scripts/test-runner-common.sh#L107
