Github user WangTaoTheTonic commented on the issue:

    https://github.com/apache/flink/pull/3335
  
    I've read everyone's comments and listed the preconditions and possible solutions for setting permissions on this directory.
    
    ## Preconditions
    1. Every Flink job (session or single) can specify a directory for storing checkpoints, configured via `state.backend.fs.checkpointdir`.
    2. Different jobs can set the same or different directories, so their checkpoint files may end up in one shared directory or in separate ones, with each job writing into its own **sub-dir** named after its job ID.
    3. Jobs can be run by different users, and users require that one user cannot read checkpoint files written by another, since that would leak information.
    4. In some (relatively rare, I think) cases, as @StephanEwen said, users need to access other users' checkpoint files to clone or migrate jobs.
    5. A checkpoint file path looks like:
`hdfs://namenode:port/flink-checkpoints/<job-id>/chk-17/6ba7b810-9dad-11d1-80b4-00c04fd430c8`
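To make the layout in point 5 concrete, here is a minimal sketch (Python, chosen only for illustration) that assembles such a path. The helper names `job_checkpoint_dir` and `checkpoint_file_path` and the job ID value are hypothetical; the `<root>/<job-id>/chk-<n>/<uuid>` structure is read off the example path above, not taken from Flink's code.

```python
import posixpath
import uuid


def job_checkpoint_dir(root: str, job_id: str) -> str:
    """Per-job sub-dir under the shared root (hypothetical helper)."""
    # posixpath.join only inserts '/' separators, so the
    # "hdfs://namenode:port" scheme/authority prefix is left intact.
    return posixpath.join(root, job_id)


def checkpoint_file_path(root: str, job_id: str, checkpoint_id: int,
                         file_id: uuid.UUID) -> str:
    """Build <root>/<job-id>/chk-<n>/<uuid>, mirroring the example path."""
    return posixpath.join(job_checkpoint_dir(root, job_id),
                          f"chk-{checkpoint_id}", str(file_id))


if __name__ == "__main__":
    print(checkpoint_file_path(
        "hdfs://namenode:port/flink-checkpoints",
        "example-job-id",  # hypothetical stand-in for <job-id>
        17,
        uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8"),
    ))
```

The per-job directory (`<root>/<job-id>`) is the natural unit for the per-user permissions discussed in the solutions.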
    
    ## Solutions 
    ### Solution #1 (requires no code changes)
    1. Admins control the permissions of the root directory via HDFS ACLs (e.g. user1 can read and write, user2 can only read, …).
    2. This has two disadvantages: a) it is a huge burden for admins to set different permissions for a large number of users/groups; and b) sub-dirs inherit their permissions from the root directory, so they are basically all the same, which makes fine-grained control hard.
    ### Solution #2 (this proposal)
    1. We don't care what the permission of the root dir is. It can be created during setup or while a job is running, as long as it is usable.
    2. We control every sub-dir created by the different jobs (which, in most cases, are submitted by different users) and set it to a restrictive value (like “700”) so that others cannot read it.
    3. If someone wants to migrate or clone jobs across users (again, a rare scenario in my view), they should ask the admins (normally the HDFS admin) to add ACLs (or similar) for this purpose.
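Solution #2 could look roughly like the following sketch. It uses Python's `os` module on a local POSIX filesystem purely as an analogue; on HDFS the corresponding operations would go through Hadoop's `FileSystem.mkdirs(Path, FsPermission)` and `FileSystem.setPermission(Path, FsPermission)`. The helper name `create_job_checkpoint_dir` and the directory names in the demo are hypothetical.

```python
import os
import stat
import tempfile


def create_job_checkpoint_dir(root: str, job_id: str, mode: int = 0o700) -> str:
    """Create <root>/<job_id> and restrict it to its owner (hypothetical helper)."""
    # Point 1: the root's permissions are not managed here; it is only
    # created if it does not exist yet.
    os.makedirs(root, exist_ok=True)
    job_dir = os.path.join(root, job_id)
    os.makedirs(job_dir, exist_ok=True)
    # Point 2: os.makedirs' mode argument is filtered by the process umask,
    # so set the permission explicitly to get exactly rwx------ (700).
    os.chmod(job_dir, mode)
    return job_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = create_job_checkpoint_dir(os.path.join(tmp, "flink-checkpoints"),
                                      "example-job-id")
        # On a POSIX filesystem this prints 0o700.
        print(oct(stat.S_IMODE(os.stat(d).st_mode)))
```

Any cross-user access (point 3) is then granted outside this code path, e.g. by an admin adding ACLs to a specific job directory.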

