shameersss1 commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1926348440

   >1. static map of path to metadata. This will grow without constraint on a 
long-lived process.
   
   Entries are removed from the map during the commitTask or abortTask 
operation, which keeps memory usage under control.
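
The lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class name `TaskAttemptPathTracker`, the helper `recordPath`, and the use of `String` in place of Hadoop's `Path` are all assumptions made to keep the example self-contained.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the committer's static tracking state.
class TaskAttemptPathTracker {
  // Keyed by task attempt ID; values are the paths written by that attempt.
  // Plain strings replace org.apache.hadoop.fs.Path (assumption for this sketch).
  private static final Map<String, List<String>> taskAttemptIdToPath =
      new ConcurrentHashMap<>();

  // Record a path written by a task attempt (hypothetical helper).
  static void recordPath(String taskAttemptId, String magicPath) {
    taskAttemptIdToPath
        .computeIfAbsent(taskAttemptId, k -> new CopyOnWriteArrayList<>())
        .add(magicPath);
  }

  // On commit, take the attempt's paths and drop the entry from the map.
  static List<String> commitTask(String taskAttemptId) {
    List<String> paths = taskAttemptIdToPath.remove(taskAttemptId);
    return paths == null ? Collections.emptyList() : paths;
  }

  // On abort, drop the entry without using its paths.
  static void abortTask(String taskAttemptId) {
    taskAttemptIdToPath.remove(taskAttemptId);
  }

  // Number of task attempts still tracked.
  static int trackedAttempts() {
    return taskAttemptIdToPath.size();
  }
}
```

In this sketch an entry lives only from the first write of a task attempt until that attempt is committed or aborted, so the map size is bounded by the number of in-flight attempts, provided every attempt reaches one of those two calls.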
   
   ---
   
   > 2. If two jobs write to the same path, will that corrupt the map?
   
   No. The complete path is guaranteed to be unique. The paths stored in 
`private static Map<String, List<Path>> taskAttemptIdToPath = new 
ConcurrentHashMap<>();` are magic paths. Even though the file name might be 
the same, the magic paths for two different jobs will differ, since the jobId 
is included in the path.
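
To illustrate the uniqueness argument, here is a small sketch. The directory layout produced by `magicPath` is purely hypothetical and is not the committer's real layout; the only property it is meant to show is that a path containing the jobId differs between jobs even when the destination and file name are identical.

```java
// Hypothetical illustration: the path layout below is an assumption,
// chosen only to show that embedding the jobId separates jobs.
class MagicPathExample {
  static String magicPath(String destDir, String jobId,
                          String taskAttemptId, String fileName) {
    return destDir + "/__magic_job-" + jobId + "/" + taskAttemptId + "/" + fileName;
  }

  public static void main(String[] args) {
    // Same destination and file name, two different jobs.
    String a = magicPath("/dest", "job_0001", "attempt_0001_m_000000_0", "part-0000");
    String b = magicPath("/dest", "job_0002", "attempt_0002_m_000000_0", "part-0000");
    System.out.println(a.equals(b)); // the two paths differ because the jobId differs
  }
}
```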
   
   -----
   
   >3. the static map would be a weak ref to something held strongly by the 
actual committer (see WeakReferenceMap). Once the actual task attempt is gc'd,
   
   Since the entries are removed from the map during the commitTask or 
abortTask operation, is a WeakHashMap still required?
   
   ----
   
   >4.  static structures should be per fs instances, so when an fs is cleaned 
up
   
   I am not sure why it should be scoped to the fs object. For behaviour 
similar to storing in S3, shouldn't the static structure be available to 
the whole JVM? That is, shouldn't we be able to access the static structure 
irrespective of the fs object?
   
   ----
   
   >5. I'm also worried about how a job could abort a task attempt on a 
different process which has failed. Before worrying about that too much, why 
don't you look in spark to see how it calls abort. I'm not worried about 
MapReduce except for testing, so how it itself calls the committer isn't so 
important. For example: we don't care about recovery from a failed attempt as 
spark itself cannot do this.
   
   I have covered this as part of the comment 
[here](https://github.com/apache/hadoop/pull/6468#issuecomment-1926304528).
   
   ---


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

