shameersss1 commented on PR #6468: URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1926348440
> 1. static map of path to metadata. This will grow without constraint on a long-lived process.

The entries are removed from the map during the `commitTask` or `abortTask` operation, which keeps memory under control.

---

> 2. If two jobs write to the same path, will it corrupt the map?

No. The complete path is guaranteed to be unique. The paths stored in `private static Map<String, List<Path>> taskAttemptIdToPath = new ConcurrentHashMap<>();` are magic paths. Even though the file name might be the same, the magic paths for two different jobs will differ, because the job ID is included in the path.

---

> 3. the static map would be a weak ref to something held strongly by the actual committer (see `WeakReferenceMap`). Once the actual task attempt is gc'd,

Since entries are removed from the map during the `commitTask` or `abortTask` operation, is a `WeakHashMap` still required?

---

> 4. static structures should be per fs instance, so when an fs is cleaned up

I am not sure why it should be scoped to the fs object. To get behaviour similar to storing the data in S3, shouldn't the static structure be available to the whole JVM? That is, shouldn't we be able to access the static structure irrespective of the fs object?

---

> 5. I'm also worried about how a job could abort a task attempt on a different process which has failed. Before worrying about that too much, why don't you look in Spark to see how it calls abort? I'm not worried about MapReduce except for testing, so how it itself calls the committer isn't so important. For example, we don't care about recovery from a failed attempt, as Spark itself cannot do this.

I have covered this as part of the comment [here](https://github.com/apache/hadoop/pull/6468#issuecomment-1926304528).
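A minimal sketch of the lifecycle described in points 1 and 2: a JVM-wide `ConcurrentHashMap` keyed by task attempt ID, filled as files are written and cleared on `commitTask`/`abortTask`. The class name, the helper methods, and the `__magic/job-<jobId>/...` layout are hypothetical illustrations (the real S3A magic-path layout may differ), and paths are plain `String`s instead of `org.apache.hadoop.fs.Path` so the example runs without Hadoop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical JVM-wide registry of magic paths written by each task attempt. */
final class MagicPathRegistry {

  // One list of written magic paths per task attempt ID (Strings stand in for Path).
  private static final Map<String, List<String>> taskAttemptIdToPath =
      new ConcurrentHashMap<>();

  /** Hypothetical layout: the job ID is embedded, so equal file names from two jobs differ. */
  static String magicPath(String dest, String jobId, String attemptId, String fileName) {
    return dest + "/__magic/job-" + jobId + "/" + attemptId + "/" + fileName;
  }

  /** Record a path once a task attempt has written a file. */
  static void recordWrite(String attemptId, String path) {
    // computeIfAbsent is atomic on ConcurrentHashMap; the per-attempt list is synchronized.
    taskAttemptIdToPath
        .computeIfAbsent(attemptId, k -> Collections.synchronizedList(new ArrayList<>()))
        .add(path);
  }

  /** Called from commitTask or abortTask: drop the entry so the map cannot grow unbounded. */
  static List<String> releaseAttempt(String attemptId) {
    List<String> paths = taskAttemptIdToPath.remove(attemptId);
    return paths == null ? Collections.emptyList() : paths;
  }

  static int trackedAttempts() {
    return taskAttemptIdToPath.size();
  }

  public static void main(String[] args) {
    // Two jobs write a file with the same name under the same destination.
    String p1 = magicPath("s3a://bucket/out", "job_0001", "attempt_0001_m_000000_0", "part-0000");
    String p2 = magicPath("s3a://bucket/out", "job_0002", "attempt_0002_m_000000_0", "part-0000");
    recordWrite("attempt_0001_m_000000_0", p1);
    recordWrite("attempt_0002_m_000000_0", p2);
    System.out.println(p1.equals(p2));         // false: the job IDs differ
    System.out.println(trackedAttempts());     // 2
    releaseAttempt("attempt_0001_m_000000_0"); // e.g. commitTask
    releaseAttempt("attempt_0002_m_000000_0"); // e.g. abortTask
    System.out.println(trackedAttempts());     // 0
  }
}
```

Removal on commit/abort bounds memory only if every attempt that records paths later reaches `commitTask` or `abortTask` in the same JVM; an attempt that never does would leave its entry behind, which is the case a weak-reference map would cover.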
