steveloughran commented on pull request #2971:
URL: https://github.com/apache/hadoop/pull/2971#issuecomment-1059545705
Just updated with changes from sseth's review:
* Renamed StoreOperations to ManifestStoreOperations and raised its scope;
  that makes for a change which touches many classes.
* Lots of other review points, all minor in comparison.
* New DirEntry type in the manifest, for dest dirs only; it contains
  dest and status. Status is always 0, "unknown", for now.
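
A minimal sketch of what such an entry could look like. This is not the class in the PR: only the two fields (dest, status) and the value 0 for "unknown" come from the comment above; the constant name, the factory method and the accessors are hypothetical.

```java
import java.util.Objects;

/** Hypothetical sketch of a manifest entry for a destination directory. */
final class DirEntry {

  /** Status 0: nothing is known about the destination path. */
  static final int STATUS_UNKNOWN = 0;

  private final String dest;
  private final int status;

  DirEntry(String dest, int status) {
    this.dest = Objects.requireNonNull(dest, "dest");
    this.status = status;
  }

  /** Build an entry whose status has not been determined (assumed helper). */
  static DirEntry unknown(String dest) {
    return new DirEntry(dest, STATUS_UNKNOWN);
  }

  String getDest() {
    return dest;
  }

  int getStatus() {
    return status;
  }

  @Override
  public String toString() {
    return "DirEntry{dest='" + dest + "', status=" + status + "}";
  }
}
```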
Based on future stats of mkdir performance, I think we may want to
evolve dir preparation, with two options.

1. Probing for dest dirs in task commit. This has no side effects and is
something we can do in parallel with the listing process. It will allow all
probes for dest dirs to be omitted from job commit. There will be duplication
across the tasks, but off the critical path and parallelised with the treewalk.

2. Actually attempting to create dest dirs in task commit. As well as being
slightly side-effecting (but no new files...), we would have to deal with
* two task commits clashing: use the same recovery as job commit.
* a file at dest: note and report for job commit to process.

mkdir in task commit is clearly more complex; I will ignore it for now and
leave it for a future iteration based on job stats analysis of real-world
jobs. getFileStatus is low cost and low complexity.
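
To make the probing option concrete, here is a rough sketch of a side-effect-free probe run in parallel. It uses `java.nio.file` against the local filesystem purely as a stand-in for a getFileStatus call on the destination store. Apart from "unknown", the status names and the `DestDirProbe` class are assumptions for illustration only.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical probe outcomes; only UNKNOWN is mentioned in the comment. */
enum ProbeStatus { UNKNOWN, NOT_FOUND, DIRECTORY, FILE }

final class DestDirProbe {

  /** Read-only check of a single destination path. */
  static ProbeStatus probe(Path dest) {
    if (Files.isDirectory(dest)) {
      return ProbeStatus.DIRECTORY;
    }
    if (Files.exists(dest)) {
      return ProbeStatus.FILE;
    }
    return ProbeStatus.NOT_FOUND;
  }

  /**
   * Probe every distinct destination in parallel.
   * Nothing is created or deleted, so concurrent tasks cannot clash.
   */
  static Map<Path, ProbeStatus> probeAll(Collection<Path> dests) {
    return dests.parallelStream()
        .distinct()
        .collect(Collectors.toConcurrentMap(p -> p, DestDirProbe::probe));
  }
}
```

Because the probe only reads, the task needs no recovery logic; the cost is that several tasks may probe the same directory.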
Job commit will
1. merge dir list and status
2. those with files: delete and create (do this first)
3. those not present: create
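
The merge step could be sketched roughly as below, under several assumptions: each task reports (dest, status) pairs, conflicting reports for the same dest are resolved by a fixed precedence (file over directory over not-found over unknown), and entries still unknown are re-probed in job commit. All type and method names here are hypothetical, not the PR's API.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical statuses; declaration order doubles as merge precedence. */
enum ProbeStatus { UNKNOWN, NOT_FOUND, DIRECTORY, FILE }

/** Actions for job commit; declaration order is the execution order. */
enum DirAction { DELETE_AND_CREATE, CREATE, PROBE, NONE }

/** One (dest, status) report from a task manifest (assumed shape). */
final class ProbedDir {
  final String dest;
  final ProbeStatus status;

  ProbedDir(String dest, ProbeStatus status) {
    this.dest = dest;
    this.status = status;
  }
}

final class JobCommitDirPlan {

  /** Step 1: merge all task reports, keeping the highest-precedence status. */
  static Map<String, ProbeStatus> merge(List<ProbedDir> reports) {
    Map<String, ProbeStatus> merged = new TreeMap<>();
    for (ProbedDir r : reports) {
      merged.merge(r.dest, r.status, (a, b) -> a.compareTo(b) >= 0 ? a : b);
    }
    return merged;
  }

  /** Steps 2 and 3: group destinations by the action job commit must take. */
  static Map<DirAction, List<String>> plan(Map<String, ProbeStatus> merged) {
    Map<DirAction, List<String>> plan = new EnumMap<>(DirAction.class);
    for (DirAction action : DirAction.values()) {
      plan.put(action, new ArrayList<>());
    }
    merged.forEach((dest, status) -> plan.get(actionFor(status)).add(dest));
    return plan;
  }

  static DirAction actionFor(ProbeStatus status) {
    switch (status) {
      case FILE:
        return DirAction.DELETE_AND_CREATE; // a file is in the way
      case NOT_FOUND:
        return DirAction.CREATE;
      case DIRECTORY:
        return DirAction.NONE;              // already present
      default:
        return DirAction.PROBE;             // unknown: check in job commit
    }
  }
}
```

Iterating the `EnumMap` visits `DELETE_AND_CREATE` first, which matches the "do this first" note in step 2.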

One issue here, though: the final task commit will be slower, and all previous
tasks will have repeated the operation. Will it actually speed things up?