steveloughran opened a new pull request #3289:
URL: https://github.com/apache/hadoop/pull/3289


   
   Speeds up the committer. The key changes are:
   
   * All writes under __magic trigger marker retention
     (no DELETEs after file/dir creation)
   * create(path, overwrite) skips all overwrite checks, including
     the LIST call intended to stop files being created over dirs
   * a thread pool is used for more parallelism in task commit.
   
   This is still WiP, as it needs
   * cost tests to verify the optimisations are active
   * testing through Spark
   
   There are lots of changes in the tests because the committer has
   added a CommitContext class, which manages the lifecycle of
   the thread pool and a set of thread-local JSON serializers.
   This is now what is passed around in internal committer methods,
   which breaks the tests that call into them.
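
To illustrate the idea of a context object that owns a thread pool and per-thread serializers, here is a minimal sketch. It is not the CommitContext class from this PR: the names `CommitContextSketch`, `PendingCommitSerializer`, `serializer()` and `parallel()` are hypothetical, and the serializer is a plain string builder standing in for a real JSON library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch of a commit context; not the class added by the PR. */
public class CommitContextSketch implements AutoCloseable {

  /** Stand-in for a JSON serializer that should not be shared across threads. */
  static final class PendingCommitSerializer {
    String toJson(String path, long length) {
      return "{\"path\":\"" + path + "\",\"length\":" + length + "}";
    }
  }

  /** Pool owned by this context: created in the constructor, shut down in close(). */
  private final ExecutorService pool;

  /** One serializer per thread, created lazily on first use. */
  private final ThreadLocal<PendingCommitSerializer> serializers =
      ThreadLocal.withInitial(PendingCommitSerializer::new);

  public CommitContextSketch(int threads) {
    this.pool = Executors.newFixedThreadPool(threads);
  }

  /** Serializer instance belonging to the calling thread. */
  public PendingCommitSerializer serializer() {
    return serializers.get();
  }

  /** Apply fn to every item in the pool; results keep the input order. */
  public <T, R> List<R> parallel(List<T> items, Function<T, R> fn)
      throws InterruptedException, ExecutionException {
    List<Future<R>> futures = new ArrayList<>();
    for (T item : items) {
      Callable<R> task = () -> fn.apply(item);
      futures.add(pool.submit(task));
    }
    List<R> results = new ArrayList<>();
    for (Future<R> future : futures) {
      results.add(future.get());
    }
    return results;
  }

  /** Stop accepting new work; already submitted tasks are allowed to finish. */
  @Override
  public void close() {
    pool.shutdown();
  }

  public static void main(String[] args) throws Exception {
    // try-with-resources ties the pool's lifetime to the commit operation.
    try (CommitContextSketch ctx = new CommitContextSketch(4)) {
      List<String> json = ctx.parallel(
          Arrays.asList("part-0000", "part-0001"),
          path -> ctx.serializer().toJson(path, 0L));
      json.forEach(System.out::println);
    }
  }
}
```

Passing one such object into internal methods, rather than a pool and serializers separately, is what changes the method signatures and so stops older tests from compiling.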
   
   It is a better design (one we should have used from the start);
   the manifest committer is even better, as all its operations
   ("stages") are modular. It just means that a lot of tests stopped
   compiling. And, as usual, the mock tests played up.
   
   Finally, removed the injection/handling of inconsistent S3
   from the committer tests. It is not needed and only complicates
   the code.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


