steveloughran commented on pull request #2971:
URL: https://github.com/apache/hadoop/pull/2971#issuecomment-1067124802


   just pushed an update with
   * PathOutputCommitter logs at factory
   * a bit more validation of the manifest summary data (which showed the 
testing-only-path list wasn't marshalling spaces properly, a bug which must 
still be in the s3a code)
   
   there's a hardcoded limit on the number of files which get listed in that 
success data (100), so that on big jobs the time to write the success file 
doesn't itself slow the job down.
   
   is that too big a number? if paths are long you could still have 50 KiB 
of data or more.
   could cut down to something minimal, like, say, 20. enough for basic tests 
but not for performance issues.
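
   to put rough numbers on that trade-off, here is a minimal sketch of capping 
the listed paths and measuring how many bytes they add. `SuccessFileSample`, 
`sample()` and `utf8Bytes()` are hypothetical names for illustration, not code 
from the PR, and the size counts only the UTF-8 path strings, not any JSON 
quoting or separators.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Hadoop codebase. */
public final class SuccessFileSample {

  private SuccessFileSample() {
  }

  /** Return at most {@code limit} paths from the head of the list, in order. */
  public static List<String> sample(List<String> paths, int limit) {
    if (limit < 0) {
      throw new IllegalArgumentException("limit must be >= 0");
    }
    // copy so the result does not hold a view onto the full list
    return new ArrayList<>(paths.subList(0, Math.min(limit, paths.size())));
  }

  /** Sum of the UTF-8 encoded lengths of the paths, ignoring any JSON overhead. */
  public static long utf8Bytes(List<String> paths) {
    long total = 0;
    for (String p : paths) {
      total += p.getBytes(StandardCharsets.UTF_8).length;
    }
    return total;
  }
}
```

   with 512-byte paths, a limit of 100 gives 51,200 bytes (50 KiB) of path 
data, while a limit of 20 gives 10,240 bytes (10 KiB).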
   
   tested azure cardiff.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


