steveloughran commented on PR #5402:
URL: https://github.com/apache/hadoop/pull/5402#issuecomment-1443788914

   
   I really don't like how the results come back.
   
   I'm going to propose adding IOStatistics support to distcp. That lines it 
up for future work and avoids modifying the source config so that it suddenly 
becomes a two-way exchange of data.
   
   1. DistCp to implement IOStatisticsSource.
   2. Until the job finishes, getIOStatistics() returns null.
   3. When the job has finished, build a statistics store and set the counter:
   
   ```
   
   // both classes are in org.apache.hadoop.fs.statistics.impl
   // create a statistics store with the counter registered
   IOStatisticsStore iostats = IOStatisticsBinding.iostatisticsStore()
     .withCounters(DISTCP_TOTAL_BYTES_COPIED)
     .build();

   // then set the counter to the retrieved value
   iostats.setCounter(DISTCP_TOTAL_BYTES_COPIED, <counter>);
   ```
   
   This is extra work and you have to learn a new API, but
   
   * IOStatisticsAssertions has the asserts
   * IOStatisticsLogging has pretty printing
   * you can take an IOStatisticsSnapshot and send it over the wire as JSON or 
as a Java serialized object
   * it lines us up perfectly for collecting more detailed stats, not just from 
the workers (trickier...) but also the cost of directory scanning, cleanup, etc.
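
To make steps 1 to 3 concrete, here is a rough, dependency-free sketch of the lifecycle. `Stats`, `StatsSource` and `CopyTool` are hypothetical stand-ins for `IOStatistics`, `IOStatisticsSource` and DistCp, and the counter key string is an assumption; real code would use the classes in `org.apache.hadoop.fs.statistics` rather than these.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Dependency-free sketch: the nested types are hypothetical stand-ins
// for Hadoop's IOStatistics / IOStatisticsSource, not the real API.
class DistCpStatsSketch {

  /** Stand-in for a read-only statistics view (counters only). */
  interface Stats {
    Map<String, Long> counters();
  }

  /** Stand-in for a statistics source; getStats() may return null. */
  interface StatsSource {
    Stats getStats();
  }

  // Assumed key name; the proposal only names the constant.
  static final String DISTCP_TOTAL_BYTES_COPIED = "distcp_total_bytes_copied";

  /** Hypothetical tool which publishes statistics once its job finishes. */
  static class CopyTool implements StatsSource {
    private volatile Stats stats; // null until the job completes

    /** Called once with the value retrieved from the job counters. */
    void jobFinished(long bytesCopied) {
      Map<String, Long> counters = new TreeMap<>();
      counters.put(DISTCP_TOTAL_BYTES_COPIED, bytesCopied);
      Map<String, Long> view = Collections.unmodifiableMap(counters);
      stats = () -> view;
    }

    @Override
    public Stats getStats() {
      return stats;
    }
  }

  public static void main(String[] args) {
    CopyTool tool = new CopyTool();
    System.out.println("before: " + tool.getStats());
    tool.jobFinished(4096L);
    System.out.println("after: " + tool.getStats().counters());
  }
}
```

Callers have to cope with the null returned before completion; after that they get a read-only view, and the config passed in is never written to.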
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

