steveloughran commented on PR #5519:
URL: https://github.com/apache/hadoop/pull/5519#issuecomment-1515223731

   Parallel test runs failed everywhere, but I have improved
ITestAbfsLoadManifestsStage performance:
   
   * went back to the original 200 manifest files
   * increased the worker pool and buffer queue size (this mattered more
     before the manifest count was reduced)
   
   This brings the test time down to 10s locally. The IOStats do imply that
   many MB of data are being PUT/GET (about 10 MB each way in the dump below),
   so it is good to keep the test small so that people running with less
   bandwidth don't suffer. Maybe, just maybe, the size could be switched
   with a -Dscale?
   
   IOStats:
   There seem to be a lot of delete requests, but that's because when we write
the manifest it is done as a write to a temp file and then a rename, and the
destination is deleted first, without any check.
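
For anyone who hasn't looked at the save path, the sequence described above can be sketched against the local filesystem. This is only a stand-in using `java.nio.file`, not the committer's code: the class name, the method name and the `.tmp` suffix are made up for illustration, and against a store each step would be one or more remote requests rather than a local call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Local-filesystem stand-in for the save sequence; hypothetical names throughout. */
public class SaveManifestSketch {

  /**
   * Write to a temp file, delete the destination without checking
   * whether it exists, then rename the temp file into place.
   */
  public static void save(Path dest, String json) throws IOException {
    Path temp = dest.resolveSibling(dest.getFileName() + ".tmp");
    Files.write(temp, json.getBytes(StandardCharsets.UTF_8));
    // Unconditional delete: no existence probe first. Against a store this
    // is a delete call that fails when there is nothing to delete.
    Files.deleteIfExists(dest);
    // With no options, move fails if the target exists; it was just deleted.
    Files.move(temp, dest);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("manifests");
    Path dest = dir.resolve("task-0001-manifest.json");
    save(dest, "{\"files\": 100}");  // first save: nothing to delete
    save(dest, "{\"files\": 101}");  // second save: replaces the first
    // prints {"files": 101}
    System.out.println(new String(Files.readAllBytes(dest), StandardCharsets.UTF_8));
  }
}
```

Because the delete is unconditional, the first save of each manifest pays for a delete that has nothing to remove, which would be consistent with the 200 `action_http_delete_request.failures` in the dump.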
   In production the cost of that delete is absorbed in task commit, but at
~60ms for a DELETE vs ~40ms for a HEAD, we should decide what to do here. I
think for renames in job commit we could do the HEAD before the DELETE, simply
because that is the bottleneck, so maybe do it here too...
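
A rough way to see when the HEAD-first option pays off is to compare the per-save request cost of the two strategies. The sketch below is back-of-envelope arithmetic only: the class and method names are hypothetical, the 60ms and 40ms inputs are the rounded figures quoted above, and the totals are summed request times that ignore parallelism. The dump also reports separate means for failed requests, which could be plugged in instead.

```java
/** Back-of-envelope request cost per manifest save; all figures are parameters. */
public class DeleteProbeCost {

  /** Blind DELETE: one request whether or not the destination exists. */
  public static double blindDeleteMs(double deleteMs) {
    return deleteMs;
  }

  /**
   * HEAD first, DELETE only when the destination exists.
   * existsFraction is the share of saves that find a file already there.
   */
  public static double headFirstMs(double headMs, double deleteMs, double existsFraction) {
    return headMs + existsFraction * deleteMs;
  }

  public static void main(String[] args) {
    double deleteMs = 60.0;  // rounded figure quoted in the comment
    double headMs = 40.0;    // rounded figure quoted in the comment
    int manifests = 200;     // manifest count in this test
    for (double p : new double[] {0.0, 0.25, 0.5}) {
      double saved = manifests
          * (blindDeleteMs(deleteMs) - headFirstMs(headMs, deleteMs, p));
      System.out.printf("exists=%.2f saved=%.0f ms%n", p, saved);
    }
  }
}
```

With these inputs the HEAD-first path only wins while fewer than a third of the destinations already exist; the 200 failed deletes in the dump suggest that none did in this test.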
   
   ```
   2023-04-19 19:43:05,489 INFO  [JUnit]: manifest.AbstractManifestCommitterTest (AbstractManifestCommitterTest.java:dumpFileSystemIOStatistics(450)) - Aggregate FileSystem Statistics counters=((action_http_delete_request=402)
   (action_http_delete_request.failures=200)
   (action_http_get_request=202)
   (action_http_head_request=404)
   (action_http_head_request.failures=202)
   (action_http_put_request=1103)
   (bytes_received=10160814)
   (bytes_sent=10160814)
   (committer_task_directory_count=20000)
   (committer_task_file_count=20000)
   (committer_task_manifest_file_size=10160814)
   (connections_made=2111)
   (directories_created=303)
   (files_created=200)
   (get_responses=2111)
   (job_stage_create_target_dirs=1)
   (job_stage_load_manifests=1)
   (job_stage_setup=1)
   (op_create=200)
   (op_create_directories=1)
   (op_delete=803)
   (op_get_file_status=407)
   (op_get_file_status.failures=202)
   (op_list_status=2)
   (op_load_all_manifests=1)
   (op_load_manifest=200)
   (op_mkdirs=605)
   (op_msync=1)
   (op_open=200)
   (op_rename=400)
   (rename_path_attempts=200)
   (send_requests=1103)
   (task_stage_save_manifest=200)
   (task_stage_save_task_manifest=200)
   (task_stage_setup=200));
   
   gauges=();
   
   minimums=((action_http_delete_request.failures.min=25)
   (action_http_delete_request.min=36)
   (action_http_get_request.min=40)
   (action_http_head_request.failures.min=22)
   (action_http_head_request.min=20)
   (action_http_put_request.min=24)
   (committer_task_directory_count=100)
   (committer_task_file_count=100)
   (committer_task_manifest_file_size=49990)
   (job_stage_create_target_dirs.min=259)
   (job_stage_load_manifests.min=2804)
   (job_stage_setup.min=183)
   (op_create_directories.min=256)
   (op_delete.min=25)
   (op_get_file_status.failures.min=22)
   (op_get_file_status.min=22)
   (op_list_status.min=87)
   (op_load_all_manifests.min=2627)
   (op_load_manifest.min=49)
   (op_mkdirs.min=24)
   (op_msync.min=0)
   (op_rename.min=70)
   (task_stage_save_manifest.min=273)
   (task_stage_save_task_manifest.min=144)
   (task_stage_setup.min=51));
   
   maximums=((action_http_delete_request.failures.max=413)
   (action_http_delete_request.max=291)
   (action_http_get_request.max=2031)
   (action_http_head_request.failures.max=439)
   (action_http_head_request.max=430)
   (action_http_put_request.max=2662)
   (committer_task_directory_count=100)
   (committer_task_file_count=100)
   (committer_task_manifest_file_size=50876)
   (job_stage_create_target_dirs.max=259)
   (job_stage_load_manifests.max=2804)
   (job_stage_setup.max=183)
   (op_create_directories.max=256)
   (op_delete.max=413)
   (op_get_file_status.failures.max=439)
   (op_get_file_status.max=22)
   (op_list_status.max=127)
   (op_load_all_manifests.max=2627)
   (op_load_manifest.max=2031)
   (op_mkdirs.max=245)
   (op_msync.max=0)
   (op_rename.max=932)
   (task_stage_save_manifest.max=2863)
   (task_stage_save_task_manifest.max=2757)
   (task_stage_setup.max=471));
   
   means=((action_http_delete_request.failures.mean=(samples=200, sum=9850, mean=49.2500))
   (action_http_delete_request.mean=(samples=202, sum=12448, mean=61.6238))
   (action_http_get_request.mean=(samples=202, sum=78955, mean=390.8663))
   (action_http_head_request.failures.mean=(samples=202, sum=12782, mean=63.2772))
   (action_http_head_request.mean=(samples=202, sum=8096, mean=40.0792))
   (action_http_put_request.mean=(samples=1103, sum=108966, mean=98.7906))
   (committer_task_directory_count=(samples=200, sum=20000, mean=100.0000))
   (committer_task_file_count=(samples=200, sum=20000, mean=100.0000))
   (committer_task_manifest_file_size=(samples=200, sum=10160814, mean=50804.0700))
   (job_stage_create_target_dirs.mean=(samples=1, sum=259, mean=259.0000))
   (job_stage_load_manifests.mean=(samples=1, sum=2804, mean=2804.0000))
   (job_stage_setup.mean=(samples=1, sum=183, mean=183.0000))
   (op_create_directories.mean=(samples=1, sum=256, mean=256.0000))
   (op_delete.mean=(samples=401, sum=22278, mean=55.5561))
   (op_get_file_status.failures.mean=(samples=202, sum=12806, mean=63.3960))
   (op_get_file_status.mean=(samples=1, sum=22, mean=22.0000))
   (op_list_status.mean=(samples=2, sum=214, mean=107.0000))
   (op_load_all_manifests.mean=(samples=1, sum=2627, mean=2627.0000))
   (op_load_manifest.mean=(samples=200, sum=79570, mean=397.8500))
   (op_mkdirs.mean=(samples=302, sum=15536, mean=51.4437))
   (op_msync.mean=(samples=1, sum=0, mean=0.0000))
   (op_rename.mean=(samples=200, sum=22999, mean=114.9950))
   (task_stage_save_manifest.mean=(samples=200, sum=115797, mean=578.9850))
   (task_stage_save_task_manifest.mean=(samples=200, sum=82893, mean=414.4650))
   (task_stage_setup.mean=(samples=200, sum=21878, mean=109.3900)));
   ```
   

