[GitHub] [hudi] prashantwason opened a new pull request #2495: [HUDI-1553] Configuration and metrics for the TimelineService.

GitBox Tue, 26 Jan 2021 15:28:41 -0800


prashantwason opened a new pull request #2495:
URL: https://github.com/apache/hudi/pull/2495



   ## What is the purpose of the pull request
   
   TimelineServer uses Javalin which is based on Jetty.
   
   By default, Jetty:
   
    - Has 200 threads
    - Compresses output with gzip
    - Handles each request sequentially
    
   
   On a large-scale Hudi dataset (2000 partitions) with TimelineServer enabled, operations slow down for the following reasons:
   
    - The driver process usually has only a few cores, so 200 Jetty threads cause heavy contention when hundreds of executors connect to the server in parallel.
    - Each request is handled sequentially by default. To serve a large number of requests in parallel, it is better to handle each HTTP request asynchronously using Futures, which Javalin supports.
    - The compute overhead of gzip compression may be unnecessary when the executors and the driver are in the same rack or the same datacenter.
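
The first two points can be illustrated with a minimal sketch that uses only the JDK. It is not Hudi or Javalin code: the class `AsyncRequestSketch`, the `handle` stand-in, and the pool sizing are assumptions made for illustration. Each request becomes a `CompletableFuture` that runs on a fixed pool sized to the available cores, rather than on a much larger default pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: names and sizing are illustrative, not taken from the PR.
class AsyncRequestSketch {

    // Stand-in for the work one timeline request performs (e.g. listing a partition).
    static String handle(int requestId) {
        return "response-" + requestId;
    }

    // Submits every request to a bounded pool and returns the responses in request order.
    static List<String> serveAll(int requestCount, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int i = 0; i < requestCount; i++) {
                final int id = i;
                // The caller is not blocked here; the work is queued on the pool.
                futures.add(CompletableFuture.supplyAsync(() -> handle(id), pool));
            }
            List<String> responses = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                responses.add(future.join()); // wait for each result
            }
            return responses;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        List<String> responses = serveAll(1024, cores);
        System.out.println(responses.size() + " requests served by " + cores + " threads");
    }
}
```

In this sketch the number of worker threads stays close to the number of cores no matter how many requests arrive, which is the contention argument made above.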
   
   ## Brief change log
   
   Added settings to control the number of threads created, whether to gzip output, and whether to process requests asynchronously.
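
As a rough illustration of what three such settings might look like, here is a hypothetical holder class written against the JDK only. The class name, field names, and `encode` helper are assumptions and do not reflect the actual configuration keys or code in this PR. The sketch shows the gzip toggle in action: the body is compressed only when `compress` is true.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical settings holder; names are illustrative, not the PR's config keys.
class TimelineServerSettingsSketch {
    final int numThreads;   // size of the server thread pool
    final boolean compress; // gzip response bodies when true
    final boolean async;    // handle requests through futures when true

    TimelineServerSettingsSketch(int numThreads, boolean compress, boolean async) {
        if (numThreads < 1) {
            throw new IllegalArgumentException("numThreads must be >= 1");
        }
        this.numThreads = numThreads;
        this.compress = compress;
        this.async = async;
    }

    // Encodes a response body as UTF-8, gzipping it only when compression is enabled.
    byte[] encode(String body) {
        byte[] raw = body.getBytes(StandardCharsets.UTF_8);
        if (!compress) {
            return raw;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

With `compress` set to false, `encode` returns the raw bytes and skips the CPU cost of compression, which is the trade-off described for executors and drivers in the same rack or datacenter.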
   
   With all the settings enabled, a driver process with 8 cores can handle 1024 executors in parallel on a table with 2000 partitions (a CLEAN operation, which lists all partitions). The time per API request also dropped from 800 ms to 60 ms.
   
   
   ## Verify this pull request
   
   
   This pull request is already covered by existing tests, such as 
TimelineServer tests and integration tests.
   
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


