liutang123 opened a new issue #4854:
URL: https://github.com/apache/incubator-doris/issues/4854
Problem
--
Currently, the FE schedules fragments from root to leaf. However, a delay in
scheduling one fragment sometimes delays the scheduling of the fragments after
it, and the problem becomes more serious when there are slow nodes.
Some logs are as follows:
Fragment 1's instance log (instance started at **12:12:42.493897**):
```
I1026 12:12:42.493897 160919 internal_service.cpp:150] exec plan fragment,
fragment_instance_id=e048deb04c074e70-aaf0a5afc4095fa2,
coord=TNetworkAddress(hostname=10.49.135.23, port=9020), backend=6
EXCHANGE_NODE (id=10):(Active: 4s398ms, % non-child: 73.44%)
- FirstBatchArrivalWaitTime: 4s398ms
```
Fragment 2's instance log (instance started at **12:12:45.642977**):
```
I1026 12:12:45.642977 226269 internal_service.cpp:150] exec plan fragment,
fragment_instance_id=e048deb04c074e70-aaf0a5afc4095f66,
coord=TNetworkAddress(hostname=10.49.135.23, port=9020), backend=146
EXCHANGE_NODE (id=8):(Active: 2s790ms, % non-child: 46.59%)
- FirstBatchArrivalWaitTime: 2s790ms
```
Fragment 4's instance log (instance started at **12:12:46.598142**):
```
I1026 12:12:46.598142 153965 internal_service.cpp:150] exec plan fragment,
fragment_instance_id=e048deb04c074e70-aaf0a5afc4095edb,
coord=TNetworkAddress(hostname=10.49.135.23, port=9020), backend=209
EXCHANGE_NODE (id=6):(Active: 1s762ms, % non-child: 29.43%)
- DataArrivalWaitTime: 1s762ms
```
Fragment 5's instance log (instance started at **12:12:47.194581**):
```
I1026 12:12:47.194581 8638 internal_service.cpp:150] exec plan fragment,
fragment_instance_id=e048deb04c074e70-aaf0a5afc4095eb9,
coord=TNetworkAddress(hostname=10.49.135.23, port=9020), backend=375
Instance e048deb04c074e70-aaf0a5afc4095eb9
(host=TNetworkAddress(hostname:10.16.173.46, port:9060)):(Active: 1s156ms, %
non-child: 0.04%)
```
We can see that the leaf fragment's instance runs for only about 1 second,
but because of slow instances in the front fragments, its start-up is delayed
by about 5 seconds.
Proposal
--
**Option 1**
Schedule fragments by DAG dependence.
If a query's fragments form a tree like the following:
```
           Fragment 0
          /          \
   Fragment 1    Fragment 2
   /        \
Fragment 3   Fragment 4
```
We first schedule Fragment 0, then Fragment 1 and Fragment 2, and finally
Fragment 3 and Fragment 4.
**advantage**: This option does not require an additional RPC.
**disadvantage**: This only mitigates the problem; it is not a fundamental
solution to the scheduling problem.
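A minimal sketch of this dispatch order (the real scheduling lives in the FE's Java coordinator; the `Fragment` struct and `send_exec_rpc` below are hypothetical stand-ins used only to illustrate the breadth-first traversal):
```
#include <cstdio>
#include <queue>
#include <vector>

// Hypothetical fragment node; "children" are the fragments that feed
// data into this one (one level closer to the leaves).
struct Fragment {
    int id;
    std::vector<Fragment*> children;
};

// Stand-in for the asynchronous exec_plan_fragment RPC.
void send_exec_rpc(const Fragment* f) {
    std::printf("exec fragment %d\n", f->id);
}

// Breadth-first dispatch: a fragment is sent only after the fragment
// that consumes its output has been sent, and siblings on the same
// level are dispatched back to back.
void schedule_by_dag(Fragment* root) {
    std::queue<Fragment*> q;
    q.push(root);
    while (!q.empty()) {
        Fragment* f = q.front();
        q.pop();
        send_exec_rpc(f);  // fire-and-forget; do not wait for the reply
        for (Fragment* c : f->children) q.push(c);
    }
}

int main() {
    Fragment f3{3, {}}, f4{4, {}};
    Fragment f1{1, {&f3, &f4}}, f2{2, {}};
    Fragment f0{0, {&f1, &f2}};
    schedule_by_dag(&f0);  // dispatches fragments in order 0 1 2 3 4
}
```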
**Option 2**
Add a `prepare_fragment` RPC.
Starting a fragment consists of two steps:
1. Prepare: create and register `DataStreamRecvr` in `DataStreamMgr`.
2. Start: submit `FragmentMgr::exec_actual` to `FragmentMgr::_thread_pool`.
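A self-contained sketch of this two-phase start-up (the `StreamMgr` and `Receiver` names and signatures below are illustrative stand-ins for `DataStreamMgr` and `DataStreamRecvr`, not the real Doris API):
```
#include <cstdio>
#include <map>
#include <memory>
#include <string>

// Stand-in for DataStreamRecvr: would buffer batches from upstream senders.
struct Receiver {
    std::string instance_id;
};

// Stand-in for DataStreamMgr.
class StreamMgr {
public:
    // Phase 1, driven by the new prepare_fragment RPC: register the
    // receiver so upstream instances can deliver data immediately,
    // even though this fragment has not started executing yet.
    void create_recvr(const std::string& instance_id) {
        _recvrs[instance_id] = std::make_unique<Receiver>(Receiver{instance_id});
        std::printf("prepared recvr for %s\n", instance_id.c_str());
    }

    // Phase 2, driven by the existing exec RPC: the executing fragment
    // looks its receiver up and starts consuming.
    Receiver* find_recvr(const std::string& instance_id) {
        auto it = _recvrs.find(instance_id);
        return it == _recvrs.end() ? nullptr : it->second.get();
    }

private:
    std::map<std::string, std::unique_ptr<Receiver>> _recvrs;
};

int main() {
    StreamMgr mgr;
    mgr.create_recvr("instance-1");  // prepare: receiver exists up front
    // start: only now would exec_actual be submitted to the thread pool
    if (mgr.find_recvr("instance-1") != nullptr) {
        std::printf("start: submit execution to the thread pool\n");
    }
}
```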
**advantage**: Both the prepare and start phases are scheduled concurrently.
**disadvantage**:
- Adds one more RPC.
- Fragment scheduling may still be affected by slow nodes:
```
           Fragment 0
          /          \
   Fragment 1    Fragment 2
   /        \
Fragment 3   Fragment 4
```
Fragment 1 should be able to start executing without being affected by
fragment 2, but in this solution fragment 1 cannot execute until all fragment
instances are prepared.
**Option 3**
Add a `prepare_fragment` RPC as in **option 2**, but start fragment instances
by DAG dependence.
```
           Fragment 0
          /          \
   Fragment 1    Fragment 2
   /        \
Fragment 3   Fragment 4
```
Step 1: prepare all fragment instances concurrently.
Step 2: start a fragment when its front fragments are prepared. For example,
we start fragment 1 when all of fragment 0's instances are prepared.
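A self-contained sketch of the bookkeeping the coordinator could use for step 2 (all names and instance counts below are hypothetical):
```
#include <cstdio>
#include <map>
#include <vector>

// Illustrative coordinator-side state. A fragment's "front" is the
// fragment that consumes its output (its parent in the plan tree).
struct Coordinator {
    // fragment id -> ids of the fragments that feed it (its children)
    std::map<int, std::vector<int>> children;
    // fragment id -> prepare acks still outstanding for its front fragment
    std::map<int, int> pending_front_instances;

    void start_fragment(int id) { std::printf("start fragment %d\n", id); }

    // Called once per prepare_fragment ack (step 1 sent them all out
    // concurrently). A child starts as soon as every instance of its
    // front fragment is prepared, independent of other subtrees.
    void on_instance_prepared(int fragment_id) {
        for (int child : children[fragment_id]) {
            if (--pending_front_instances[child] == 0) start_fragment(child);
        }
    }
};

int main() {
    Coordinator c;
    c.children = {{0, {1, 2}}, {1, {3, 4}}};
    // Say fragment 0 has 2 instances and fragment 1 has 3 instances.
    c.pending_front_instances = {{1, 2}, {2, 2}, {3, 3}, {4, 3}};

    c.start_fragment(0);        // the root has no front: start immediately
    c.on_instance_prepared(0);  // 1st instance of fragment 0 prepared
    c.on_instance_prepared(0);  // 2nd prepared -> fragments 1 and 2 start
}
```
With this, fragment 1 starts as soon as fragment 0 is fully prepared, regardless of how slowly fragment 2's subtree comes up.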