npawar opened a new issue #4345: Config to make realtime non-winner servers 
download the segment instead of build
URL: https://github.com/apache/incubator-pinot/issues/4345
 
 
   During realtime consumption, when the rows/time threshold is reached, one 
winner is chosen among all replicas. This winner builds the segment and uploads 
it to the controller. After this, the segment metadata and the ideal state are 
updated. The ideal state update sends a CONSUMING -> ONLINE state transition 
to all replicas. Based on the code in 
LLRealtimeSegmentDataManager::goOnlineFromConsuming, each replica is given 10 
minutes for the consuming thread to die. During this time, the consumers are 
either asked to catch up and build the segment, or to discard and download the 
segment. After this is completed (build or download), the replica can mark 
itself ONLINE. 
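
   The transition handling described above can be sketched roughly as follows. 
This is a simplified model and not the actual Pinot code: the class, enum, and 
method names are hypothetical, and the offset-comparison rule is an assumption 
made for illustration. Only the 10-minute wait and the build-or-download choice 
come from the description above.

```java
import java.time.Duration;

/** Hypothetical model of a replica handling a CONSUMING -> ONLINE transition. */
final class OnlineTransitionSketch {
    /** Time the replica allows for its consuming thread to die (from the issue text). */
    static final Duration CONSUMER_STOP_TIMEOUT = Duration.ofMinutes(10);

    /** The two paths a non-winner replica can be asked to take. */
    enum Action { CATCH_UP_AND_BUILD, DISCARD_AND_DOWNLOAD }

    /**
     * Assumed rule (not taken from the Pinot source): a replica at or behind
     * the committed end offset can catch up and build locally, while one that
     * has consumed past it must discard its rows and download the segment.
     */
    static Action decide(long currentOffset, long committedEndOffset) {
        return currentOffset <= committedEndOffset
                ? Action.CATCH_UP_AND_BUILD
                : Action.DISCARD_AND_DOWNLOAD;
    }

    public static void main(String[] args) {
        System.out.println(decide(90L, 100L));   // replica is behind the end offset
        System.out.println(decide(120L, 100L));  // replica overshot the end offset
        System.out.println("timeout minutes: " + CONSUMER_STOP_TIMEOUT.toMinutes());
    }
}
```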
   The time it takes for a replica to be marked ONLINE varies depending on 
which of these paths it was asked to take. The winner server takes the least 
time, as it has already built the segment, so it comes up ONLINE the fastest. 
For servers asked to download, the time will be slightly more than the winner 
(mostly dependent on network speed). For servers asked to build their own 
segment, the time depends on the segment build time. Segment builds can become 
time-consuming operations, depending on data size, indexing, sorting, 
consumption time, etc. As a result, these servers can take much longer to come 
up ONLINE.
   This can leave a significant period during which only 1 replica is ONLINE 
to serve traffic, and it has to carry all the load by itself.
   
   One way to solve this problem is to add a config that makes non-winner 
servers always download the segment instead of rebuilding it. This has a 
further advantage: we avoid the heap penalty of rebuilding, sorting, and 
creating inverted indexes, and simply download the ready-made segment.
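
   A minimal sketch of how such a config could gate the decision is shown 
below. Everything here is hypothetical: the flag name 
nonWinnersAlwaysDownload, the class, and the method are invented for 
illustration and are not existing Pinot identifiers.

```java
/** Hypothetical sketch of a config that forces non-winner replicas to download. */
final class NonWinnerDownloadSketch {
    /** What a replica does to obtain the completed segment. */
    enum Action { USE_OWN_BUILD, BUILD_LOCALLY, DOWNLOAD }

    /**
     * @param isWinner                 true if this replica built and uploaded the segment
     * @param askedToBuild             true if the default protocol would ask this replica to build
     * @param nonWinnersAlwaysDownload the proposed config flag (hypothetical name)
     */
    static Action actionFor(boolean isWinner, boolean askedToBuild,
                            boolean nonWinnersAlwaysDownload) {
        if (isWinner) {
            return Action.USE_OWN_BUILD;   // the winner already has the segment
        }
        if (nonWinnersAlwaysDownload) {
            return Action.DOWNLOAD;        // proposed behavior: never rebuild
        }
        // Current behavior: follow whichever path the replica was asked to take.
        return askedToBuild ? Action.BUILD_LOCALLY : Action.DOWNLOAD;
    }

    public static void main(String[] args) {
        System.out.println(actionFor(false, true, false)); // config off
        System.out.println(actionFor(false, true, true));  // config on
    }
}
```

   With the flag off, the existing behavior is unchanged; with it on, a 
non-winner skips the local build regardless of which path it would otherwise 
have been asked to take.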
   
   
   @mcvsubbu @Jackie-Jiang 
