pzhdfy opened a new pull request #8165: [Improvement] DataSegment intern 
improvement (reduce 60% memory consume on coordinator)
URL: https://github.com/apache/incubator-druid/pull/8165
 
 
   ### Description
   
   Our Druid cluster has 10 million active segments, and our load rule keeps 2 replicas of each.
   With this setup, the coordinator consumes 50GB of memory and runs into GC problems.
   
   We took a JVM heap dump and analyzed it, and found about 30 million DataSegment objects.
   This is because each segment produces 3 DataSegment objects:
   one comes from polling the DB in SQLMetadataSegmentManager;
   the other two come from the ZooKeeper announcements (one per replica) in BatchServerInventoryView.
   
   These three DataSegment objects are usually equal,
   so we can use
   Interner<DataSegment> DATA_SEGMENT_INTERNER = Interners.newWeakInterner();
   to deduplicate them.
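
To illustrate the idea, here is a minimal sketch of weak interning. It is not the PR's code: `Segment` is a hypothetical stand-in for DataSegment (assumed here to compare equal by segment id), and `WeakInterner` is a small stand-in built on `java.util.WeakHashMap` rather than Guava's interner, so the example runs without extra dependencies.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakInternerSketch {
    /** Hypothetical stand-in for DataSegment; equality is assumed to depend on the id only. */
    static final class Segment {
        final String id;
        final long size;

        Segment(String id, long size) {
            this.id = id;
            this.size = size;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Segment && ((Segment) o).id.equals(id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    /** Minimal weak interner: canonical instances are only weakly referenced by the pool. */
    static final class WeakInterner<T> {
        private final Map<T, WeakReference<T>> pool = new WeakHashMap<>();

        synchronized T intern(T sample) {
            WeakReference<T> ref = pool.get(sample);
            T canonical = (ref == null) ? null : ref.get();
            if (canonical != null) {
                return canonical; // an equal instance is already pooled; reuse it
            }
            pool.put(sample, new WeakReference<>(sample));
            return sample; // first time we see this value; it becomes the canonical instance
        }
    }

    public static void main(String[] args) {
        WeakInterner<Segment> interner = new WeakInterner<>();

        // One copy from the metadata DB poll, two from ZooKeeper announcements.
        Segment fromDb = interner.intern(new Segment("wiki_2019-07-01_v1", 1024L));
        Segment fromZk1 = interner.intern(new Segment("wiki_2019-07-01_v1", 1024L));
        Segment fromZk2 = interner.intern(new Segment("wiki_2019-07-01_v1", 1024L));

        // All three references now point at the same object.
        System.out.println(fromDb == fromZk1 && fromZk1 == fromZk2);
    }
}
```

Because the pool holds its entries weakly, a segment that is no longer referenced anywhere else can still be garbage collected, which is the property a weak interner is chosen for.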
   
   1. When polling from the DB or reading from a znode, we use DATA_SEGMENT_INTERNER to deduplicate.
   2. When polling from the DB, we always update the loadSpec in the DataSegment; this is useful during a deep storage migration.
   3. When reading from a znode, we skip interning for segments announced by realtime nodes, because segments from realtime nodes are short-lived and have incorrect size, dimensions, and loadSpec.
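
The three rules above can be sketched as follows. This is one possible reading of them, not the PR's implementation: the class, the `Segment` type, and the method names `fromDatabase` and `fromZnode` are hypothetical, and a plain `HashMap` keyed by segment id replaces the weak interner to keep the example short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SegmentDedupPolicySketch {
    /** Hypothetical stand-in for DataSegment, reduced to an id and a loadSpec. */
    static final class Segment {
        final String id;
        final Map<String, Object> loadSpec;

        Segment(String id, Map<String, Object> loadSpec) {
            this.id = id;
            this.loadSpec = loadSpec;
        }
    }

    // Canonical instance per segment id. A strong map keeps the sketch short;
    // the PR uses a weak interner so that unused segments can be collected.
    private final Map<String, Segment> canonical = new HashMap<>();

    /** Rules 1 and 2: deduplicate, but let the loadSpec polled from the DB win. */
    synchronized Segment fromDatabase(Segment polled) {
        Segment existing = canonical.get(polled.id);
        if (existing != null && existing.loadSpec.equals(polled.loadSpec)) {
            return existing; // identical copy already known; reuse it
        }
        canonical.put(polled.id, polled); // new segment, or loadSpec changed (e.g. migration)
        return polled;
    }

    /** Rules 1 and 3: deduplicate znode announcements, except those from realtime nodes. */
    synchronized Segment fromZnode(Segment announced, boolean fromRealtime) {
        if (fromRealtime) {
            return announced; // short-lived and possibly inaccurate; never pooled
        }
        Segment existing = canonical.putIfAbsent(announced.id, announced);
        return existing != null ? existing : announced;
    }

    public static void main(String[] args) {
        SegmentDedupPolicySketch policy = new SegmentDedupPolicySketch();
        Map<String, Object> hdfs = Collections.<String, Object>singletonMap("type", "hdfs");
        Map<String, Object> s3 = Collections.<String, Object>singletonMap("type", "s3");

        Segment db = policy.fromDatabase(new Segment("wiki_2019-07-01_v1", s3));
        Segment zk = policy.fromZnode(new Segment("wiki_2019-07-01_v1", hdfs), false);
        Segment rt = policy.fromZnode(new Segment("wiki_2019-07-01_v1", hdfs), true);

        System.out.println("historical copy shares the DB instance: " + (zk == db));
        System.out.println("realtime copy shares the DB instance: " + (rt == db));
    }
}
```

In this reading, the copy polled from the DB is treated as authoritative for the loadSpec, so a segment whose deep storage location changed replaces the pooled instance instead of being mapped back to the stale one.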
   
   With this improvement, memory consumption on the coordinator drops to 20GB.
   It also helps on the broker, where memory consumption drops from 35GB to 18GB.

