The TezIDCache is a memory-saving cache, similar in function to Java's String.intern() but for objects rather than strings. Tez uses an event-based, multithreaded message-passing system in which hundreds of thousands of messages may be in flight concurrently. Caching lets those messages share a single instance of each ID, which greatly reduces message size and therefore runtime memory requirements. However, Tez was also designed to allow millions of tasks per DAG and tens of thousands of DAGs per session (perhaps more). So, to protect against memory bloat, the cache is evaporative: it uses soft references that the garbage collector can clear when an entry is no longer in use or when the JVM is under memory pressure.
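To make the idea concrete, here is a minimal sketch of an interning cache backed by soft references. It is not the actual TezIDCache code; the class name InternCache and the method name intern are hypothetical, and the choice of a WeakHashMap holding SoftReference values is just one assumed way to get the "evaporative" behavior described above.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical illustration only; not the real TezIDCache implementation.
final class InternCache<T> {
    // The map holds its keys weakly and each canonical instance softly, so an
    // entry can be reclaimed once nothing else uses it or memory runs low.
    private final Map<T, SoftReference<T>> cache = new WeakHashMap<>();

    // Returns the canonical instance equal to the candidate, registering the
    // candidate as canonical if no live instance is cached yet.
    synchronized T intern(T candidate) {
        SoftReference<T> ref = cache.get(candidate);
        T canonical = (ref == null) ? null : ref.get();
        if (canonical != null) {
            return canonical;
        }
        // Drop any stale entry first so the new candidate becomes the map key.
        cache.remove(candidate);
        cache.put(candidate, new SoftReference<>(candidate));
        return candidate;
    }
}
```

With a cache like this, two equal IDs built independently (for example, while deserializing two different messages) collapse to one shared instance after passing through intern, which is where the memory saving comes from. The key and value are the same object precisely because the lookup goes "any equal instance in, the canonical instance out".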
So the cache carries some extra complexity in order to balance those two design demands.

On Thu, Jan 28, 2021 at 2:03 PM David <[email protected]> wrote:

> Hello,
>
> In the class TezID there is a caching mechanism I can't figure out. What
> is the purpose of caching these objects? This is much like a set since the
> key and value are the same. Is there some requirement that the items in the
> cache have to be globally unique? Is this some sort of memory saving
> optimization to only maintain a single instance of each value?
>
> Thanks.
>
