Hi All,

Carbon currently caches all block/blocklet datamap index information in the driver. For bloom-type datamaps, it can instead prune the splits in a distributed way using distributed datamap pruning.

Both approaches have limitations. In the first case, the driver memory has to be scaled up to hold the cache, and one driver's cache cannot be reused by other drivers. In the second case, there is no guarantee that the next query goes to the same executor, so the cache may not be reused.

Given the above problems, there is a need for a centralised index cache server. Please find the design document at the link below:

https://docs.google.com/document/d/161NXxrKLPucIExkWip5mX00x2iOPH6bvsuQnCzzp47E/edit?ts=5c542ab4#heading=h.x0qaehgkncz5

Thanks,
Kunal Kapoor