In my experience I have also used block cache sizes in the 64KB range, for the same reasons you listed. The biggest was that we were running caches upwards of 100GB, and 1KB cache blocks are not really practical at that size. The biggest problem with compaction is the .tim file; the rest of the files are mostly read sequentially, but because the .tim file is a tree, reads tend to jump all over the place during a compaction. If you want to speed up compactions (merges), I would recommend allowing the .tim files to be put into the block cache during the merge (i.e. turn quiet reads off for those files). This could of course flood your cache with data you are about to remove, but if you have the cache space it's the easiest solution.
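
Here is a minimal sketch of that quiet-read policy, purely illustrative: the class and method names below are invented, not Blur's actual API. The merge detection leans on Lucene's ConcurrentMergeScheduler naming its worker threads "Lucene Merge Thread #N", which is a cheap way to tell a merge read from a search read:

// Invented policy class; wire it into whatever quiet-read hook your
// cache directory exposes.
public class TimFriendlyQuietPolicy {

  // Lucene's ConcurrentMergeScheduler names its worker threads
  // "Lucene Merge Thread #N", so the thread name identifies merge I/O.
  private static boolean onMergeThread() {
    return Thread.currentThread().getName().startsWith("Lucene Merge Thread");
  }

  // Returns true when a read should be "quiet", i.e. bypass cache
  // population. Merge reads stay quiet except for .tim files, whose
  // tree-shaped access pattern benefits most from caching.
  public boolean shouldBeQuiet(String fileName) {
    if (!onMergeThread()) {
      return false; // searches populate the cache as usual
    }
    return !fileName.endsWith(".tim"); // cache .tim blocks during merges
  }
}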
Another idea could be to bypass the cache directory during merges and read directly from the HdfsDirectory. Then perhaps you could take advantage of the short-circuit (SC) reads without having to deal with the cache at all.
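
A rough sketch of that bypass, assuming Lucene's FilterDirectory and a separate HdfsDirectory instance over the same files; the constructor wiring here is illustrative, not Blur's actual setup. Lucene tags merge I/O with IOContext.Context.MERGE, which makes the routing decision cheap:

import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

// Routes merge reads straight to HDFS (where short-circuit reads apply)
// while searches keep going through the caching directory.
public class MergeBypassDirectory extends FilterDirectory {

  private final Directory hdfsDirectory; // uncached, backed by HDFS

  public MergeBypassDirectory(Directory cacheDirectory, Directory hdfsDirectory) {
    super(cacheDirectory);
    this.hdfsDirectory = hdfsDirectory;
  }

  @Override
  public IndexInput openInput(String name, IOContext context) throws IOException {
    if (context.context == IOContext.Context.MERGE) {
      // Merge reads never touch the block cache.
      return hdfsDirectory.openInput(name, context);
    }
    // Everything else (searches, etc.) still benefits from the cache.
    return in.openInput(name, context);
  }
}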
Aaron

On Thu, Oct 20, 2016 at 3:53 AM, Ravikumar Govindarajan <
[email protected]> wrote:

> We have set a fairly large cacheSize of 64KB in block-cache for avoiding
> too many keys, gc pressure etc...
>
> But CacheIndexInput tries to read 64KB of data during a cache-miss & fills
> up the CacheValue. When doing short-circuit-reads, this could turn out to
> be excessive no? For a comparison, lucene uses only 1KB buffers for the
> same..
>
> Do you think this will likely affect performance of searches albeit in a
> minor way?
>
> --
> Ravi