> we already took in consideration Bloom Filter for a related issue [2].
> We decided that is still not too optimal since it leads toward content
> duplication and I would like to avoid that for now
>
> [2] https://issues.apache.org/jira/browse/SLING-3290

Well, IMHO, Bloom filters won't duplicate content -- they just keep a
bit mask that tentatively marks the existence of a value. Moreover, if
we use Guava's implementation (which I think Sling doesn't want to do,
if I am reading SLING-3290 correctly), we can serialize the filter on
clean shutdown so that practically no work is needed during startup.
After a crash, we can probably live with re-creating the filter.

About the BloomFilterUtils attached to SLING-3290: I think it uses just
1 hash function to create the mask. In general, a Bloom filter
implementation would use several hash functions, with the count
configurable to reduce false positives.

About caching actual data in RAM (and assuming Sling would sit on top of
Oak?): shouldn't caching of the most-used nodes be the responsibility of
the repository implementation? But that's probably a different
discussion.

Thanks,
Vikas
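
P.S. To make the multiple-hash and serialize-on-shutdown points a bit
more concrete, here is a minimal sketch in plain Java. It is neither the
BloomFilterUtils from SLING-3290 nor Guava's BloomFilter; the class name
SimpleBloomFilter, its methods, and the choice of deriving k bit
positions from two base hashes (String.hashCode and FNV-1a) are all
assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical k-hash Bloom filter; stores only bits, never the values. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this(numBits, numHashes, new byte[0]);
    }

    /** Restores a filter from bytes produced by toByteArray(), e.g. at startup. */
    SimpleBloomFilter(int numBits, int numHashes, byte[] saved) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = BitSet.valueOf(saved);
    }

    void add(String value) {
        int h1 = value.hashCode();
        int h2 = fnv1a(value);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h1, h2, i));
        }
    }

    /** false means definitely absent; true means possibly present. */
    boolean mightContain(String value) {
        int h1 = value.hashCode();
        int h2 = fnv1a(value);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    /** Snapshot of the bit mask, e.g. to write out on clean shutdown. */
    byte[] toByteArray() {
        return bits.toByteArray();
    }

    // i-th bit position derived from two base hashes (double hashing).
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * (h2 | 1), numBits);
    }

    // 32-bit FNV-1a over the UTF-8 bytes of the value.
    private static int fnv1a(String value) {
        int h = 0x811c9dc5;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }
}
```

With numHashes set to 1 this degenerates to the single-hash mask; raising
it trades a little CPU per lookup for fewer false positives, as long as
the bit array is sized for the expected number of entries. Restoring from
toByteArray() is what would let a clean restart skip the rebuild.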