> we already took in consideration Bloom Filter for a related issue [2].
> We decided that is still not too optimal since it leads toward content 
> duplication and I would like to avoid that for now
>
> [2] https://issues.apache.org/jira/browse/SLING-3290
>

Well, imho, bloom filters won't duplicate content -- they just keep a
bit mask that tentatively marks the existence of a value. Moreover, if
we use Guava's implementation (which I think Sling doesn't want to
do... if I am reading SLING-3290 correctly), then we can serialize the
filter on clean shutdown and have practically no work to do during
startup. After a crash, we can probably live with re-creating the
filter.
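
To make the shutdown/startup idea concrete, here is a minimal sketch
using only the JDK. It assumes the filter's state is a java.util.BitSet;
the class and method names (FilterBitsStore, save, loadAndDelete) are
made up for illustration and are not part of Sling or Guava. Guava's
BloomFilter offers writeTo/readFrom for the same purpose.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;
import java.util.Optional;

/** Hypothetical helper: persists a filter's bits across clean restarts. */
final class FilterBitsStore {
    private FilterBitsStore() {}

    /** Writes the bits to disk; meant to be called on clean shutdown. */
    static void save(BitSet bits, Path file) {
        try {
            Files.write(file, bits.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Reads the snapshot and deletes it, so that a later crash leaves no
     * stale snapshot behind. An empty result means the caller has to
     * rebuild the filter from the repository.
     */
    static Optional<BitSet> loadAndDelete(Path file) {
        try {
            if (!Files.exists(file)) {
                return Optional.empty();
            }
            BitSet bits = BitSet.valueOf(Files.readAllBytes(file));
            Files.deleteIfExists(file);
            return Optional.of(bits);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Deleting the snapshot on load is a design choice for this sketch: it
keeps a crash from silently reusing bits that no longer match the
content.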

About BloomFilterUtils attached in SLING-3290: I think it uses just
1 hash function to create the mask. In general, a bloom filter
implementation would use several hash functions, and the number of
hashes can be tuned to reduce false positives.
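
As a rough illustration of the multi-hash point, here is a small
sketch in plain Java. It is not the BloomFilterUtils code from
SLING-3290 and not Guava's implementation; SimpleBloomFilter and its
methods are hypothetical names. It derives k bit positions from two
base hashes (double hashing) and includes the usual approximation for
the false-positive rate, (1 - e^(-k*n/m))^k, for m bits, n insertions
and k hashes.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Illustrative only; names are hypothetical, not from SLING-3290. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Sets one bit per hash function for the given value. */
    void put(String value) {
        int h1 = value.hashCode();
        int h2 = fnv1a(value);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h1, h2, i));
        }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(String value) {
        int h1 = value.hashCode();
        int h2 = fnv1a(value);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    /** i-th position via double hashing: (h1 + i * h2) mod numBits. */
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** 32-bit FNV-1a over the UTF-8 bytes, used as the second base hash. */
    private static int fnv1a(String value) {
        int hash = 0x811c9dc5;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 16777619;
        }
        return hash;
    }

    /** Standard approximation of the false-positive rate. */
    static double expectedFalsePositiveRate(int numBits, int insertions, int numHashes) {
        double fractionSet = 1.0 - Math.exp(-(double) numHashes * insertions / numBits);
        return Math.pow(fractionSet, numHashes);
    }
}
```

With 1000 bits and 100 insertions, the approximation gives roughly
9.5% false positives for 1 hash and under 1% for 7 hashes. Adding
hashes only helps up to about (m/n) * ln 2; beyond that the rate goes
up again.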

About caching actual data in RAM (and assuming Sling would sit on top
of Oak?) -- should caching of the most used nodes be a responsibility
of the repository implementation? But that's probably a different
discussion.

Thanks,
Vikas
