[ https://issues.apache.org/jira/browse/APEXMALHAR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bhupesh Chawda resolved APEXMALHAR-2366.
----------------------------------------
       Resolution: Done
    Fix Version/s: 3.8.0

> Apply BloomFilter to Bucket
> ---------------------------
>
>                 Key: APEXMALHAR-2366
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2366
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>            Reporter: bright chen
>            Assignee: bright chen
>             Fix For: 3.8.0
>
>   Original Estimate: 192h
>  Remaining Estimate: 192h
>
> The bucket get() checks the cache first and, if the entry is not in the
> cache, checks the stored files. Checking the files is a heavy operation
> because of the file seeks.
> The chance of having to check the files is very high if the key range is
> large. I suggest applying a BloomFilter to the bucket to reduce the chance
> of reading from files.
> If the buckets are managed by ManagedStateImpl, the number of entries in a
> bucket can become very large and the BloomFilter may stop being useful
> after a while. But if the buckets are managed by ManagedTimeUnifiedStateImpl,
> each bucket keeps only a certain number of entries and the BloomFilter
> would be very useful.
> For implementation:
> Guava already has a BloomFilter, and its interface is simple and fits our
> case. But Guava 11 is not compatible with Guava 14 (Guava 11 uses Sink
> while Guava 14 uses PrimitiveSink).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
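
For illustration, the lookup path described in the issue (cache first, then a Bloom filter check before any file read) might be sketched as below. This is a minimal sketch under stated assumptions, not the Malhar implementation: the class BloomGuardedBucket, its methods, and the fileReads counter are hypothetical, an in-memory map stands in for the stored files, and a small hand-rolled filter built on java.util.BitSet is used in place of Guava's BloomFilter so that the example has no external dependency.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a bucket whose file reads are guarded by a Bloom filter. */
class BloomGuardedBucket {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    // In-memory entries that have not been flushed yet.
    private final Map<String, String> cache = new HashMap<>();
    // Stand-in for the bucket's stored files (assumption: a real bucket seeks on disk here).
    private final Map<String, String> files = new HashMap<>();

    // Counts how often get() had to fall through to the "files".
    int fileReads = 0;

    BloomGuardedBucket(int numBits, int numHashes) {
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new BitSet(numBits);
    }

    // Double hashing: derive the i-th bit position from two hash values of the key.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // True if the key may have been put; false means it was definitely never put.
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    void put(String key, String value) {
        cache.put(key, value);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    // Moves cached entries to the "files", emulating persistence of the bucket.
    void flush() {
        files.putAll(cache);
        cache.clear();
    }

    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;            // cache hit, no file access
        }
        if (!mightContain(key)) {
            return null;              // definitely absent, skip the expensive file read
        }
        fileReads++;                  // possible hit (or false positive): read the files
        return files.get(key);
    }
}
```

A Bloom filter never reports a false negative, so skipping the file read when mightContain() returns false cannot lose an entry. False positives become more frequent as entries accumulate in a fixed-size filter, which matches the issue's remark that the filter is more useful when each bucket holds a bounded number of entries.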