Re: Fault-tolerant cache backed by a store

Chandni Singh Sun, 08 Nov 2015 22:36:30 -0800

Hi,
This contains the overview of large state management.
Some parts need more description which I am working on but please free to
go through it and any feedback is appreciated.


Thanks,
Chandni


On Tue, Oct 20, 2015 at 8:31 AM, Pramod Immaneni <[email protected]>
wrote:

> This is a much needed component Chandni.
>
> The API for the cache will be important as users will be able to plugin
> different implementations in future like those based off of popular
> distributed in-memory caches. Ehcache is a popular cache mechanism and API
> that comes to bind. It comes bundled with a non-distributed implementation
> but there are commercial distributed implementations of it as well like
> BigMemory.
>
> Given our needs for fault tolerance we may not be able to adopt the ehcache
> API as is but an extension of it might work. We would still provide a
> default implementation but going off of a well recognized API will
> facilitate development of other implementations in future based off of
> popular implementations already available. We will need to investigate if
> we can use the API as is or with relatively straightforward extensions
> which will be a positive for using it. But if the API turns out to be
> significantly deviating from what we need then that would be a negative.
>
> Also it would be great if we could support an iterator to scan all the
> keys, lazy loading as needed, since this need comes up from time to time in
> different scenarios such as change data capture calculations.
>
> Thanks.
>
> On Mon, Oct 19, 2015 at 9:10 PM, Chandni Singh <[email protected]>
> wrote:
>
> > Hi All,
> >
> > While working on making the Join operator fault-tolerant, we realized the
> > need of a fault-tolerant Cache in Malhar library.
> >
> > This cache is useful for any operator which is state-full and stores
> > key/values for a very long period (more than an hour).
> >
> > The problem with just having a non-transient HashMap for the cache is
> that
> > over a period of time this state will become so large that checkpointing
> it
> > will be very costly and will cause bigger issues.
> >
> > In order to address this we need to checkpoint the state iteratively,
> i.e.,
> > save the difference in state at every application window.
> >
> > This brings forward the following broad requirements for the cache:
> > 1. The cache needs to have a max size and is backed by a filesystem.
> >
> > 2. When this threshold is reached, then adding more data to it should
> evict
> > older entries from memory.
> >
> > 3. To minimize cache misses, a block of data is loaded in memory.
> >
> > 4. A block or bucket to which a key belongs is provided by the user
> > (operator in this case) as the information about closeness in keys (that
> > can potentially reduce future misses) is not known to the cache but to
> the
> > user.
> >
> > 5. lazy load the keys in case of operator failure
> >
> > 6. To offset the cost of loading a block of keys when there is a miss,
> > loading can be done asynchronously with a callback that indicates when
> the
> > key is available. This allows the operator to process other keys which
> are
> > in memory.
> >
> > 7. data that is spilled over needs to be purged when it is not needed
> > anymore.
> >
> >
> > In past we solved this problem with BucketManager which is not in open
> > source now and also there were some limitations with the bucket api - the
> > biggest one is that it doesn't allow to save multiple values for a key.
> >
> > My plan is to create a similar solution as BucketManager in Malhar with
> > improved api.
> > Also save the data on hdfs in TFile which provides better performance
> when
> > saving key/values.
> >
> > Thanks,
> > Chandni
> >
>

Re: Fault-tolerant cache backed by a store

Reply via email to