Hi All,

Currently the Apex engine provides operator checkpointing in HDFS (with
HDFS-backed StorageAgents, i.e. FSStorageAgent and AsyncFSStorageAgent).

We have observed that for applications with a large number of operator
instances, HDFS checkpointing introduces latency in the DAG, which degrades
overall application performance.
To resolve this we had to review all operators in the DAG and make a few
operators stateless.

As operator checkpointing is critical functionality of the Apex streaming
platform for ensuring fault-tolerant behavior, the platform should also
provide alternate StorageAgents that work seamlessly with large
applications that require exactly-once semantics.

HDFS read/write latency has a floor and doesn't improve beyond a certain
point because of disk I/O and staging writes. Having an alternate strategy
that checkpoints to a fault-tolerant distributed in-memory grid would
ensure that application stability and performance are not impacted.

I have developed an in-memory storage agent which I would like to contribute
as an alternate StorageAgent for checkpointing.

Thanks,
Ashish
