RE: EntryProcessor execution semantics

Andrey Kornev Mon, 30 Nov 2015 09:03:13 -0800

Thank you, Alexey!

By stating that "sending a serialized EntryProcessor should be cheaper" you 
implicitly assume that the cache entry is big and the computation done by the 
processor is cheap. But what if it's not the case? What if the computation 
itself is quite expensive and depends on external data (which may happen to be 
constantly changing -- like the stock tickers?), or is done for a side effect? 
What is the EP feature good for after all, given the constraints you posed 
below? Incrementing an integer counter, as the example in the Ignite 
documentation does? :)


Of course, the JCache specification is open to interpretation, and one might 
argue that the EntryProcessor is a performance feature, but my reading of the 
spec makes me think (and it looks like both Coherence and Hazelcast agree with 
me) that it's first and foremost a way to atomically mutate a cache entry 
without incurring the overhead of locking.

Let's see now. A single call to Cache.invoke() produces
- a single EP invocation on the key's primary node in Coherence. Period.
- a single EP invocation on the key's primary node in Hazelcast, but they offer 
the non-JCache BackupAwareEntryProcessor class that allows the user "to create 
or pass another EntryProcessor to run on backup partitions and apply delta 
changes to the backup entries".
- In Ignite: 
-- a single invocation on the key's primary node if the cache is ATOMIC (both 
REPLICATED and PARTITIONED).
-- N+1 invocations (where N is the number of nodes the cache is started on) if 
the cache is REPLICATED and TRANSACTIONAL.
-- B+2 invocations (where B is the number of replicas) if the cache is 
PARTITIONED and TRANSACTIONAL.
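
For reference, the Ignite counts listed above can be written down as a small function. This is only a restatement of the list in plain Java, not Ignite code: the enums and the invocations() method are hypothetical names local to this sketch, and the topology in main() is picked arbitrarily.

```java
// Hypothetical names: these enums are local to this sketch, not Ignite classes.
final class InvocationCounts {
    enum Atomicity { ATOMIC, TRANSACTIONAL }
    enum Mode { REPLICATED, PARTITIONED }

    /**
     * Restates the counts from the list above.
     * nodes   = N, the number of nodes the cache is started on
     * backups = B, the number of replicas
     */
    static int invocations(Atomicity atomicity, Mode mode, int nodes, int backups) {
        if (atomicity == Atomicity.ATOMIC) {
            return 1;          // primary node only, for both cache modes
        }
        if (mode == Mode.REPLICATED) {
            return nodes + 1;  // N+1
        }
        return backups + 2;    // B+2
    }

    public static void main(String[] args) {
        int nodes = 4;   // example value, chosen arbitrarily
        int backups = 1; // example value, chosen arbitrarily
        for (Atomicity a : Atomicity.values()) {
            for (Mode m : Mode.values()) {
                System.out.println(a + " " + m + " -> " + invocations(a, m, nodes, backups));
            }
        }
    }
}
```

With N = 4 and B = 1 the same Cache.invoke() call runs the processor 1, 1, 5 or 3 times depending on nothing but the cache configuration.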

Go figure! Alexey, you're suggesting that a user without deep knowledge of 
Ignite internals would find such behavior expected and natural? Even with deep 
knowledge of Ignite internals it's hard to understand the logic.

Neither Coherence nor Hazelcast requires the EP to be stateless and side-effect 
free. Even better, Hazelcast makes the choice explicit by providing the backup 
aware processor API, and it's then up to the user to ensure statelessness etc. 
But Ignite is just too clever.
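
To put the side-effect concern in concrete terms, here is a toy model in plain Java. It uses no Ignite or JCache classes; applyOnEveryCopy() and the "copies" list are hypothetical stand-ins for a processor being run once per copy of an entry, and the sketch makes no claim about how Ignite actually schedules the calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.UnaryOperator;

final class RepeatedInvocationDemo {
    /** Runs the processor once per copy of the entry (toy stand-in for primary + backups). */
    static List<Integer> applyOnEveryCopy(List<Integer> copies, UnaryOperator<Integer> processor) {
        List<Integer> updated = new ArrayList<>();
        for (Integer value : copies) {
            updated.add(processor.apply(value));
        }
        return updated;
    }

    public static void main(String[] args) {
        // One primary and two backups, all holding the same value.
        List<Integer> copies = Arrays.asList(10, 10, 10);

        // A processor with a side effect: the counter stands in for
        // something external, such as sending a message.
        AtomicInteger sideEffects = new AtomicInteger();
        List<Integer> updated = applyOnEveryCopy(copies, v -> {
            sideEffects.incrementAndGet();
            return v + 1;
        });

        // The copies agree, but the side effect fired once per invocation.
        System.out.println("values: " + updated);
        System.out.println("side effects: " + sideEffects.get());
    }
}
```

The entry ends up the same on every copy, yet the side effect happens three times for what the caller wrote as a single invoke().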

I'd really like to ask the brains behind the current design to reconsider.

Regards
Andrey

> Date: Mon, 30 Nov 2015 13:11:13 +0300
> Subject: Re: EntryProcessor execution semantics
> From: alexey.goncha...@gmail.com
> To: dev@ignite.apache.org
> 
> Andrey,
> 
> If I leave behind my knowledge about Ignite internals, my expectation would
> be that an EntryProcessor is invoked on all affinity - both primary and
> backup - nodes in the grid. The main reason behind this expectation is that
> usually a serialized EntryProcessor instance is smaller than the resulting
> object being stored in the cache, so sending a serialized EntryProcessor
> should be cheaper. Is there a specific reason you expect an EntryProcessor
> to be called only once across all the nodes?
> 
> I would not imply any restrictions on how many times an EntryProcessor is
> called during a cache update. For example, in the case of an explicit
> optimistic READ_COMMITTED transaction it may be called more than once because
> Ignite needs to calculate a return value for the first invoke(), and then it
> should be called a second time during commit, when transactional locks are held.
> 
> The current requirement is that an EntryProcessor should be a stateless
> function, and it may be called more than once (but of course it will
> receive the same cache value every time). I agree that this should be
> properly articulated in the documentation; I will make sure that it is
> reflected in the forthcoming 1.5 release javadocs.
