Hi Emmanuel,

Apologies for the delay getting back on the topic. Replies below from the 
remote listeners POV...

On 18 Sep 2014, at 18:24, Emmanuel Bernard <emman...@hibernate.org> wrote:

> Hi all,
> 
> I have had a good exchange on how someone would use clustered / remote 
> listeners to do custom continuous query features.
> 
> I have a few questions and requests to make this fully and easily doable
> 
> ## Value as bytes or as objects
> 
> Assuming a Hot Rod based usage and protobuf as the serialization layer. What 
> are KeyValueFilter and Converter seeing?
> I assume today the bytes are unmarshalled and the Java object is provided to 
> these interfaces.

Yes. For protobuf serialization to be used, the client would need a custom 
Marshaller plugged that converts an Object into protobuf bytes, and this same 
marshaller should be plugged server side so that the server can do the opposite 
translation from protobuf bytes to Object. Plugging the server with a 
marshaller for the filter/converter is not yet there but would be once [1] is 
in place.

[1] https://issues.jboss.org/browse/ISPN-4734

> In a protobuf based storage, does that mean that the user must create the 
> Java objects out of a protobuf compiler and deploy these classes in the 
> classpath of each server node?

Yes, those classes could be part of the filter/converter/marshaller deployment 
jar.

> Alternatively, could we pass the raw protobuf data to the KeyValueFilter and 
> Converter? They could read the relevant properties at no deserialization cost 
> and with lss problems related to the classloader.

^ I don’t see why not. Bear in mind that filter/converter callbacks happen 
server side, but as long as implementations can make out what they need from 
those byte arrays, all good IMO. I’ll create a JIRA to track this. Not sure it 
could be done wo/ a configuration option but I’ll try to do so if possible.

> Thoughts?

So far so good :)

> ## Synced listeners
> 
> In a transactional clustered listener marked as sync. Does the transaction 
> commits and then waits for the relevant clustered listeners to proceed before 
> returning the hand to the Tx client? Or is there something else going on?
> 
> ## oldValue and newValue
> 
> I understand why the oldValue was not provided in the initial work. It 
> requires to send more data across the network and at least double the number 
> of values unmarshalled.

Yes, but to clarify, the cost is on the clustered listener side to ship the old 
values to the node where the clustered listener runs, which in turn feeds to 
the cluster listener delegate and server-side remote filter/converter.

> But for continuous queries, being able to compare the old and the new value 
> is critical to reduce the number of events sent to the listener.
> 
> Imagine the following use case. A listener exposes the average age for a 
> certain type of customer. You would implement it the following way.
> 
> 1. Add a KeyValueFilter that
> - upon creation, filter out the customers of the wrong type
> - upon update, keep customers that 
>   - *were* of the right time but no longer are
>   - were not of the right type but now now *are*
>   - remains of the right type and whose age has changed
> - upon deletion, keep customers that *were* of the right type
> 
> 2. Converter
> In the converter, one could send the whole customer but it would be more 
> efficient to only send the age of the customer as well as wether it is added 
> to or removed from the matching customers
> - upon creation, you send the customer age and mark it as addition
> - upon deletion, you send the customer age and mark it as deletion
> - upon update
>   - if the customer was of the right type but no longer is, send the age as 
> well as a deletion flag
>   - if the customer was not of the right type but now is, send the age as 
> well as an addition flag
>  -  if the customer age has changed, send the difference with a modification 
> flag
> 
> 3. The listener then needs to keep the total sum of all ages as well as the 
> total number of customers of the right type. Based on the sent events, it can 
> adjust these two counters.
> 
> That requires us to be able to provide the old and new value to the 
> KeyValueFilter and the Converter interface as well as the type of event 
> (creation, update, deletion).
> 
> If you keep the existing interfaces and their data, the data send and the 
> memory consumed becomes much much bigger. I leave it as an exercise but I 
> think you need to:
> - send *all* remove and update events regardless of the value (essentially no 
> KeyValueFilter)
> - in the listener, keep a list of *all* matching keys so that you know if a 
> new event is about a data that was already matching your criteria or not and 
> act accordingly.

Yup…, that’s kinda the workaround I suggested Radim in an earlier email.

> 
> BTW, you need the old and new value even if your listener returns actual 
> matching results instead of an aggregation. More or less for the same reasons.
> 
> Continuous query is about the most important use case for remote and 
> clustered listeners and I think we should address it properly and as 
> efficiently as possible. Adding continuous query to Infinispan will then 
> “simply” be a matter of agreeing on the query syntax and implement the 
> predicates as smartly as possible.
> 
> With the use case I describe, I think the best approach is to merge the KVF 
> and Converter into a single Listener like interface that is able to send or 
> silence an event payload. But that’s guestimate.
> Because oldValue / newValue implies an unmarshalling overhead we might want 
> to make it an annotation based flag on the class that is executed on each 
> node (somewhat similar to the settings hosted on @Listener).

The majority of work here falls on the clustered listener side, to be send the 
old value when it needs to do it. From a remote eventing perspective, there’s 
little to be done other than bridge over to what cluster listener provides us.

> ## includeCurrentState and very narrow filtering
> 
> The existing approach is fine (send a create event notif for all existing 
> keys and queue changes in the mean time) as long as the listener plans to 
> consume most of these events.
> But in case of a big data grid, with a lot of passivated entries, the cost 
> would become non negligible.
> 
> An alternative approach is to first do a query matching the elements the 
> listener is interested in and queue up the events until the query is fully 
> processed. Can a listener access a cache and do a query? Should we offer such 
> option in a more packaged way?
> 
> For a listener that is only interested in keys whose value city contains 
> Springfield, Virginia, the gain would be massive.

That sounds like a good idea, though not sure how that would work from a remote 
query perspective (Adrian?). With the little knowledge I have on that, I’d 
imagine that the remote client could maybe pass an optional query of some sort 
when adding the listener, with this being bundle inside the addlistener HR 
operation, and then somehow have it plugged into clustered listeners.

> ## Remote listener and non Java HR clients
> 
> Does the API of non Java HR clients support the enlistements of listeners and 
> attach registered keyValueFilter / Converter? Or is that planned? Just 
> curious.

AFAIK, only the Java HR client has those implemented so far. If language 
experts want to help out on other with other impls, that’d be awesome :)

> 
> Emmanuel
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev


--
Galder Zamarreño
gal...@redhat.com
twitter.com/galderz


_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Reply via email to