On 21 Jan 2014, at 17:45, Mircea Markus <mmar...@redhat.com> wrote:

> 
> On Jan 21, 2014, at 2:13 PM, Sanne Grinovero <sa...@infinispan.org> wrote:
> 
>> On 21 January 2014 13:37, Mircea Markus <mmar...@redhat.com> wrote:
>>> 
>>> On Jan 21, 2014, at 1:21 PM, Galder Zamarreño <gal...@redhat.com> wrote:
>>> 
>>>>> What's the point for these tests?
>>>> 
>>>> +1
>>> 
>>> To validate if storing the data in binary format yields better performance 
>>> than storing it as a POJO.
>> 
>> That will highly depend on the scenarios you want to test for. AFAIK
>> this started after Paul described how session replication works in
>> WildFly, and we already know that both strategies are suboptimal with
>> the current options available: in his case the active node will always
>> write on the POJO, while the backup node will essentially only need to
>> store the buffer "just in case" he might need to take over.
> 
> Indeed as it is today, it doesn't make sense for WildFly's session 
> replication.
> 
>> 
>> Sure, one will be slower, but if you want to make a suggestion to him
>> about which configuration he should be using, we should measure his
>> use case, not a different one.
>> 
>> Even then as discussed in Palma, an in memory String representation
>> might be way more compact because of pooling of strings and a very
>> high likelihood for repeated headers (as common in web frameworks),
> 
> pooling like in String.intern()? 
> Even so, if most of your access to the String is to serialize it and send it 
> remotely then you have a serialization cost (CPU) to pay for the reduced size.

Serialization has a cost, but nothing compared with the transport itself, and 
you don’t have to go very far to see the impact of transport. Just recently we 
were chasing some performance regression and even though there were some 
changes in serialization, the impact of my improvements was minimal, max 2-3%. 
Optimal network and transport configuration is more important IMO, and once 
again, misconfiguration in that layer is what was causing us to be ~20% slower.
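
One way to put a number on the serialization side of that comparison is to
time it in isolation. The sketch below is only illustrative: it uses plain JDK
serialization as a stand-in (not Infinispan's own marshaller), and the payload
shape, the class name and the method names are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: JDK serialization stands in for the real marshaller.
class SerializationCost {

    // Hypothetical session-like payload with repeated header-style entries.
    static HashMap<String, String> samplePayload(int entries) {
        HashMap<String, String> map = new HashMap<>();
        for (int i = 0; i < entries; i++) {
            map.put("header-" + i, "value-" + i);
        }
        return map;
    }

    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Average wall-clock nanoseconds for one serialize + deserialize round trip.
    static long averageRoundTripNanos(Object value, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            deserialize(serialize(value));
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        Map<String, String> payload = samplePayload(50);
        averageRoundTripNanos(payload, 10_000); // warm-up pass before measuring
        long nanos = averageRoundTripNanos(payload, 10_000);
        System.out.println("serialized size: " + serialize(payload).length + " bytes");
        System.out.println("avg round trip:  " + nanos + " ns");
    }
}
```

The figure only means something next to a network round trip measured in the
same environment, and a proper benchmark harness would give more trustworthy
numbers than this hand-rolled loop.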

> 
>> so
>> you might want to measure the CPU vs storage cost on the receiving
>> side.. but then again your results will definitely depend on the input
>> data and assumptions on likelihood of failover, how often it is being
>> written on the owner node vs on the other node (since he uses
>> locality), etc.. many factors I'm not seeing being considered here and
>> which could make a significant difference.
> 
> I'm looking for the default setting of storeAsBinary in the configurations we 
> ship. I think the default configs should be optimized for distribution, 
> random key access (every read/write for any key executes on every node of 
> the cluster with the same probability) for both read and write.

I’m with Sanne on this. I still think this is not a useful exercise really, 
since serialization is not a huge cost in the total time spent. Our latency is 
driven by waiting for others to reply to our requests, and that’s the driver in 
sync mode. In async, you can forget about the serialization cost if you use 
putAsync(). 

I find it way more useful to look at Infinispan all the time and consider what 
things we should be ditching to make our configuration smaller, our memory 
consumption smaller, and our code base smaller.

> 
>> 
>>> As of now, it doesn't so I need to check why.
>> 
>> You could play with the test parameters until it produces an output
>> you like better, but I still see no point?
> 
> the point is to provide the best default params for the default config, and 
> see what's the usefulness of storeAsBinary.  
> 
>> This is not a realistic
>> scenario, at best it could help us document suggestions about which
>> scenarios you'd want to keep the option enabled vs disabled, but then
>> again I think we're wasting time as we could implement a better
>> strategy for Paul's use case: one which never deserializes a value
>> received from a remote node until it's been requested as a POJO, but
>> keeps the POJO as-is when it's stored locally.
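
The strategy Sanne describes above can be sketched as a small value holder. This
is a minimal illustration, not Infinispan code: the LazyValue class and its
method names are hypothetical, and JDK serialization stands in for the real
marshaller. A value stored locally stays a POJO; a value received from a
remote node stays a byte[] until someone asks for the POJO.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical holder: exactly one of 'pojo' or 'bytes' is non-null at any time.
class LazyValue<T extends Serializable> {
    private T pojo;        // set for locally stored values, or after the first get()
    private byte[] bytes;  // set for values received from a remote node

    private LazyValue(T pojo, byte[] bytes) {
        this.pojo = pojo;
        this.bytes = bytes;
    }

    // Local write: keep the POJO as-is, no serialization up front.
    static <T extends Serializable> LazyValue<T> fromLocal(T pojo) {
        return new LazyValue<>(pojo, null);
    }

    // Remote write: keep the buffer, no deserialization up front.
    static <T extends Serializable> LazyValue<T> fromRemote(byte[] bytes) {
        return new LazyValue<>(null, bytes);
    }

    synchronized boolean isDeserialized() {
        return pojo != null;
    }

    // Deserializes at most once, on first request, then drops the buffer.
    @SuppressWarnings("unchecked")
    synchronized T get() {
        if (pojo == null) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                pojo = (T) in.readObject();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
            bytes = null; // one-way conversion: never go back to the binary form
        }
        return pojo;
    }

    // Wire form: reuses the buffer if still held, otherwise serializes on demand.
    synchronized byte[] toBytes() {
        if (bytes != null) {
            return bytes;
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(pojo);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

In this sketch a backup node that never reads the entry pays no
deserialization cost at all, while the owner node never serializes except when
it actually has to send the value.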
> 
> I disagree: Paul's scenario, whilst very important, is quite specific. For 
> what I consider the general case (random key access, see above), your 
> approach is suboptimal.  
> 
> 
>> I believe that would
>> make sense also for OGM and probably most other users of Embedded.
>> Basically, that would re-implement something similar to the previous
>> design but simplifying it a bit so that it doesn't allow for a
>> back-and-forth conversion between storage types but rather dynamically
>> favors a specific storage strategy.
> 
> It all boils down to what we want to optimize for: random key access or some 
> degree of affinity. I think the former is the default.
> One way or the other, from the test Radim ran with random key access, the 
> storeAsBinary doesn't bring any benefit and it should: 
> http://lists.jboss.org/pipermail/infinispan-dev/2009-October/004299.html
> 
>> 
>> Cheers,
>> Sanne
>> 
>>> 
>>> Cheers,
>>> --
>>> Mircea Markus
>>> Infinispan lead (www.infinispan.org)
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev@lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> 
> 
> Cheers,
> -- 
> Mircea Markus
> Infinispan lead (www.infinispan.org)
> 
> 
> 
> 
> 


--
Galder Zamarreño
gal...@redhat.com
twitter.com/galderz

Project Lead, Escalante
http://escalante.io

Engineer, Infinispan
http://infinispan.org


