[ https://issues.apache.org/jira/browse/KAFKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411821#comment-15411821 ]

Bill Bejeck edited comment on KAFKA-3973 at 8/8/16 3:34 PM:
------------------------------------------------------------

I used JMH to benchmark the performance of caching bytes vs. objects (with
objects tracked by memory size using jamm). Here are the results:

EDIT: New results from the updated test:
# Run complete. Total time: 00:02:41

Benchmark                                        Mode Cnt        Score       Error  Units
MemoryBytesCacheBenchmark.testCacheByMemory     thrpt  40   536694.504 ±  4177.019  ops/s
MemoryBytesCacheBenchmark.testCacheBySizeBytes  thrpt  40  4713360.286 ± 60874.723  ops/s


Using JMH, it still appears that serialization (caching bytes) has the advantage.
The test used for benchmarking will be included in the PR for KAFKA-3989
(coming soon).
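As a quick check on how large the gap is, the small sketch here just does arithmetic on the two JMH scores and errors from the table above (the class and method names are illustrative only, not part of the KAFKA-3989 test):

```java
/** Illustrative arithmetic on the JMH throughput results (ops/s). */
public class ThroughputComparison {
    /** How many times higher throughput a is than b. */
    static double speedup(double a, double b) {
        return a / b;
    }

    /** JMH's "Error" column expressed as a percentage of the score. */
    static double relativeErrorPercent(double score, double error) {
        return 100.0 * error / score;
    }

    public static void main(String[] args) {
        // Values copied from the benchmark table.
        double byMemory = 536694.504, byMemoryErr = 4177.019;
        double bySizeBytes = 4713360.286, bySizeBytesErr = 60874.723;

        System.out.printf("bytes vs. objects speedup: %.2fx%n",
                speedup(bySizeBytes, byMemory));
        System.out.printf("relative error, testCacheByMemory:    %.2f%%%n",
                relativeErrorPercent(byMemory, byMemoryErr));
        System.out.printf("relative error, testCacheBySizeBytes: %.2f%%%n",
                relativeErrorPercent(bySizeBytes, bySizeBytesErr));
    }
}
```

The byte-based cache comes out at roughly 8.8x the throughput of the jamm-tracked object cache, and both errors are under 1.5% of their scores, so the error intervals do not come close to overlapping.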


was (Author: bbejeck):
I used JMH to benchmark the performance of caching bytes vs. objects (with
objects tracked by memory size using jamm). Here are the results:


Result "testCacheBySizeBytes":
 2157013.372 ±(99.9%) 198793.816 ops/s [Average]
 (min, avg, max) = (687952.309, 2157013.372, 2485954.624), stdev = 353355.834
 CI (99.9%): [1958219.556, 2355807.189] (assumes normal distribution)
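For reference, the "±" figure in this output is a confidence-interval half-width. Assuming JMH's usual Student-t interval over n = 40 measured iterations (the Cnt column in the summary) at 99.9% confidence, with s the reported stdev, the numbers above are consistent (t ≈ 3.558 for 39 degrees of freedom):

```latex
\begin{aligned}
\text{Error} &= t_{0.9995,\,n-1}\cdot\frac{s}{\sqrt{n}}
  \approx 3.558\times\frac{353355.834}{\sqrt{40}}
  \approx 3.558\times 55870.4
  \approx 1.988\times 10^{5}\ \text{ops/s} \\
\text{CI} &= \bar{x}\pm\text{Error}
  \approx 2157013.372 \pm 198793.816
  \approx [1958219.6,\ 2355807.2]\ \text{ops/s}
\end{aligned}
```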


# Run complete. Total time: 00:02:41

Benchmark                                        Mode Cnt        Score        Error  Units
MemoryBytesCacheBenchmark.testCacheByMemory     thrpt  40   290142.181 ±   3001.345  ops/s
MemoryBytesCacheBenchmark.testCacheBySizeBytes  thrpt  40  2157013.372 ± 198793.816  ops/s

Using JMH, it still appears that serialization (caching bytes) has the advantage.
The test used for benchmarking will be included in the PR for KAFKA-3989
(coming soon).

> Investigate feasibility of caching bytes vs. records
> ----------------------------------------------------
>
>                 Key: KAFKA-3973
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3973
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>            Reporter: Eno Thereska
>            Assignee: Bill Bejeck
>             Fix For: 0.10.1.0
>
>         Attachments: CachingPerformanceBenchmarks.java, MemoryLRUCache.java
>
>
> Currently the cache stores and accounts for records, not bytes or objects. 
> This investigation is about measuring any performance overhead that comes
> from storing bytes or objects. As an outcome, we should know whether 1) we
> should store bytes or 2) we should store objects.
> If we store objects, the cache still needs to know their size, so that it can
> tell whether an object fits in the allocated cache space (e.g., if the cache
> is 100MB and each object is 10MB, there is space for 10 such objects). The
> investigation needs to figure out how to compute an object's size
> efficiently in Java.
> If we store bytes, then we serialise each object into bytes before caching
> it, i.e., we pay a serialisation cost. The investigation needs to measure how
> bad this cost can be, especially when all objects fit in the cache (since any
> extra serialisation cost would then show).
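A minimal sketch of the two options in the issue description, assuming a simple LRU cache bounded by total accounted bytes rather than by entry count. `SizeBoundedLruCache` and its methods are hypothetical (this is not the Kafka Streams cache or the attached MemoryLRUCache.java); the constant 10MB sizer stands in for a real object-size estimator such as jamm, and UTF-8 string encoding stands in for a real serializer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.ToLongFunction;

/** Hypothetical LRU cache bounded by total accounted bytes, not entry count. */
public class SizeBoundedLruCache<K, V> {
    /** A cached value together with the size it was charged on insert. */
    private static final class Sized<T> {
        final T value;
        final long size;
        Sized(T value, long size) { this.value = value; this.size = size; }
    }

    private final long maxBytes;
    private final ToLongFunction<V> sizer; // cost of one value, in bytes
    // accessOrder = true: iteration runs from least- to most-recently used.
    private final LinkedHashMap<K, Sized<V>> map = new LinkedHashMap<>(16, 0.75f, true);
    private long usedBytes = 0;

    public SizeBoundedLruCache(long maxBytes, ToLongFunction<V> sizer) {
        this.maxBytes = maxBytes;
        this.sizer = sizer;
    }

    public void put(K key, V value) {
        long size = sizer.applyAsLong(value); // measured once, on insert
        if (size > maxBytes) {
            throw new IllegalArgumentException("value larger than the whole cache");
        }
        Sized<V> old = map.put(key, new Sized<>(value, size));
        if (old != null) usedBytes -= old.size;
        usedBytes += size;
        // Evict least-recently-used entries until the total fits again.
        Iterator<Sized<V>> it = map.values().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            usedBytes -= it.next().size;
            it.remove();
        }
    }

    public V get(K key) {
        Sized<V> s = map.get(key); // also marks the entry as recently used
        return s == null ? null : s.value;
    }

    public int size() { return map.size(); }

    public long usedBytes() { return usedBytes; }

    public static void main(String[] args) {
        final long oneMb = 1024L * 1024L;

        // Option 2 (objects): the cache must be told each object's size.
        // A constant 10MB stands in for a real estimator such as jamm.
        SizeBoundedLruCache<Integer, Object> objects =
                new SizeBoundedLruCache<>(100 * oneMb, v -> 10 * oneMb);
        for (int i = 0; i < 15; i++) objects.put(i, new Object());
        System.out.println("objects cached: " + objects.size()); // 10 fit in 100MB

        // Option 1 (bytes): serialize first; the size is just the array length.
        SizeBoundedLruCache<String, byte[]> bytes =
                new SizeBoundedLruCache<>(100 * oneMb, b -> b.length);
        bytes.put("k", "hello, streams".getBytes(StandardCharsets.UTF_8)); // serialization cost
        String back = new String(bytes.get("k"), StandardCharsets.UTF_8);  // deserialization cost
        System.out.println(back + " (" + bytes.usedBytes() + " bytes)");
    }
}
```

The trade-off from the issue shows up directly: the object cache needs a sizing function on every insert (cheap here, but a deep measurement with jamm is not), while the byte cache gets its size for free from the array length but pays a serialize on `put` and a deserialize on `get`. An explicit eviction loop is used instead of `LinkedHashMap.removeEldestEntry`, because that hook removes at most one entry per insert, whereas a single large value may need to evict several smaller ones.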



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
