[ https://issues.apache.org/jira/browse/KAFKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411821#comment-15411821 ]

Bill Bejeck edited comment on KAFKA-3973 at 8/8/16 4:55 PM:
------------------------------------------------------------

I used JMH to benchmark the performance of caching bytes vs. objects (with 
object size tracked in memory using jamm). Here are the results:

EDIT: I needed to refactor the tests and use Bytes to wrap the byte-array keys 
in the cache.

Run complete. Total time: 00:02:42

Benchmark                                        Mode  Cnt        Score        Error  Units
MemoryBytesCacheBenchmark.testCacheByMemory     thrpt   40   251002.444 ±  20683.129  ops/s
MemoryBytesCacheBenchmark.testCacheBySizeBytes  thrpt   40  1477170.674 ±  12772.196  ops/s


After refactoring the JMH test, the gap between tracking by memory size and 
serialization has narrowed somewhat, but serialization still has the 
advantage: roughly 5.9x the throughput (about 1.48M vs. 251K ops/s). 
The test used for benchmarking will be included in the PR for KAFKA-3989 
(coming soon).
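A raw `byte[]` inherits identity-based `equals` and `hashCode` from `Object`, so two arrays with identical contents count as different keys in a `HashMap` or `LinkedHashMap`; this is presumably why the keys had to be wrapped in `Bytes`. A minimal sketch of such a wrapper follows. `ByteArrayKey` is a hypothetical name for illustration, not Kafka's `Bytes` class and not the code used in the benchmark.

```java
import java.util.Arrays;

/**
 * Hypothetical wrapper that gives a byte array value-based equality, so it
 * can be used as a hash-map key (a raw byte[] compares by identity).
 */
final class ByteArrayKey {
    private final byte[] bytes;

    ByteArrayKey(byte[] bytes) {
        // Defensive copy: if the caller later mutates its array, this key's
        // hash code must not change while the key sits in a map.
        this.bytes = Arrays.copyOf(bytes, bytes.length);
    }

    /** Returns a copy of the wrapped bytes. */
    byte[] get() {
        return Arrays.copyOf(bytes, bytes.length);
    }

    /** Number of bytes, e.g. for charging the key against a cache budget. */
    int length() {
        return bytes.length;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ByteArrayKey)) {
            return false;
        }
        return Arrays.equals(bytes, ((ByteArrayKey) other).bytes);
    }

    @Override
    public int hashCode() {
        // Content-based hash, consistent with equals above.
        return Arrays.hashCode(bytes);
    }
}
```

The defensive copy costs one allocation per key; a wrapper that skips it is cheaper but relies on callers never mutating the array after wrapping it.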



> Investigate feasibility of caching bytes vs. records
> ----------------------------------------------------
>
>                 Key: KAFKA-3973
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3973
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>            Reporter: Eno Thereska
>            Assignee: Bill Bejeck
>             Fix For: 0.10.1.0
>
>         Attachments: CachingPerformanceBenchmarks.java, MemoryLRUCache.java
>
>
> Currently the cache stores and accounts for records, not bytes or objects. 
> This investigation would be around measuring any performance overheads that 
> come from storing bytes or objects. As an outcome we should know whether 1) 
> we should store bytes or 2) we should store objects. 
> If we store objects, the cache still needs to know their size (so that it can 
> know whether the object fits in the allocated cache space; e.g., if the cache 
> is 100MB and each object is 10MB, we'd have space for 10 such objects). The 
> investigation needs to determine how to compute an object's size 
> efficiently in Java.
> If we store bytes, then we are serialising an object into bytes before 
> caching it, i.e., we pay a serialisation cost. The investigation needs to 
> measure how large this cost can be, especially when all objects fit in the 
> cache (so any extra serialisation cost would show).
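Either way, the cache ends up bounded by total size rather than by record count. A minimal sketch of such a cache, assuming the caller supplies a size function: for stored objects it could delegate to an object-size library such as jamm (used in the benchmark above), and for serialized values it is just the array length. `SizeBoundedLruCache` and its methods are hypothetical names, and for brevity only values, not keys, are charged against the budget.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/**
 * Hypothetical LRU cache bounded by the total estimated size of its values
 * rather than by the number of entries.
 */
final class SizeBoundedLruCache<K, V> {
    private final long maxSize;
    private final ToLongFunction<V> sizer;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<K, V> entries = new LinkedHashMap<>(16, 0.75f, true);
    private long currentSize = 0;

    SizeBoundedLruCache(long maxSize, ToLongFunction<V> sizer) {
        this.maxSize = maxSize;
        this.sizer = sizer;
    }

    /** Returns the cached value (or null) and marks it most recently used. */
    V get(K key) {
        return entries.get(key);
    }

    /**
     * Caches the value, evicting least recently used entries until the total
     * fits. Returns false, leaving the cache unchanged, if the value alone is
     * larger than the whole budget.
     */
    boolean put(K key, V value) {
        long size = sizer.applyAsLong(value);
        if (size > maxSize) {
            return false;
        }
        V previous = entries.put(key, value);
        if (previous != null) {
            currentSize -= sizer.applyAsLong(previous);
        }
        currentSize += size;
        // The new entry is now the most recently used, so it is visited last;
        // because size <= maxSize, the loop stops before reaching it.
        Iterator<Map.Entry<K, V>> eldestFirst = entries.entrySet().iterator();
        while (currentSize > maxSize) {
            Map.Entry<K, V> eldest = eldestFirst.next();
            currentSize -= sizer.applyAsLong(eldest.getValue());
            eldestFirst.remove();
        }
        return true;
    }

    boolean containsKey(K key) {
        return entries.containsKey(key); // does not change access order
    }

    int entryCount() {
        return entries.size();
    }

    long currentSize() {
        return currentSize;
    }
}
```

With `maxSize = 100` and every value sized 10, the cache holds exactly 10 entries, matching the 100MB/10MB example in the description. Caching bytes would instead store serialized `byte[]` values with `b -> b.length` as the size function: sizing becomes trivial, at the price of a serialize call on every put and a deserialize call on every read, whereas a deep object-size measurement typically has to walk the object graph on every put.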



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
