[ 
https://issues.apache.org/jira/browse/KAFKA-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Compton updated KAFKA-1516:
----------------------------------

    Description: 
The producer performance test in Kafka sends messages consisting either of 
[0x0 bytes|https://github.com/apache/kafka/blob/0.8.1/perf/src/main/scala/kafka/perf/ProducerPerformance.scala#L237]
or of 
[all X's|https://github.com/apache/kafka/blob/0.8.1/perf/src/main/scala/kafka/perf/ProducerPerformance.scala#L225].
Both payloads are a single repeated byte, which compresses far better than real 
data. This skews the compression ratio massively and probably affects 
performance in other ways.

We want to create messages that give a more realistic performance profile. 
Random bytes may not be the best solution either, as they won't compress at all 
and will skew compression times.
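
To make the two extremes concrete, here is a rough sketch that compares how the 
three kinds of payload compress. It is only an illustration: it uses Python's 
zlib as a stand-in for whatever codec the producer is configured with, and the 
1000-byte message size is an assumption, not a value taken from 
ProducerPerformance.scala.

```python
import os
import zlib

MESSAGE_SIZE = 1000  # assumed size for illustration, not from the perf test


def compression_ratio(payload: bytes) -> float:
    """Compressed size divided by original size (lower means it compresses better)."""
    return len(zlib.compress(payload)) / len(payload)


payloads = {
    "0x0 bytes": bytes(MESSAGE_SIZE),          # zero-filled message
    "all X's": b"X" * MESSAGE_SIZE,            # one repeated character
    "random bytes": os.urandom(MESSAGE_SIZE),  # effectively incompressible
}

ratios = {name: compression_ratio(data) for name, data in payloads.items()}

for name, ratio in ratios.items():
    print(f"{name:>12}: {ratio:.3f}")
```

The repeated-byte payloads should shrink to a few percent of their original 
size, while the random payload should not shrink at all. Realistic messages 
would be expected to land somewhere between those two.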

Perhaps a template with random or sequential data injected into it could work. 
Or maybe I'm overthinking it and we should just go for random bytes. What other 
options do we have? Others seem to use random bytes, for example 
[cassandra-stress|https://github.com/zznate/cassandra-stress/blob/master/src/main/java/com/riptano/cassandra/stress/InsertCommand.java#L39].
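
One possible shape for the template idea, as a minimal sketch only. The JSON 
template, its field names, the event list and the seed are all made up for 
illustration and are not part of Kafka or its perf tools. zlib again stands in 
for the producer's codec.

```python
import random
import zlib

# Hypothetical message template: fixed structure, variable field values.
TEMPLATE = '{{"seq": {seq}, "user": "user-{user}", "event": "{event}", "token": "{token}"}}'
EVENTS = ["login", "logout", "view", "purchase"]  # made-up sample values


def make_message(seq: int, rng: random.Random) -> bytes:
    """Fill the template with one sequential field and several random ones."""
    token = "".join(rng.choice("0123456789abcdef") for _ in range(16))
    text = TEMPLATE.format(
        seq=seq,
        user=rng.randrange(10_000),
        event=rng.choice(EVENTS),
        token=token,
    )
    return text.encode("utf-8")


rng = random.Random(1516)  # fixed seed so repeated runs send the same data
batch = b"\n".join(make_message(i, rng) for i in range(200))
ratio = len(zlib.compress(batch)) / len(batch)
print(f"template batch ratio: {ratio:.3f}")
```

The fixed structure compresses well and the injected values do not, so the 
batch should compress partially rather than collapsing to almost nothing or 
not compressing at all. How close that is to real traffic depends entirely on 
the template chosen.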

  was:
The producer performance test in Kafka sends messages with either [0x0 
bytes|https://github.com/apache/kafka/blob/0.8.1/perf/src/main/scala/kafka/perf/ProducerPerformance.scala#L237]
 or messages with [all 
X's|https://github.com/apache/kafka/blob/0.8.1/perf/src/main/scala/kafka/perf/ProducerPerformance.scala#L225].
 This skews the compression ratio massively and probably affects performance in 
other ways.

We want to create messages which will give a more realistic performance 
profile. Using random bytes may not be the best solution as these won't 
compress at all and will skew compression times.

Perhaps using a template which injects random or sequential data into it could 
work. Or maybe I'm overthinking it and we should just go for random bytes.


> Producer Performance Test sends messages with bytes of 0x0
> ----------------------------------------------------------
>
>                 Key: KAFKA-1516
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1516
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1.1
>            Reporter: Daniel Compton
>            Priority: Minor
>
> The producer performance test in Kafka sends messages consisting either of 
> [0x0 bytes|https://github.com/apache/kafka/blob/0.8.1/perf/src/main/scala/kafka/perf/ProducerPerformance.scala#L237]
> or of 
> [all X's|https://github.com/apache/kafka/blob/0.8.1/perf/src/main/scala/kafka/perf/ProducerPerformance.scala#L225].
> Both payloads are a single repeated byte, which compresses far better than 
> real data. This skews the compression ratio massively and probably affects 
> performance in other ways.
> We want to create messages that give a more realistic performance profile. 
> Random bytes may not be the best solution either, as they won't compress at 
> all and will skew compression times.
> Perhaps a template with random or sequential data injected into it could 
> work. Or maybe I'm overthinking it and we should just go for random bytes. 
> What other options do we have? Others seem to use random bytes, for example 
> [cassandra-stress|https://github.com/zznate/cassandra-stress/blob/master/src/main/java/com/riptano/cassandra/stress/InsertCommand.java#L39].



--
This message was sent by Atlassian JIRA
(v6.2#6252)
