[ 
https://issues.apache.org/jira/browse/KAFKA-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Albert Strasheim updated KAFKA-1449:
------------------------------------

    Description: 
Howdy

We are currently building out a number of Kafka consumers in Go, based on a 
patched version of the Sarama library that Shopify released a while back.

We have a reasonably fast serialization protocol (Cap'n Proto), a 10G network 
and lots of cores. We have various consumers computing all kinds of aggregates 
on a reasonably high volume access log stream (1.1e6 messages/sec peak, about 
500-600 bytes per message uncompressed).

When profiling our consumer, our single hottest function (until we disabled 
it) was the CRC32 checksum validation, since the deserialization and 
aggregation in these consumers are pretty cheap.

We believe things could be improved by extending the wire protocol to support 
CRC-32C (Castagnoli), since SSE 4.2 has an instruction to accelerate its 
calculation.

https://en.wikipedia.org/wiki/SSE4#SSE4.2

It might be hard to use from Java, but consumers written in most other 
languages will benefit a lot.

To give you an idea, here are some benchmarks for the Go CRC32 functions 
running on an Intel(R) Core(TM) i7-3540M CPU @ 3.00GHz core:

BenchmarkCrc32KB             90196 ns/op   363.30 MB/s
BenchmarkCrcCastagnoli32KB    3404 ns/op  9624.42 MB/s

I believe BenchmarkCrc32 written in C would do about 600-700 MB/sec, and the 
CRC32-C speed should be close to what one achieves in Go.
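
For anyone who wants to try a comparison like this on their own hardware, 
here is a minimal sketch using only Go's standard hash/crc32 package. It is 
not the benchmark code that produced the numbers above. The helper names 
(checksumIEEE, checksumCastagnoli, throughputMBps) and the iteration count are 
made up for illustration, and the 32 KiB buffer is an assumption that happens 
to match the reported ns/op and MB/s figures. Whether the Castagnoli path is 
hardware accelerated depends on the CPU and the Go version.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"time"
)

// castagnoliTable is built once and reused for every CRC-32C computation.
var castagnoliTable = crc32.MakeTable(crc32.Castagnoli)

// sink keeps the checksum results live so the loop below does real work.
var sink uint32

// checksumIEEE computes the CRC-32 (IEEE polynomial) of p.
func checksumIEEE(p []byte) uint32 { return crc32.ChecksumIEEE(p) }

// checksumCastagnoli computes the CRC-32C (Castagnoli polynomial) of p.
func checksumCastagnoli(p []byte) uint32 { return crc32.Checksum(p, castagnoliTable) }

// throughputMBps is a hypothetical helper: it checksums buf iters times and
// returns a rough throughput in MB/s (1 MB = 1e6 bytes).
func throughputMBps(sum func([]byte) uint32, buf []byte, iters int) float64 {
	start := time.Now()
	for i := 0; i < iters; i++ {
		sink ^= sum(buf)
	}
	elapsed := time.Since(start).Seconds()
	return float64(len(buf)) * float64(iters) / elapsed / 1e6
}

func main() {
	buf := make([]byte, 32*1024) // assumed 32 KiB payload per operation
	for i := range buf {
		buf[i] = byte(i)
	}
	const iters = 2000 // arbitrary; raise it for steadier numbers
	fmt.Printf("CRC-32  (IEEE):       %.2f MB/s\n", throughputMBps(checksumIEEE, buf, iters))
	fmt.Printf("CRC-32C (Castagnoli): %.2f MB/s\n", throughputMBps(checksumCastagnoli, buf, iters))
}
```

A wall-clock loop like this is only a rough guide. Go's testing package 
(testing.B with b.SetBytes) gives more careful measurements.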

(Met Todd and Clark at the meetup last night. Thanks for the great 
presentation!)

  was:
Howdy

We are currently building out a number of Kafka consumers in Go, based on a 
patched version of the Sarama library that Shopify released a while back.

We have a reasonably fast serialization protocol (Cap'n Proto), a 10G network 
and lots of cores. We have various consumers computing all kinds of aggregates 
on a reasonably high volume access log stream (1e6 messages/sec peak, about 
500-600 bytes per message uncompressed).

When profiling our consumer, our single hottest function (until we disabled 
it) was the CRC32 checksum validation, since the deserialization and 
aggregation in these consumers are pretty cheap.

We believe things could be improved by extending the wire protocol to support 
CRC-32C (Castagnoli), since SSE 4.2 has an instruction to accelerate its 
calculation.

https://en.wikipedia.org/wiki/SSE4#SSE4.2

It might be hard to use from Java, but consumers written in most other 
languages will benefit a lot.

To give you an idea, here are some benchmarks for the Go CRC32 functions 
running on an Intel(R) Core(TM) i7-3540M CPU @ 3.00GHz core:

BenchmarkCrc32KB             90196 ns/op   363.30 MB/s
BenchmarkCrcCastagnoli32KB    3404 ns/op  9624.42 MB/s

I believe BenchmarkCrc32 written in C would do about 600-700 MB/sec, and the 
CRC32-C speed should be close to what one achieves in Go.

(Met Todd and Clark at the meetup last night. Thanks for the great 
presentation!)


> Extend wire protocol to allow CRC32C
> ------------------------------------
>
>                 Key: KAFKA-1449
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1449
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Albert Strasheim
>            Assignee: Neha Narkhede
>             Fix For: 0.9.0
>
>
> Howdy
> We are currently building out a number of Kafka consumers in Go, based on a 
> patched version of the Sarama library that Shopify released a while back.
> We have a reasonably fast serialization protocol (Cap'n Proto), a 10G network 
> and lots of cores. We have various consumers computing all kinds of 
> aggregates on a reasonably high volume access log stream (1.1e6 messages/sec 
> peak, about 500-600 bytes per message uncompressed).
> When profiling our consumer, our single hottest function (until we disabled 
> it) was the CRC32 checksum validation, since the deserialization and 
> aggregation in these consumers are pretty cheap.
> We believe things could be improved by extending the wire protocol to support 
> CRC-32C (Castagnoli), since SSE 4.2 has an instruction to accelerate its 
> calculation.
> https://en.wikipedia.org/wiki/SSE4#SSE4.2
> It might be hard to use from Java, but consumers written in most other 
> languages will benefit a lot.
> To give you an idea, here are some benchmarks for the Go CRC32 functions 
> running on an Intel(R) Core(TM) i7-3540M CPU @ 3.00GHz core:
> BenchmarkCrc32KB             90196 ns/op   363.30 MB/s
> BenchmarkCrcCastagnoli32KB    3404 ns/op  9624.42 MB/s
> I believe BenchmarkCrc32 written in C would do about 600-700 MB/sec, and the 
> CRC32-C speed should be close to what one achieves in Go.
> (Met Todd and Clark at the meetup last night. Thanks for the great 
> presentation!)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
