[
https://issues.apache.org/jira/browse/AVRO-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thiruvalluvan M. G. resolved AVRO-2300.
---------------------------------------
Resolution: Fixed
Pull Request merged
> Enhance encoder to track the total number of bytes written
> ----------------------------------------------------------
>
> Key: AVRO-2300
> URL: https://issues.apache.org/jira/browse/AVRO-2300
> Project: Apache Avro
> Issue Type: Improvement
> Components: c++
> Reporter: Tory McKeag
> Priority: Major
>
> I'd like to enhance the Encoder API so that it can track and report the
> number of bytes actually written out since init() has been called. I'll
> explain my use case below:
> I'm using the Avro C++ library to publish messages to Kafka using librdkafka
> ([https://github.com/edenhill/librdkafka]). I did an initial implementation
> using MemoryOutputStream (via avro::memoryOutputStream() of course). After
> some tuning I ended up creating a couple of custom implementations of
> avro::OutputStream to improve performance, but like the built-in
> MemoryOutputStream they all suffer from the same limitation:
> I send a buffer to the Kafka API to be published, but I have to tell Kafka
> the *whole* length of the buffer, because I don't have a way to track the
> number of bytes that Avro actually wrote. For example, given a chunk size of
> 50, if Avro serialized 80 bytes of data, then the buffer will be of size 100.
> Since that's the size I get, I tell librdkafka to publish 100 bytes. The
> system works, but we pay the I/O and storage cost of publishing 20 bytes
> of garbage. That doesn't seem like a lot, but at our higher message
> volumes it is significant.
> I would want to tell librdkafka to only publish 80 bytes in this example, but
> to do so I would have to have a way to determine how many bytes Avro actually
> wrote out. My first guess as a user is that this should be available through
> the Encoder, since it would have to be part of the API. Looking at the code,
> though, it seems to me that the state already exists in StreamWriter and
> would just need to be exposed.
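
The accounting described in the issue can be modelled with a short, self-contained C++ sketch. It does not use the Avro C++ API. `ChunkedBuffer`, `bytesWritten()`, `capacity()` and `makeExampleBuffer()` are hypothetical names, introduced only to show why a chunk-allocated buffer reports 100 bytes when 80 were written, and how a separate counter would recover the real length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a chunked in-memory output buffer. This is NOT
// the Avro C++ API; it only models the accounting described in the issue.
class ChunkedBuffer {
public:
    explicit ChunkedBuffer(std::size_t chunkSize) : chunkSize_(chunkSize) {
        assert(chunkSize > 0);  // a zero chunk size could never grow the buffer
    }

    // Append len bytes, growing the storage one whole chunk at a time.
    void write(const std::uint8_t* data, std::size_t len) {
        while (storage_.size() < written_ + len) {
            storage_.resize(storage_.size() + chunkSize_);
        }
        if (len > 0) {
            std::memcpy(storage_.data() + written_, data, len);
        }
        written_ += len;
    }

    // Total allocated size: always a multiple of the chunk size.
    std::size_t capacity() const { return storage_.size(); }

    // Number of bytes actually written: the value the issue asks to expose.
    std::size_t bytesWritten() const { return written_; }

    const std::uint8_t* data() const { return storage_.data(); }

private:
    std::size_t chunkSize_;
    std::size_t written_ = 0;
    std::vector<std::uint8_t> storage_;
};

// Reproduce the example from the issue: 80 bytes of payload, chunk size 50.
inline ChunkedBuffer makeExampleBuffer() {
    ChunkedBuffer buf(50);
    std::vector<std::uint8_t> payload(80, 0x2A);
    buf.write(payload.data(), payload.size());
    return buf;
}
```

With a counter like this, a caller would pass `bytesWritten()` rather than `capacity()` as the message length when publishing, so the trailing 20 bytes of padding never leave the process.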