[ 
https://issues.apache.org/jira/browse/AVRO-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvalluvan M. G. resolved AVRO-2300.
---------------------------------------
    Resolution: Fixed

Pull Request merged

> Enhance encoder to track the total number of bytes written
> ----------------------------------------------------------
>
>                 Key: AVRO-2300
>                 URL: https://issues.apache.org/jira/browse/AVRO-2300
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: c++
>            Reporter: Tory McKeag
>            Priority: Major
>
> I'd like to enhance the Encoder API so that it can track and report the
> number of bytes actually written out since init() was called. My use case
> is explained below.
>
> I'm using the Avro C++ library to publish messages to Kafka using librdkafka
> ([https://github.com/edenhill/librdkafka]). My initial implementation used
> MemoryOutputStream (via avro::memoryOutputStream()). After some tuning I
> ended up creating a couple of custom implementations of avro::OutputStream
> to improve performance, but like the built-in MemoryOutputStream they all
> suffer from the same limitation:
> I send a buffer to the Kafka API to be published, but I have to tell Kafka
> the *whole* length of the buffer, because I don't have a way to track the
> number of bytes that Avro actually wrote. For example, given a chunk size of
> 50, if Avro serialized 80 bytes of data, then the buffer will be of size 100.
> Since that's the size I get, I tell librdkafka to publish 100 bytes. The
> system works, but we pay for the I/O and storage of 20 bytes of garbage.
> It doesn't seem like a lot, but at our higher message volumes it is
> significant.
>
> I would like to tell librdkafka to publish only 80 bytes in this example, but
> to do so I need a way to determine how many bytes Avro actually wrote out.
> My first guess as a user is that this should be available through the
> Encoder, because it would have to be part of the API, although looking at
> the code it seems to me that the state already exists in StreamWriter and
> would just need to be exposed.
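
The chunk-size arithmetic in the description (chunk size 50, 80 bytes
serialized, 100-byte buffer) can be illustrated with a minimal sketch. This is
not Avro code: ChunkedBuffer and its member names are hypothetical stand-ins
for a chunked output stream, assumed here only to show why a separate count of
bytes written is needed alongside the allocated capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a chunked output buffer; not part of the Avro API.
class ChunkedBuffer {
public:
    explicit ChunkedBuffer(std::size_t chunkSize) : chunkSize_(chunkSize), written_(0) {
        assert(chunkSize > 0);  // a zero chunk size could never grow the storage
    }

    // Append len bytes, growing the storage one whole chunk at a time.
    void write(const std::uint8_t* data, std::size_t len) {
        if (len == 0) {
            return;
        }
        while (written_ + len > storage_.size()) {
            storage_.resize(storage_.size() + chunkSize_);
        }
        std::memcpy(storage_.data() + written_, data, len);
        written_ += len;
    }

    // Allocated size: always a multiple of the chunk size.
    std::size_t capacity() const { return storage_.size(); }

    // Bytes actually written: the length a publisher should send.
    std::size_t byteCount() const { return written_; }

    const std::uint8_t* data() const { return storage_.data(); }

private:
    std::size_t chunkSize_;
    std::size_t written_;
    std::vector<std::uint8_t> storage_;
};
```

With a chunk size of 50 and an 80-byte payload, capacity() reports 100 while
byteCount() reports 80, so a caller that passes byteCount() as the message
length avoids publishing the 20 unused trailing bytes.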



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
