Tory McKeag created AVRO-2300:
---------------------------------

             Summary: Enhance encoder to track the total number of bytes written
                 Key: AVRO-2300
                 URL: https://issues.apache.org/jira/browse/AVRO-2300
             Project: Apache Avro
          Issue Type: Improvement
          Components: c++
            Reporter: Tory McKeag


I'd like to enhance the Encoder API so that it can track and report the number
of bytes actually written since init() was called.  My use case is described
below:

I'm using the Avro C++ library to publish messages to Kafka via librdkafka
([https://github.com/edenhill/librdkafka]).  My initial implementation used
MemoryOutputStream (created via avro::memoryOutputStream()).  After some
tuning I wrote a couple of custom implementations of avro::OutputStream to
improve performance, but like the built-in MemoryOutputStream they all share
the same limitation:

I pass a buffer to the Kafka API to be published, but I have to give Kafka the
*whole* length of the buffer, because I have no way to find out how many bytes
Avro actually wrote.  For example, with a chunk size of 50, if Avro serializes
80 bytes of data, the buffer will be 100 bytes long.  Since that is the only
size I have, I tell librdkafka to publish 100 bytes.  The system works, but we
pay the I/O and storage cost of publishing 20 bytes of garbage.  That doesn't
sound like a lot, but at our higher message volumes it is significant.
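
To make the arithmetic concrete, here is a minimal, self-contained sketch of a
buffer that grows one chunk at a time.  ChunkedBuffer and its member names are
hypothetical stand-ins I made up for illustration; this is not Avro's
MemoryOutputStream, only a model of the same growth behaviour:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a chunked in-memory output stream (not Avro code).
class ChunkedBuffer {
public:
    explicit ChunkedBuffer(std::size_t chunkSize) : chunkSize_(chunkSize) {
        assert(chunkSize > 0);
    }

    // Append n bytes, growing the storage by whole chunks as needed.
    void write(const std::uint8_t* data, std::size_t n) {
        while (written_ + n > storage_.size()) {
            storage_.resize(storage_.size() + chunkSize_);
        }
        std::copy(data, data + n, storage_.data() + written_);
        written_ += n;
    }

    std::size_t allocated() const { return storage_.size(); }   // what the caller sees today
    std::size_t written() const { return written_; }            // what the caller needs
    std::size_t slack() const { return allocated() - written(); }

private:
    std::size_t chunkSize_;
    std::size_t written_ = 0;
    std::vector<std::uint8_t> storage_;
};
```

With a chunk size of 50 and an 80-byte payload, allocated() is 100, written()
is 80, and slack() is the 20 bytes that would otherwise be published for
nothing.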

I would like to tell librdkafka to publish only 80 bytes in this example, but
to do so I need a way to determine how many bytes Avro actually wrote.  As a
user, my first guess is that this should be available through the Encoder, so
it would have to become part of the Encoder API.  Looking at the code, though,
it seems to me that the state already exists in StreamWriter and would just
need to be exposed.
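
As a rough illustration of the behaviour I have in mind, here is a small
self-contained sketch.  CountingEncoder, byteCount(), and the vector sink are
all hypothetical names for this example; none of this is the existing
avro::Encoder or StreamWriter code, and the real change might look quite
different.  The point is only that the count restarts at init() and advances
with every byte emitted:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical encoder, NOT avro::Encoder: it only illustrates keeping a
// running byte count that is reset by init().
class CountingEncoder {
public:
    // Attach the encoder to a sink and restart the count.
    void init(std::vector<std::uint8_t>& sink) {
        sink_ = &sink;
        count_ = 0;
    }

    // Zig-zag plus variable-length encoding of a 64-bit integer.
    void encodeLong(std::int64_t v) {
        std::uint64_t z = (static_cast<std::uint64_t>(v) << 1) ^
                          static_cast<std::uint64_t>(v >> 63);
        while (z >= 0x80) {
            put(static_cast<std::uint8_t>((z & 0x7F) | 0x80));
            z >>= 7;
        }
        put(static_cast<std::uint8_t>(z));
    }

    // Length prefix followed by the raw bytes.
    void encodeString(const std::string& s) {
        encodeLong(static_cast<std::int64_t>(s.size()));
        for (unsigned char c : s) put(c);
    }

    // Bytes written since the last init().
    std::size_t byteCount() const { return count_; }

private:
    void put(std::uint8_t b) {
        assert(sink_ != nullptr);  // init() must be called first
        sink_->push_back(b);
        ++count_;
    }

    std::vector<std::uint8_t>* sink_ = nullptr;
    std::size_t count_ = 0;
};
```

With something along these lines, I would pass byteCount() to librdkafka as
the message length instead of the full buffer size.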



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
