codelipenghui commented on issue #11962:
URL: https://github.com/apache/pulsar/issues/11962#issuecomment-915208660


   @BewareMyPower 
   > Once the converter was configured, before dispatching messages to Pulsar 
consumers, the converter would check if the buffer needs to be converted and 
then perform the conversion if necessary. 
   
   We should avoid converting (serializing and deserializing) the data on the
broker side; that would put a very heavy burden on the broker's GC. In my
opinion, we should do the data conversion on the client side. We can have
diverse data format implementations, and by default the Pulsar client would
only ship with the Pulsar data format processor. If users want to consume data
in the Kafka format, they can add a separate dependency such as
`org.apache.pulsar:kafka-format-converter`.
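   
   As a minimal sketch of what such a client-side plugin could look like (the
`PayloadConverter` interface and its method names are hypothetical, for this
discussion only, not an existing Pulsar API):
   
   ```java
   import io.netty.buffer.ByteBuf;
   import java.util.List;

   // Hypothetical client-side SPI: the default client would ship with only
   // the Pulsar implementation; a kafka-format-converter artifact would
   // contribute another one, e.g. discovered via java.util.ServiceLoader.
   public interface PayloadConverter {
       // Which foreign format this converter handles, e.g. "kafka".
       String format();

       // True if the raw entry payload is in this converter's format
       // (e.g. recognized by a magic byte or a format marker).
       boolean accepts(ByteBuf rawEntry);

       // Decode the raw entry into the individual message payloads that
       // the consumer hands to the application.
       List<ByteBuf> convert(ByteBuf rawEntry);
   }
   ```
   
   This keeps the default client free of any Kafka dependency: only users who
consume Kafka-format topics pull in the extra artifact.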
   
   The data sent to a Kafka client should not be the Pulsar broker's concern;
KoP should handle it. If the data is in the Kafka format, KoP can send it
directly to the Kafka consumer; if the data is in the Pulsar format, KoP needs
to convert it to the Kafka format.
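   
   A minimal sketch of that dispatch-side branch (the one-byte format marker
and the method names are assumptions for illustration, not actual KoP
internals):
   
   ```java
   import io.netty.buffer.ByteBuf;

   class KafkaDispatchPath {
       // Assumed convention: each stored entry starts with a format marker.
       private static final byte KAFKA_FORMAT = 0x01;

       ByteBuf toKafkaPayload(ByteBuf entry) {
           if (entry.getByte(entry.readerIndex()) == KAFKA_FORMAT) {
               // Entry was stored as a Kafka batch: pass it through to the
               // Kafka consumer without re-encoding (zero-copy slice).
               return entry.retainedSlice(entry.readerIndex() + 1,
                       entry.readableBytes() - 1);
           }
           // Pulsar-format entry: re-encode it as a Kafka record batch.
           return convertToKafkaRecordBatch(entry);
       }

       private ByteBuf convertToKafkaRecordBatch(ByteBuf entry) {
           throw new UnsupportedOperationException("elided in this sketch");
       }
   }
   ```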
   
   For the storage layer (BookKeeper and tiered storage), the data might be
read directly, bypassing the broker, by tools such as Pulsar SQL and, in the
future, Flink. This also has to be considered if we do the data conversion on
the broker side: we would need another implementation for reading data in
multiple formats from BookKeeper/tiered storage.
   
   @wangjialing218 
    > Just an idea. Currently there is one ManagedLedger (ledger) associated
with a PersistentTopic. Could we add another ManagedLedger (kafkaLedger)
associated with the topic?
   
   We can't use multiple managed ledgers for a topic: that would break the
FIFO semantics when Kafka producers and Pulsar producers publish to the topic
while Kafka consumers and Pulsar consumers consume from it.
   And when data is read repeatedly, we need to guarantee the same reading
order.
   
   @eolivelli 
   > do we need to add a Converter on the Publish side?
   
   If KoP wants to convert the data on the publish path, KoP (or any other
protocol handler) can implement that directly; is there any reason to
introduce the converter in the broker for data publishing? Note that, for now,
KoP converts the data on the publish side, and this is an inefficient
approach.
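   
   To make that inefficiency concrete, here is a hypothetical illustration of
the two publish paths (the method names are illustrative, not actual KoP
APIs): the current path pays a full decode/re-encode cycle per Kafka batch on
the broker, while storing the batch as-is defers conversion to whichever read
side needs it.
   
   ```java
   import io.netty.buffer.ByteBuf;
   import java.util.List;

   class KafkaPublishPath {
       // Current approach: every Kafka batch is decoded and re-encoded into
       // a Pulsar-format entry before it reaches the managed ledger, costing
       // one deserialize/serialize cycle per batch on the broker.
       ByteBuf publishWithConversion(ByteBuf kafkaBatch) {
           List<ByteBuf> records = decodeKafkaBatch(kafkaBatch);
           return encodeAsPulsarEntry(records);
       }

       // Alternative discussed in this thread: store the Kafka batch as-is
       // and convert lazily on the read path only when needed.
       ByteBuf publishAsIs(ByteBuf kafkaBatch) {
           return kafkaBatch.retainedSlice();
       }

       private List<ByteBuf> decodeKafkaBatch(ByteBuf batch) {
           throw new UnsupportedOperationException("elided in this sketch");
       }

       private ByteBuf encodeAsPulsarEntry(List<ByteBuf> records) {
           throw new UnsupportedOperationException("elided in this sketch");
       }
   }
   ```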

