BewareMyPower commented on PR #19414:
URL: https://github.com/apache/pulsar/pull/19414#issuecomment-1504671904

   Pulsar Functions converts the `MessageId` to `MessageIdImpl` to get the 
ledger ID and entry ID:
   
   
https://github.com/apache/pulsar/blob/d1fc7323cbf61a6d2955486fc123fdde5253e72c/pulsar-functions/utils/src/main/java/org/apache/pulsar/functions/utils/FunctionCommon.java#L328-L330
   
   Kafka Connect Sink even uses reflection to get the batch index from 
`MessageId`:
   
   
https://github.com/apache/pulsar/blob/d1fc7323cbf61a6d2955486fc123fdde5253e72c/pulsar-io/kafka-connect-adaptor/src/main/java/org/apache/pulsar/io/kafka/connect/KafkaConnectSink.java#L404
   
   Pulsar Spark Connector casts `MessageId` to the implementations for these 
internal fields:
   
   
https://github.com/streamnative/pulsar-spark/blob/fa131552ab5e857f801b91a58f97b8a09bf3eecc/src/main/scala/org/apache/spark/sql/pulsar/PulsarSources.scala#L73
   
   The Spark case is extremely terrible, because it depends on many more 
internal methods like `TopicMessageIdImpl#getInnerMessageId`.
   
   ```scala
     def mid2Impl(mid: MessageId): MessageIdImpl = {
       mid match {
         case bmid: BatchMessageIdImpl =>
           new MessageIdImpl(bmid.getLedgerId, bmid.getEntryId, 
bmid.getPartitionIndex)
         case midi: MessageIdImpl => midi
         case t: TopicMessageIdImpl => mid2Impl(t.getInnerMessageId)
         case up: UserProvidedMessageId => mid2Impl(up.mid)
       }
     }
   ```
   
   Before considering **what if we replace bookkeeper with other some other 
storage?**, I'd like to ask:
   - What if the `getInnerMessageId` method was removed?
   - What if the `TopicMessageIdImpl` method was removed?
   
   I believe can find more cases the external applications want to access these 
internal fields. Even if we didn't encourage them to do that. The 
`MessageIdAdv` interface is not added to **encourage users to use internal 
data**. Instead, many users **already assume the storage is BK and want to use 
internal data** but they have to access unstable methods in the implementations 
module (`pulsar-client`). It brings the problem that each minor modification of 
the implementation could bring breaking changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to