{entryId}) management API returns undocumented and inconsistent data when batch messages are accessed

via GitHub Mon, 08 May 2023 06:19:02 -0700


zbentley opened a new issue, #20258:
URL: https://github.com/apache/pulsar/issues/20258


   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Version
   
   2.10.3
   
   ### Minimal reproduce step
   
   1. Produce a message containing a bytes-schema payload `abc123` to a 
persistent topic with a *non-batched* (`batchingEnabled=false`) producer.
   2. Retrieve that message from the topic using the v2 admin API endpoint 
`GET admin/v2/persistent/{tenant}/{namespace}/{topic}/ledger/{ledgerId}/entry/{entryId}`.
   3. Observe that the returned payload exactly matches the string `abc123`.
   4. Produce the exact same message with `batchingEnabled=true`.
   5. Repeat step 2.
   6. Observe that the returned payload no longer matches `abc123`.
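
Steps 2 and 5 can be scripted along these lines. This is a minimal Python sketch, not part of the original report: the helper names `entry_url` and `fetch_entry` are hypothetical, the base URL in the usage comment assumes a local broker on the default web service port, and authentication is assumed to be disabled.

```python
from urllib.parse import quote
from urllib.request import urlopen


def entry_url(base, tenant, namespace, topic, ledger_id, entry_id):
    """Build the admin URL for one ledger entry of a persistent topic."""
    # Percent-encode each path component so unusual topic names stay intact.
    path = "/".join(quote(str(part), safe="") for part in (tenant, namespace, topic))
    return (
        f"{base.rstrip('/')}/admin/v2/persistent/{path}"
        f"/ledger/{int(ledger_id)}/entry/{int(entry_id)}"
    )


def fetch_entry(base, tenant, namespace, topic, ledger_id, entry_id):
    """Return (headers, body); the body is the raw, undecoded response bytes."""
    url = entry_url(base, tenant, namespace, topic, ledger_id, entry_id)
    with urlopen(url) as resp:
        return dict(resp.headers.items()), resp.read()


# Hypothetical usage against a local broker (values are placeholders):
# headers, body = fetch_entry("http://localhost:8080", "public", "default",
#                             "my-topic", 12, 0)
```

Per the steps above, `body` should equal `b"abc123"` for the non-batched message and differ from it for the batched one.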
   
   ### What did you expect to see?
   
   Something I can use to extract individual messages from a batch. That could 
include any of the below ideas, or something totally different:
   - (Easiest) documentation on how to parse a batch of messages client-side 
such that I can extract an individual message. I checked `PulsarApi.proto` 
against the batch data and nothing added up, so I suspect the format is 
internal, but I may be wrong.
   - A change to the API to support a parameter indicating batch message index, 
such that only messages with that index would be returned.
   - Variable-length HTTP headers indicating where each message's content lies 
(offset and length) within the returned batch blob.
   - A change to the API to return multipart HTTP responses, one per batch 
message.
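
To illustrate the offset-and-length idea, here is a sketch of what client-side handling could look like in Python. This describes a proposal only: no such header exists in Pulsar as far as this report knows, and the comma-separated `offset:length` format and the function name `split_batch` are invented for illustration.

```python
def split_batch(body: bytes, offsets_header: str) -> list:
    """Slice a batch blob into messages using 'offset:length' pairs.

    `offsets_header` is the value of a HYPOTHETICAL response header,
    e.g. "0:6,6:6"; it is not something the admin API returns today.
    """
    messages = []
    for pair in offsets_header.split(","):
        offset_text, length_text = pair.strip().split(":")
        offset, length = int(offset_text), int(length_text)
        # Reject ranges that fall outside the blob instead of silently truncating.
        if offset < 0 or length < 0 or offset + length > len(body):
            raise ValueError(f"range {offset}:{length} is outside the batch blob")
        messages.append(body[offset:offset + length])
    return messages
```

With a scheme like this, existing clients that ignore the extra header would keep receiving the same body, which sidesteps part of the backwards-compatibility concern.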
   
   ### What did you see instead?
   
   An undocumented binary blob (I think this is a raw chunk of message data 
from the ledger) that appears to contain some properties and similar metadata 
up front, followed by concatenated message+metadata entries for each message 
in the batch.
   
   ### Anything else?
   
   Many of my proposed fixes break backwards compatibility, so this may be 
better suited as a feature request.
   
   However, in the short term, I'd love to find a reference on how to extract 
individual messages from the batch in a non-Java environment. I control all my 
admin API accesses in my environment, so I can add parsing logic to those 
wrappers--I just need to know how to parse the data.
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!



[GitHub] [pulsar] zbentley opened a new issue, #20258: [Bug] Topic "peek" (admin/v2/persistent/{tenant}/{namespace}/{topic}/ledger/{ledgerId}/entry/{entryId}) management API returns undocumented and inconsistent data when batch messages are accessed