bschofield opened a new issue #435:
URL: https://github.com/apache/pulsar-client-go/issues/435


   #### Expected & actual behavior
   
   Somehow, I ended up with a corrupted message (or message batch?) on my production Pulsar cluster. I'm not sure where the corruption came from: it may have been produced by the Pulsar CGo client I was using, or it may have originated elsewhere.
   
   With the CGo client, the corruption showed up as consumers on the bad topic silently hanging and then being disconnected by the broker. Since the CGo client is no longer supported, I bit the bullet and moved over to the pure-Go client. (Massive kudos to you all on keeping the interfaces so similar, by the way.)
   
   After the move to this client, the pure-Go consumers began crashing with the following trace:
   
   ```
   panic: runtime error: slice bounds out of range [:1890492169] with capacity 324

   goroutine 187 [running]:
   github.com/apache/pulsar-client-go/pulsar/internal.(*buffer).Read(0xc000983340, 0xc070ae9f05, 0x14685e0, 0xe80560, 0xc000aac000)
           /home/ben/pkg/mod/github.com/apache/[email protected]/pulsar/internal/buffer.go:113 +0x6b
   github.com/apache/pulsar-client-go/pulsar/internal.(*MessageReader).readSingleMessage(0xc000187cd0, 0x144, 0x0, 0x0, 0x2b, 0xc000aac000, 0x0)
           /home/ben/pkg/mod/github.com/apache/[email protected]/pulsar/internal/commands.go:145 +0x77
   github.com/apache/pulsar-client-go/pulsar/internal.(*MessageReader).ReadMessage(0xc000187cd0, 0xc000136008, 0xc000187cb0, 0x1, 0x1, 0x2b, 0x0)
           /home/ben/pkg/mod/github.com/apache/[email protected]/pulsar/internal/commands.go:129 +0x5a
   github.com/apache/pulsar-client-go/pulsar.(*partitionConsumer).MessageReceived(0xc000326840, 0xc001fd2e80, 0xfd4080, 0xc000982100, 0xa4f001, 0xc0001daf70)
           /home/ben/pkg/mod/github.com/apache/[email protected]/pulsar/consumer_partition.go:494 +0x31a
   github.com/apache/pulsar-client-go/pulsar/internal.(*connection).handleMessage(0xc0001daf00, 0xc001fd2e80, 0xfd4080, 0xc000982100)
           /home/ben/pkg/mod/github.com/apache/[email protected]/pulsar/internal/connection.go:658 +0x115
   github.com/apache/pulsar-client-go/pulsar/internal.(*connection).internalReceivedCommand(0xc0001daf00, 0xc0002701c0, 0xfd4080, 0xc000982100)
           /home/ben/pkg/mod/github.com/apache/[email protected]/pulsar/internal/connection.go:547 +0x27c
   github.com/apache/pulsar-client-go/pulsar/internal.(*connection).run(0xc0001daf00)
           /home/ben/pkg/mod/github.com/apache/[email protected]/pulsar/internal/connection.go:401 +0x365
   github.com/apache/pulsar-client-go/pulsar/internal.(*connection).start.func1(0xc0001daf00)
           /home/ben/pkg/mod/github.com/apache/[email protected]/pulsar/internal/connection.go:235 +0x72
   created by github.com/apache/pulsar-client-go/pulsar/internal.(*connection).start
           /home/ben/pkg/mod/github.com/apache/[email protected]/pulsar/internal/connection.go:231 +0x3f
   ```
   
   #### Steps to reproduce
   
   Unfortunately, I don't think this is reproducible. I needed to keep the cluster up, so I added some debug statements that identified the bad topic and partition, and then cleared the backlog.
   
   I'm reporting the bug for two reasons. First, so that anyone who hits the same issue in the future can find this report and add more information. Second, in case you want to add logic for detecting corrupted messages, so that this situation is caught on the client side instead of crashing the consumer.
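
   A minimal sketch of the kind of client-side check I have in mind validates the length prefix against the bytes actually remaining and returns an error instead of slicing blindly. `readFrameChecked` and `errCorruptFrame` are hypothetical names for illustration, not part of the pulsar-client-go API. Where such a check would live in the real reader, and what the consumer should do on error (log the topic and partition, skip or nack the message), are assumptions left to the maintainers.

   ```go
   package main

   import (
   	"encoding/binary"
   	"errors"
   	"fmt"
   )

   // errCorruptFrame is a HYPOTHETICAL sentinel error used only in this sketch.
   var errCorruptFrame = errors.New("corrupt frame: length prefix exceeds buffer")

   // readFrameChecked validates a 4-byte big-endian length prefix against the
   // bytes actually available before slicing, returning an error instead of panicking.
   func readFrameChecked(data []byte) ([]byte, error) {
   	if len(data) < 4 {
   		return nil, fmt.Errorf("%w: need 4 bytes for the prefix, have %d", errCorruptFrame, len(data))
   	}
   	size := binary.BigEndian.Uint32(data[:4])
   	// Compare as uint64 so a huge prefix cannot wrap to a negative int.
   	if uint64(size) > uint64(len(data)-4) {
   		return nil, fmt.Errorf("%w: prefix says %d bytes, only %d remain", errCorruptFrame, size, len(data)-4)
   	}
   	return data[4 : 4+int(size)], nil
   }

   func main() {
   	frame := make([]byte, 324)
   	binary.BigEndian.PutUint32(frame[:4], 1890492165)
   	if _, err := readFrameChecked(frame); errors.Is(err, errCorruptFrame) {
   		// A real client could log the topic/partition here and skip the message.
   		fmt.Println("rejected:", err)
   	}
   }
   ```

   Converting to `uint64` before comparing avoids mixing signed and unsigned arithmetic, so on 32-bit platforms a prefix above `math.MaxInt32` cannot turn negative and slip past the check.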
   
   #### System configuration
   
   Pulsar broker: 2.6.1
   pulsar-client-go: 0.3.0
   
   The same issue was observed on both Ubuntu and Alpine.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

