maytasm opened a new issue #9763:
URL: https://github.com/apache/druid/issues/9763


   
   ### Affected Version
   0.18.0
   
   ### Description
   
   Index tasks run without problems while polling a Kafka or Kinesis stream 
that has one or more empty shards (as tested in KafkaIndexTaskTest.java and 
KinesisIndexTaskTest.java). The Kinesis problem described in the title occurs 
when we try to get the sequence number in 
SeekableStreamSupervisor#getOffsetFromStorageForPartition and the stream has 
one or more empty shards (as tested in KinesisRecordSupplierTest.java and 
SeekableStreamSupervisorStateTest.java). More specifically, it happens under 
either of the following conditions:
   
   - we don't have a startingOffset (first run, or the sequences were reset 
after previous failures) and there is no offset in the metadata store, so we 
retrieve the latest or earliest Kinesis sequence number
   
   - we don't have a startingOffset (first run, or the sequences were reset 
after previous failures) and there is an offset in the metadata store, but 
skipSequenceNumberAvailabilityCheck is false
   
   Currently, in SeekableStreamSupervisor#getOffsetFromStorageForPartition, 
each time we use a ShardIterator to get records, we get back a new iterator to 
continue reading from where we left off. A valid ShardIterator is returned 
regardless of whether we have already reached the end of the stream: as long as 
the shard is open, any call to GetRecords with a valid (unexpired) 
ShardIterator returns a valid, non-null NextShardIterator. Hence, on an empty 
shard we keep getting new ShardIterators until we time out and throw an ISE, 
which results in the Kinesis supervisor being reported as unhealthy.
   
   We should detect when a Kinesis shard is empty instead of relying on the 
timeout.
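
   One possible way to detect an empty shard, sketched below, is to use the 
MillisBehindLatest value that Kinesis returns with each GetRecords response: a 
value of zero means the read position is caught up with the tip of the shard, 
so an empty batch at that point means there is nothing to read. This is only an 
illustration under stated assumptions, not the actual Druid change. The names 
EmptyShardCheck, Batch, ShardReader and firstSequenceNumber are hypothetical 
stand-ins, so that the sketch runs without the AWS SDK.

```java
import java.util.Collections;
import java.util.List;

final class EmptyShardCheck {

    /** Hypothetical stand-in for the fields of one GetRecords response. */
    static final class Batch {
        final List<String> sequenceNumbers; // sequence numbers of the returned records
        final String nextShardIterator;     // null only once a closed shard is fully read
        final Long millisBehindLatest;      // 0 means caught up with the tip of the shard

        Batch(List<String> sequenceNumbers, String nextShardIterator, Long millisBehindLatest) {
            this.sequenceNumbers = sequenceNumbers;
            this.nextShardIterator = nextShardIterator;
            this.millisBehindLatest = millisBehindLatest;
        }
    }

    /** Hypothetical stand-in for the Kinesis client call. */
    interface ShardReader {
        Batch getRecords(String shardIterator);
    }

    /**
     * Returns the first sequence number found, or null if the shard has
     * nothing to read. Throws only when the timeout expires while the
     * shard still reports that it is behind the latest record.
     */
    static String firstSequenceNumber(ShardReader reader, String shardIterator, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        String iterator = shardIterator;
        while (iterator != null && System.currentTimeMillis() < deadline) {
            Batch batch = reader.getRecords(iterator);
            if (!batch.sequenceNumbers.isEmpty()) {
                return batch.sequenceNumbers.get(0);
            }
            // Empty batch while caught up: the open shard is empty, so stop polling.
            if (batch.millisBehindLatest != null && batch.millisBehindLatest == 0L) {
                return null;
            }
            iterator = batch.nextShardIterator;
        }
        if (iterator == null) {
            return null; // closed shard, fully read without finding a record
        }
        throw new IllegalStateException("Timed out while fetching a sequence number");
    }

    public static void main(String[] args) {
        // An open, empty shard: always an empty batch plus a fresh iterator.
        ShardReader emptyOpenShard =
            it -> new Batch(Collections.<String>emptyList(), "next-iterator", 0L);
        System.out.println(firstSequenceNumber(emptyOpenShard, "start", 1000L)); // prints null
    }
}
```

   With this check the empty, open shard returns immediately instead of 
looping until the timeout, and the exception is reserved for the case where 
the shard reports that records exist but none arrived in time.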
   

