maytasm opened a new issue #9763: URL: https://github.com/apache/druid/issues/9763
### Affected Version

0.18.0

### Description

There is no problem when an index task is running and polling from a Kafka/Kinesis stream with one or more empty shards (as tested in KafkaIndexTaskTest.java and KinesisIndexTaskTest.java). The problem for Kinesis described in the title occurs when we try to get the sequence number in SeekableStreamSupervisor#getOffsetFromStorageForPartition and Kinesis has one or more empty shards (as tested in KinesisRecordSupplierTest.java and SeekableStreamSupervisorStateTest.java).

More specifically, this happens under either of the following conditions:

- We don't have a startingOffset (first run, or we had some previous failures and reset the sequences) and don't have an offset in the metadata store, so we retrieve the latest or earliest Kinesis sequence number.
- We don't have a startingOffset (first run, or we had some previous failures and reset the sequences) and we have an offset in the metadata store, but skipSequenceNumberAvailabilityCheck=False.

Currently, in SeekableStreamSupervisor#getOffsetFromStorageForPartition, after we use a ShardIterator to get some records, we get back a new iterator to continue reading where we left off. However, Kinesis returns a valid ShardIterator whether or not we have already reached the end of the stream: as long as the shard is open, any call to GetRecords with a valid (unexpired) ShardIterator returns a valid, non-null NextShardIterator. Hence, we keep requesting new ShardIterators until we time out and throw an ISE, which causes the Kinesis supervisor to show as unhealthy.

We should determine when a Kinesis shard is empty rather than rely on the timeout.
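One possible way to detect an empty shard without waiting for the timeout is sketched below. It relies on the MillisBehindLatest field of the Kinesis GetRecords response, which is zero when the consumer is caught up to the tip of the shard. This is only an illustration and not necessarily the fix adopted in Druid: the `Page` class, the `firstSequenceNumber` method, and the `maxCalls` parameter are hypothetical stand-ins, so the example runs without the AWS SDK.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

class EmptyShardSketch {
    /** Hypothetical stand-in for the GetRecords response fields that matter here. */
    static final class Page {
        final List<String> sequenceNumbers; // sequence numbers of the returned records
        final String nextShardIterator;     // null only once a closed shard is fully read
        final long millisBehindLatest;      // 0 means caught up to the tip of the shard

        Page(List<String> sequenceNumbers, String nextShardIterator, long millisBehindLatest) {
            this.sequenceNumbers = sequenceNumbers;
            this.nextShardIterator = nextShardIterator;
            this.millisBehindLatest = millisBehindLatest;
        }
    }

    /**
     * Returns the first sequence number reachable from shardIterator, or
     * Optional.empty() if the shard has nothing to read from that position.
     * getRecords is an assumed callback that maps a shard iterator to one response.
     */
    static Optional<String> firstSequenceNumber(
            String shardIterator, Function<String, Page> getRecords, int maxCalls) {
        String iterator = shardIterator;
        for (int calls = 0; calls < maxCalls; calls++) {
            if (iterator == null) {
                // Closed shard read to its end: nothing more will ever arrive.
                return Optional.empty();
            }
            Page page = getRecords.apply(iterator);
            if (!page.sequenceNumbers.isEmpty()) {
                return Optional.of(page.sequenceNumbers.get(0));
            }
            if (page.millisBehindLatest == 0) {
                // No records and already at the tip: treat the shard as empty
                // instead of following NextShardIterator until a timeout.
                return Optional.empty();
            }
            // Empty page but still behind the tip: keep following the iterator.
            iterator = page.nextShardIterator;
        }
        throw new IllegalStateException("No record found after " + maxCalls + " GetRecords calls");
    }
}
```

With this check, an open but empty shard returns `Optional.empty()` on the first call, because the response has no records and reports zero milliseconds behind the latest record. An empty page that is still behind the tip is followed through its next iterator, and the call budget remains only as a safety net.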
