Starting on an integration project between a Kinesis stream and Samza, despite have no background in either, but I am familiar with most other messaging/queuing systems.
Decided to keep all state management within Samza instead of using Kinesis' client library. My plan was to use the default KafkaCheckpointManagerFactory on an timed interval basis, but I have a few questions. What exactly is being checkpointed? What value can I retrieve to use as an offset for my Kinesis stream? Or is this something I need to keep track of in a store? If so, what is the point of checkpointing? Can I use RocksDb to save the Kinesis offset at every document (efficiently that is)? Related to Kinesis and not quite Samza, it does not have a listener/push framework, but it requires constant polling (unless I missed something). First of all, I was going to have a partition for each Kinesis shard partition. But the next question is, should I simply have a while(true) polling method inside my consumer(BlockingEnvelopeMap)? Seems inefficient, even with a timeout. How can I get new data to instantiate a new consumer? My consumer will put a new document to my task. Cheers.