Starting on an integration project between a Kinesis stream and Samza, despite 
have no background in either, but I am familiar with most other 
messaging/queuing systems.


Decided to keep all state management within Samza instead of using Kinesis' 
client library. My plan was to use the default KafkaCheckpointManagerFactory on 
an timed interval basis, but I have a few questions.


What exactly is being checkpointed? What value can I retrieve to use as an 
offset for my Kinesis stream? Or is this something I need to keep track of in a 
store? If so, what is the point of checkpointing? Can I use RocksDb to save the 
Kinesis offset at every document (efficiently that is)?


Related to Kinesis and not quite Samza, it does not have a listener/push 
framework, but it requires constant polling (unless I missed something). First 
of all, I was going to have a partition for each Kinesis shard partition. But 
the next question is, should I simply have a while(true) polling method inside 
my consumer(BlockingEnvelopeMap)? Seems inefficient, even with a timeout. How 
can I get new data to instantiate a new consumer? My consumer will put a new 
document to my task.


Cheers.



Reply via email to