mynameborat commented on a change in pull request #1429:
URL: https://github.com/apache/samza/pull/1429#discussion_r489513707
##########
File path:
samza-core/src/main/scala/org/apache/samza/storage/TaskStorageBackupManager.scala
##########
@@ -22,11 +22,31 @@ package org.apache.samza.storage
import org.apache.samza.checkpoint.CheckpointId
import org.apache.samza.system.SystemStreamPartition
-trait TaskStorageManager {
+trait TaskStorageBackupManager {
Review comment:
I looked a bit more and feel this interface is currently overloaded with
responsibilities. For the most part it does only commit handling for stores
related to a task and most of the management moved to `ContainerStorageManager`
but exposing the store through `getStore` lingers around.
> If you can take a crack at removing `getStore` method off this interface
that will good. `TaskInstance` can get store handles from CSM instead or just
directly gets the list of stores associated with it and eliminates CSM from the
picture are few options to consider.
The above is not a blocker but do create a follow up if you aren't planning
to do.
Given that context, I'd suggest to rename this interface to
`TaskCommitManager` and not roll the backup responsibilities into it rather
keep it separate and use composition within the commit manager. By this, you
will potentially have flexibility to reuse some of the shared commit logic
across the types (kafka, remote store, etc) but still keep the back/upload part
isolated to its own implementation without have a single class that manages
both these.
Let me know your thoughts?
##########
File path:
samza-core/src/main/scala/org/apache/samza/storage/KafkaTransactionalStateTaskStorageBackupManager.scala
##########
@@ -39,24 +39,28 @@ import scala.collection.JavaConverters._
/**
* Manage all the storage engines for a given task
*/
-class TransactionalStateTaskStorageManager(
+class KafkaTransactionalStateTaskStorageBackupManager(
Review comment:
Do you foresee any shared logic in the commit path regardless of the
choice of durability? Wonder if we should keep this as is and extract the
remote store specifics out to the backup manager.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]