mynameborat commented on pull request #1475: URL: https://github.com/apache/samza/pull/1475#issuecomment-799476540
I am not sure if the approach taken in the PR is the ideal way to go about achieving the changes you need. I see few problems with delegating `null` checks from `BaseKeyValueStorageEngineFactory` to the extenders. 1. In `BaseKeyValueStorageEngineFactory`, we do multiple layers of wrapping stores (logged store, access logged store, serialized store) and so on.. By removing the checks, you are breaking this invariant of non-null `Serde` to all of steps that follow construction of the higher layer store (RocksDb or InMemory etc). 2. Particularly, this is problematic for the construction of `SerializedKeyValueStore` and `AccessLoggedStore` because the former assumes both `KeySerde` and `ValueSerde` to be null and the latter assumes `KeySerde` to be non-null. 3. Forces high layer `StorageEngineFactory` to handle `KeySerde` and `ValueSerde` during construction even though it is not necessary. For e.g. `RocksDbKeyValueStorageEngineFactory` and `InMemoryKeyValueStorageEngineFactory` 4. Introduces potential problems with `Serde`; i.e. Higher layer storage engines mistakenly use the `Serde` themselves and also have wrapped store handles which potentially does `Serde` as well. It will be good if you can take into consideration the following as part of your approach - Do storage engines mandate over what kind of wrapping is allowed? e.g. logged store and/or serialized store etc - Do end users continue to describe properties of store that indirectly determines the wrapping applied? e.g. large message enabled store (vs cached store vs access logged store - Does the framework determine the combination of configurations that drive the wrapping strategy? Potential solutions I can think of 1. Introduce internal flag or have explicit store properties that drives how wrapping is accomplished and determines if a store is `SerializedKeyValueStore` or not 2. Instead of removing the null check, push it down within `BaseKeyValueStorageEngineFactory` and modify how wrapping is accomplished either based on storage engines. 3. If flexibility is the key, introduce capability for storage engines to implement that handles raw store without any these goodies and wrapping so it has all the control it needs. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
