> On May 6, 2015, 12:10 a.m., Yan Fang wrote:
> > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala, 
> > lines 481-486
> > <https://reviews.apache.org/r/33453/diff/4/?file=950515#file950515line481>
> >
> >     1. why does the changeLogSystemStreamPartition matter here?
> >     2. prefer the storePartitionDir, because it's the dir for each 
> > partition/task, right?
> 
> Navina Ramesh wrote:
>     1. why does the changeLogSystemStreamPartition matter here?
>     >> changeLogSystemStreamPartition indicates whether a store has a 
> changelog or not. Depending on the case, the base directory of the store will 
> be different.
>     
>     2. prefer the storePartitionDir, because it's the dir for each 
> partition/task, right?
>     >> Base directory is same for all tasks in the job. There is a separate 
> partition directory for each task within each store directory. The path looks 
> like : "<StoreBaseDir>/<StoreName>/<TaskName>/<storeFiles>" . Does that 
> clarify your question?
> 
> Yan Fang wrote:
>     1. "changeLogSystemStreamPartition indicates whether a store has a 
> changelog or not. Depending on the case, the base directory of the store will 
> be different."
>     It sounds a little confusing to me. Don't all stores have changelog? Do 
> you mean offset here?
>     
>     2.  for "TaskStorageManager.getStorePartitionDir(loggedStorageBaseDir, 
> storeName, taskName)", if two jobs have the same storeName and taskName, will 
> they be in the same directory?

It sounds a little confusing to me. Don't all stores have changelog? Do you 
mean offset here?
> Not all stores have a changelog. A store gets one only if you configure it 
> with the "stores.<storename>.changelog" property. Associating a changelog 
> with every store would be expensive when a job doesn't care about losing 
> its state.
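
A minimal sketch of that check, assuming the job config is available as a plain Map[String, String]. The object and method names are hypothetical and are not Samza's API; only the "stores.<storename>.changelog" key comes from the discussion above, and the stream value in the usage note is a made-up example.

```scala
object ChangelogConfigSketch {
  // Hypothetical helper (not Samza's API): returns the changelog stream
  // configured for a store, or None when the
  // "stores.<storename>.changelog" key is absent or blank.
  def changelogFor(config: Map[String, String], storeName: String): Option[String] =
    config.get(s"stores.$storeName.changelog").map(_.trim).filter(_.nonEmpty)

  // A store counts as "logged" only when a changelog is configured for it.
  def isLogged(config: Map[String, String], storeName: String): Boolean =
    changelogFor(config, storeName).isDefined
}
```

For example, with Map("stores.my-store.changelog" -> "kafka.my-store-changelog"), isLogged returns true for "my-store" and false for any store without that key.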

for "TaskStorageManager.getStorePartitionDir(loggedStorageBaseDir, storeName, 
taskName)", if two jobs have the same storeName and taskName, will they be in 
the same directory?
> No. The storeBaseDir is specific to the job. To clarify, the path above looks 
> more like: "<workingDir>/$jobName-$jobId/$storeName/$taskName/<storeFiles>".
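
To make the layout concrete, here is a rough sketch of how such a path could be assembled. The object and method names are hypothetical stand-ins, not the actual TaskStorageManager code; the only thing taken from the thread is the directory order workingDir, jobName-jobId, storeName, taskName.

```scala
import java.io.File

object StorePathSketch {
  // Hypothetical: the job-specific base directory, "<workingDir>/$jobName-$jobId".
  def jobBaseDir(workingDir: File, jobName: String, jobId: String): File =
    new File(workingDir, s"$jobName-$jobId")

  // Hypothetical stand-in for a store-partition-directory helper:
  // "<baseDir>/$storeName/$taskName", one directory per task within each store.
  def storePartitionDir(baseDir: File, storeName: String, taskName: String): File =
    new File(new File(baseDir, storeName), taskName)
}
```

Because the job name and id are part of the base directory, two jobs that share a storeName and taskName still resolve to different directories under the same working directory.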


- Navina


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33453/#review82610
-----------------------------------------------------------


On May 6, 2015, 6:22 a.m., Navina Ramesh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33453/
> -----------------------------------------------------------
> 
> (Updated May 6, 2015, 6:22 a.m.)
> 
> 
> Review request for samza, Yan Fang, Chris Riccomini, Naveen Somasundaram, and 
> Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Added checksum to the Offset file and some unit tests
> 
> Added Unit Tests for TaskStorageManager and refactored some code
> 
> Changed default to yarn cwd instead of io.tmpDir and refactored code
> 
> 
> Diffs
> -----
> 
>   samza-core/src/main/scala/org/apache/samza/config/ShellCommandConfig.scala 
> e94a4735217f59d074510ce1556c8c439e6a72f0 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> ac4793afe1e6868933e750181bee1e27c157b5e6 
>   samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
> f68a7fee24614fce101e91c4f933d9b4e65dda0a 
>   samza-core/src/main/scala/org/apache/samza/util/Util.scala 
> 8a83566ae6139127d7fe04ab42231151227dc479 
>   
> samza-core/src/test/scala/org/apache/samza/storage/TestTaskStorageManager.scala
>  PRE-CREATION 
>   samza-core/src/test/scala/org/apache/samza/util/TestUtil.scala 
> b75f44060fb8e660e824eaeb9cfdcc9d6fa902e8 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
>  1b44a517129b35affac802929087eaa0061e6b5d 
> 
> Diff: https://reviews.apache.org/r/33453/diff/
> 
> 
> Testing
> -------
> 
> Tested locally using hello-samza.
> Note: you have to set an environment variable LOGGED_STORE_BASE_DIR pointing 
> to the new location to persist the changelog attached stores. Otherwise, it 
> will default to YARN's cwd and will not re-use local state.
> 
> 
> Thanks,
> 
> Navina Ramesh
> 
>
