> On May 6, 2015, 12:10 a.m., Yan Fang wrote:
> > samza-core/src/main/scala/org/apache/samza/config/ShellCommandConfig.scala, 
> > line 55
> > <https://reviews.apache.org/r/33453/diff/4/?file=950514#file950514line55>
> >
> >     1.is it also possible to retrived this value from the config? Then 
> > users do not need to set it in every node.
> >     2.out of curious, why is it "logged" and "non-logged", not something 
> > like, "persisting" and "non-persisting"?

1.is it also possible to retrived this value from the config? Then users do not 
need to set it in every node.
>> Essentially, we don't want this to be configured at a "per-job" level. If we 
>> do so, then we cannot clean-up after the stale data stores (ie. data stores 
>> that are not longer being used by any task). From my understanding, the only 
>> way to set a configuration at the cluster level is through environment 
>> variables. If there is a better alternative, please let me know. 

2.out of curious, why is it "logged" and "non-logged", not something like, 
"persisting" and "non-persisting"?
>> Not sure. We created a "LoggedStore" and not a "PersistantStore". That is 
>> probably why the terminology has stuck around.


> On May 6, 2015, 12:10 a.m., Yan Fang wrote:
> > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala, 
> > lines 447-464
> > <https://reviews.apache.org/r/33453/diff/4/?file=950515#file950515line447>
> >
> >     1. why do we need defaultStoreBaseDire and LoggedStorageBaseDir? Can we 
> > just have one variable and one method, such that, storeBaseDir = 
> > getStoreBaseDir?
> >     
> >     2. line 442 and line 454 are duplicated
> >     
> >     3. from line 447-448, just call Util.getJobNameAndId to avoid 
> > duplicated code?

1. why do we need defaultStoreBaseDire and LoggedStorageBaseDir? Can we just 
have one variable and one method, such that, storeBaseDir = getStoreBaseDir?
>> In the proposed design (SAMZA-617), we want to periodically clean-up the 
>> data stores that are not being used. For stores that don't have to be 
>> persisted, it can be cleaned up as soon as the job exits. This is already 
>> done for us by YARN. If we choose to use the same location for both the 
>> stores, then we have to clean up after the job exits and we lose our ability 
>> to use YARN NM's deletionService. This just makes sure that we persist only 
>> the data stores that we need. 

2. line 442 and line 454 are duplicated
>> For some reason, I wanted 2 different file object instances. I can't 
>> recollect why. I guess it can be avoided. 


3. from line 447-448, just call Util.getJobNameAndId to avoid duplicated code?
>> Sure. That sounds good.


> On May 6, 2015, 12:10 a.m., Yan Fang wrote:
> > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala, 
> > lines 481-486
> > <https://reviews.apache.org/r/33453/diff/4/?file=950515#file950515line481>
> >
> >     1. why does the changeLogSystemStreamPartition matter here?
> >     2. prefer the storePartitionDir, because it's the dir for each 
> > partition/task, right?

1. why does the changeLogSystemStreamPartition matter here?
>> changeLogSystemStreamPartition indicates whether a store has a changelog or 
>> not. Depending on the case, the base directory of the store will be 
>> different.

2. prefer the storePartitionDir, because it's the dir for each partition/task, 
right?
>> Base directory is same for all tasks in the job. There is a separate 
>> partition directory for each task within each store directory. The path 
>> looks like : "<StoreBaseDir>/<StoreName>/<TaskName>/<storeFiles>" . Does 
>> that clarify your question?


> On May 6, 2015, 12:10 a.m., Yan Fang wrote:
> > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala, 
> > lines 508-509
> > <https://reviews.apache.org/r/33453/diff/4/?file=950515#file950515line508>
> >
> >     why do we need 2 storeDir? Is it because we want to delete the default 
> > state Dir?

Yes


> On May 6, 2015, 12:10 a.m., Yan Fang wrote:
> > samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala,
> >  lines 110-112
> > <https://reviews.apache.org/r/33453/diff/4/?file=950516#file950516line110>
> >
> >     personal opinion: should this logic be another method, such as 
> > readOffsetFromFile, not "cleanBaseDir"? A little misleading.

I can modularize it further if it helps. Since we need to load offsets (if 
available) before cleaning directories, it became a part of it. 
Let me fix this.


- Navina


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33453/#review82610
-----------------------------------------------------------


On May 5, 2015, 6:47 p.m., Navina Ramesh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33453/
> -----------------------------------------------------------
> 
> (Updated May 5, 2015, 6:47 p.m.)
> 
> 
> Review request for samza, Yan Fang, Chris Riccomini, Naveen Somasundaram, and 
> Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Added checksum to the Offset file and some unit tests
> 
> Added Unit Tests for TaskStorageManager and refactored some code
> 
> Changed default to yarn cwd instead of io.tmpDir and refactored code
> 
> 
> Diffs
> -----
> 
>   samza-core/src/main/scala/org/apache/samza/config/ShellCommandConfig.scala 
> e94a4735217f59d074510ce1556c8c439e6a72f0 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> ac4793afe1e6868933e750181bee1e27c157b5e6 
>   samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
> f68a7fee24614fce101e91c4f933d9b4e65dda0a 
>   samza-core/src/main/scala/org/apache/samza/util/Util.scala 
> 8a83566ae6139127d7fe04ab42231151227dc479 
>   
> samza-core/src/test/scala/org/apache/samza/storage/TestTaskStorageManager.scala
>  PRE-CREATION 
>   samza-core/src/test/scala/org/apache/samza/util/TestUtil.scala 
> b75f44060fb8e660e824eaeb9cfdcc9d6fa902e8 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
>  1b44a517129b35affac802929087eaa0061e6b5d 
> 
> Diff: https://reviews.apache.org/r/33453/diff/
> 
> 
> Testing
> -------
> 
> Tested locally using hello-samza.
> Note: you have to set an environment variable LOGGED_STORE_BASE_DIR pointing 
> to the new location to persist the changelog attached stores. Otherwise, it 
> will default to YARN's cwd and will not re-use local state.
> 
> 
> Thanks,
> 
> Navina Ramesh
> 
>

Reply via email to