> On May 6, 2015, 12:10 a.m., Yan Fang wrote: > > samza-core/src/main/scala/org/apache/samza/config/ShellCommandConfig.scala, > > line 55 > > <https://reviews.apache.org/r/33453/diff/4/?file=950514#file950514line55> > > > > 1.is it also possible to retrived this value from the config? Then > > users do not need to set it in every node. > > 2.out of curious, why is it "logged" and "non-logged", not something > > like, "persisting" and "non-persisting"?
1.is it also possible to retrived this value from the config? Then users do not need to set it in every node. >> Essentially, we don't want this to be configured at a "per-job" level. If we >> do so, then we cannot clean-up after the stale data stores (ie. data stores >> that are not longer being used by any task). From my understanding, the only >> way to set a configuration at the cluster level is through environment >> variables. If there is a better alternative, please let me know. 2.out of curious, why is it "logged" and "non-logged", not something like, "persisting" and "non-persisting"? >> Not sure. We created a "LoggedStore" and not a "PersistantStore". That is >> probably why the terminology has stuck around. > On May 6, 2015, 12:10 a.m., Yan Fang wrote: > > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala, > > lines 447-464 > > <https://reviews.apache.org/r/33453/diff/4/?file=950515#file950515line447> > > > > 1. why do we need defaultStoreBaseDire and LoggedStorageBaseDir? Can we > > just have one variable and one method, such that, storeBaseDir = > > getStoreBaseDir? > > > > 2. line 442 and line 454 are duplicated > > > > 3. from line 447-448, just call Util.getJobNameAndId to avoid > > duplicated code? 1. why do we need defaultStoreBaseDire and LoggedStorageBaseDir? Can we just have one variable and one method, such that, storeBaseDir = getStoreBaseDir? >> In the proposed design (SAMZA-617), we want to periodically clean-up the >> data stores that are not being used. For stores that don't have to be >> persisted, it can be cleaned up as soon as the job exits. This is already >> done for us by YARN. If we choose to use the same location for both the >> stores, then we have to clean up after the job exits and we lose our ability >> to use YARN NM's deletionService. This just makes sure that we persist only >> the data stores that we need. 2. line 442 and line 454 are duplicated >> For some reason, I wanted 2 different file object instances. I can't >> recollect why. I guess it can be avoided. 3. from line 447-448, just call Util.getJobNameAndId to avoid duplicated code? >> Sure. That sounds good. > On May 6, 2015, 12:10 a.m., Yan Fang wrote: > > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala, > > lines 481-486 > > <https://reviews.apache.org/r/33453/diff/4/?file=950515#file950515line481> > > > > 1. why does the changeLogSystemStreamPartition matter here? > > 2. prefer the storePartitionDir, because it's the dir for each > > partition/task, right? 1. why does the changeLogSystemStreamPartition matter here? >> changeLogSystemStreamPartition indicates whether a store has a changelog or >> not. Depending on the case, the base directory of the store will be >> different. 2. prefer the storePartitionDir, because it's the dir for each partition/task, right? >> Base directory is same for all tasks in the job. There is a separate >> partition directory for each task within each store directory. The path >> looks like : "<StoreBaseDir>/<StoreName>/<TaskName>/<storeFiles>" . Does >> that clarify your question? > On May 6, 2015, 12:10 a.m., Yan Fang wrote: > > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala, > > lines 508-509 > > <https://reviews.apache.org/r/33453/diff/4/?file=950515#file950515line508> > > > > why do we need 2 storeDir? Is it because we want to delete the default > > state Dir? Yes > On May 6, 2015, 12:10 a.m., Yan Fang wrote: > > samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala, > > lines 110-112 > > <https://reviews.apache.org/r/33453/diff/4/?file=950516#file950516line110> > > > > personal opinion: should this logic be another method, such as > > readOffsetFromFile, not "cleanBaseDir"? A little misleading. I can modularize it further if it helps. Since we need to load offsets (if available) before cleaning directories, it became a part of it. Let me fix this. - Navina ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33453/#review82610 ----------------------------------------------------------- On May 5, 2015, 6:47 p.m., Navina Ramesh wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/33453/ > ----------------------------------------------------------- > > (Updated May 5, 2015, 6:47 p.m.) > > > Review request for samza, Yan Fang, Chris Riccomini, Naveen Somasundaram, and > Yi Pan (Data Infrastructure). > > > Repository: samza > > > Description > ------- > > Added checksum to the Offset file and some unit tests > > Added Unit Tests for TaskStorageManager and refactored some code > > Changed default to yarn cwd instead of io.tmpDir and refactored code > > > Diffs > ----- > > samza-core/src/main/scala/org/apache/samza/config/ShellCommandConfig.scala > e94a4735217f59d074510ce1556c8c439e6a72f0 > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala > ac4793afe1e6868933e750181bee1e27c157b5e6 > samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala > f68a7fee24614fce101e91c4f933d9b4e65dda0a > samza-core/src/main/scala/org/apache/samza/util/Util.scala > 8a83566ae6139127d7fe04ab42231151227dc479 > > samza-core/src/test/scala/org/apache/samza/storage/TestTaskStorageManager.scala > PRE-CREATION > samza-core/src/test/scala/org/apache/samza/util/TestUtil.scala > b75f44060fb8e660e824eaeb9cfdcc9d6fa902e8 > > samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala > 1b44a517129b35affac802929087eaa0061e6b5d > > Diff: https://reviews.apache.org/r/33453/diff/ > > > Testing > ------- > > Tested locally using hello-samza. > Note: you have to set an environment variable LOGGED_STORE_BASE_DIR pointing > to the new location to persist the changelog attached stores. Otherwise, it > will default to YARN's cwd and will not re-use local state. > > > Thanks, > > Navina Ramesh > >