> On April 28, 2015, 5:43 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala, 
> > line 457
> > <https://reviews.apache.org/r/33453/diff/2/?file=939981#file939981line457>
> >
> >     Question: shouldn't the base dir to be determined by container ID, not 
> > jobId?

I think containerId is not really relevant because the store partition 
directories are uniquely named across the containers. The ownership of 
containers and partition directory is persisted in the config stream (going 
forward). I will verify this and confirm here.


> On April 28, 2015, 5:43 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala, 
> > line 483
> > <https://reviews.apache.org/r/33453/diff/2/?file=939981#file939981line483>
> >
> >     nit: I think that the readablity is a little better if:
> >     
> >     val storeBaseDir = if (changeLogSystemStreamPartition != null) {
> >       loggedStorageBaseDir
> >     } else {
> >       defaultStoreBaseDir
> >     }
> >     ...
> >     val storageEngine = ... (
> >       storeName,
> >       TaskStorageManager.getStorePartitionDir(storeBaseDir, storeName, 
> > taskName),
> >       ...

Ok. Agreed.


> On April 28, 2015, 5:43 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala,
> >  line 76
> > <https://reviews.apache.org/r/33453/diff/2/?file=939982#file939982line76>
> >
> >     How do we know that this store is using the default storeBaseDir not 
> > loggedStoreBaseDir?

At this point, we don't check if the store is using default storeBaseDir or 
loggedStoreBaseDir. Since the application could have used default storeBaseDir 
in the past, we are being a bit pessimistic and cleaning up any residual 
storage directories. 
I am not very happy with the clean up code. It is generic enough to be agnostic 
of the physical deployment. But this means it is doing additional work of 
deleting the directory which won't be needed otherwise. For example, if you are 
deploying using YARN, each application run is contained in its own app 
directory. Here, the "cleanBaseDir" step effectively only creates the storage 
directory (Line 72 to 77).


> On April 28, 2015, 5:43 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala,
> >  line 88
> > <https://reviews.apache.org/r/33453/diff/2/?file=939982#file939982line88>
> >
> >     qq: Is there any checksum in the offset file to make sure that we don't 
> > read in a corrupted value?

No. As of now, we don't have any checksum. Do you think it is better to add it 
now than later ?


> On April 28, 2015, 5:43 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala,
> >  line 99
> > <https://reviews.apache.org/r/33453/diff/2/?file=939982#file939982line99>
> >
> >     From the store initiation code, it seems that storeBaseDir won't have 
> > any logged store paths if logged store base dir is configured. Hence, here 
> > we may be deleting empty/non-existing storagePartitionDirs.

True. That is exactly what I meant in comment above. We can probably modify the 
Util.rm method to be a no-op when the file path doesn't exist.


- Navina


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33453/#review81838
-----------------------------------------------------------


On April 22, 2015, 9:54 p.m., Navina Ramesh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33453/
> -----------------------------------------------------------
> 
> (Updated April 22, 2015, 9:54 p.m.)
> 
> 
> Review request for samza, Yan Fang, Chris Riccomini, Naveen Somasundaram, and 
> Yi Pan (Data Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Changed default to yarn cwd instead of io.tmpDir and refactored code
> 
> 
> Diffs
> -----
> 
>   samza-core/src/main/scala/org/apache/samza/config/ShellCommandConfig.scala 
> 1a2dd4413f56e53dbeeb47b5637d7b0c50522f02 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> 720fbdceafc4fe69b048d81a677e874d13e6d22f 
>   samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
> f68a7fee24614fce101e91c4f933d9b4e65dda0a 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
>  66c2a0dc2e38e21f951727a30f0987776ac52fe2 
> 
> Diff: https://reviews.apache.org/r/33453/diff/
> 
> 
> Testing
> -------
> 
> Tested locally using hello-samza.
> Note: you have to set an environment variable LOGGED_STORE_BASE_DIR pointing 
> to the new location to persist the changelog attached stores. Otherwise, it 
> will default to YARN's cwd and will not re-use local state.
> 
> 
> Thanks,
> 
> Navina Ramesh
> 
>

Reply via email to