On Tue, May 28, 2013 at 11:42:49PM +0000, Edison Su wrote:
> > -----Original Message-----
> > From: John Burwell [mailto:jburw...@basho.com]
> > Sent: Tuesday, May 28, 2013 8:43 AM
> > To: dev@cloudstack.apache.org
> > Subject: Re: [MERGE]object_store branch into master
> >
> > All,
> >
> > I have gone through a large chunk of this patch, and published my review
> > thus far (https://reviews.apache.org/r/11277/). TL;DR is that this patch
> > has a number of significant issues which can be summarized as follows:
>
> Thanks for your time to review this patch.
>
> > 1. While it appears that the requirement of NFS for secondary storage has
> > largely been removed, it has basically been branched around with if blocks
> > instead of pushed down as an implementation detail in the physical layer.
> > Rather than exploiting polymorphism to vary behavior through a set of
> > higher-level abstractions, the orchestration performs instanceof NfsTO
> > checks. The concern is that future evolution of the secondary storage
> > layer will still have dependencies on NFS.
>
> As long as the mgt server doesn't explicitly depend on NFS secondary storage
> during orchestration, and NFS storage can be used based on different
> configurations, the resource-side "NfsTO checks" are not much of a concern
> to me at the current stage. We can add a "polymorphic high-level
> abstraction" at the resource side, as you suggested before, in the future.
> The current code change is already huge; adding a new level of abstraction
> would make it even bigger. But adding this kind of "high-level abstraction"
> is not hard any more, as all the storage-related commands now only deal
> with DataTO/DataStoreTO. Right now there are no read/write data operations
> on these *TOs; one possible way to add a "high-level abstraction" would be
> to add read/write data operations in each hypervisor resource, based on the
> DataTO and DataStoreTO.
> My point is that refactoring the mgt server code is much, much harder than
> the resource code. If we can get the mgt server right, then we can move on
> to refactoring the resource code in the future.
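To make that idea a bit more concrete, here is a rough sketch of what a
resource-side read/write abstraction keyed off DataTO/DataStoreTO could look
like, so that a hypervisor resource dispatches on the store type instead of
doing instanceof NfsTO checks. The types below are simplified, illustrative
stand-ins, not the actual CloudStack interfaces:

// Illustrative sketch only -- simplified stand-ins, not real CloudStack types.
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Minimal stand-ins for the transfer objects the storage commands already carry.
interface DataStoreTO { String getRole(); }                 // e.g. "nfs", "s3"
interface DataTO { String getPath(); DataStoreTO getDataStore(); }

// One adapter per physical store type, registered with the hypervisor resource.
interface DataStoreAdapter {
    InputStream read(DataTO data) throws Exception;
    void write(DataTO data, InputStream content) throws Exception;
}

class StoreAdapterRegistry {
    private final Map<String, DataStoreAdapter> adapters = new HashMap<>();

    void register(String role, DataStoreAdapter adapter) {
        adapters.put(role, adapter);
    }

    // The resource looks up an adapter instead of doing instanceof checks.
    DataStoreAdapter adapterFor(DataTO data) {
        DataStoreAdapter adapter = adapters.get(data.getDataStore().getRole());
        if (adapter == null) {
            throw new IllegalArgumentException(
                "No adapter registered for store role " + data.getDataStore().getRole());
        }
        return adapter;
    }
}

With something like this, each hypervisor resource would register one adapter
per store type it supports, and a new store type would mean a new adapter
rather than new conditionals in the orchestration path.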
> > 2. In some scenarios, NFS is still a requirement for secondary storage.
> > In particular, Xen users will still have to maintain an "NFS cache".
> > Given the degradation of capability, I think it would be helpful to put a
> > few more eyeballs on the Xen implementation to determine if we could
> > avoid using an intermediate file system.
>
> NFS will not disappear soon for some hypervisors and some types of storage.
> But as long as the mgt server doesn't have this requirement, people can add
> new types of hypervisors (e.g. Hyper-V) and new types of storage (Ceph,
> SolidFire, etc.) which don't need NFS. The optimization for a particular
> hypervisor can be done by other people, who may be familiar with a certain
> type of storage or a certain hypervisor.
>
> > 3. I have the following concerns regarding potential race conditions and
> > resource exhaustion in the NFS cache implementation.
> >
> >   - The current random allocator algorithm does not account for the
> > amount of space that will be required for the operation (i.e. checking to
> > ensure that the cache it picks has enough space to hold the object being
> > downloaded), nor does it reserve the space. Given the long-running (in
> > compute time) nature of these processes, the available space in a cache
> > could be exhausted by a number of concurrently transferring templates
> > and/or snapshots. By reserving space before the transfer, the allocator
> > would be able to account for both pending operations and the current
> > contents of the cache.
>
> 1. It's the current behavior of the 4.1 and master secondary storage code.
> We didn't introduce a new issue here.
> 2. Adding a new cache storage allocator here is much easier than adding it
> on the master branch.
> 3. I treat it as a bug, and plan to fix it after the merge.

We should not be merging in major changes with *known* bugs like this.
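For reference, the kind of reservation John describes could look roughly like
the sketch below; the class and method names are made up for illustration and
are not existing CloudStack code:

// Sketch of a cache allocator that reserves space before a transfer begins.
import java.util.List;
import java.util.Optional;

class CacheStore {
    final long capacityBytes;
    private long usedBytes;      // current contents of the cache
    private long reservedBytes;  // space claimed by in-flight transfers

    CacheStore(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Reserve space for a pending transfer; refuse if it would not fit.
    synchronized boolean tryReserve(long bytes) {
        if (usedBytes + reservedBytes + bytes > capacityBytes) {
            return false;
        }
        reservedBytes += bytes;
        return true;
    }

    // Called when the transfer succeeds: the reservation becomes real usage.
    synchronized void commit(long bytes) {
        reservedBytes -= bytes;
        usedBytes += bytes;
    }

    // Called when the transfer fails or is aborted: give the space back.
    synchronized void release(long bytes) {
        reservedBytes -= bytes;
    }
}

class ReservingCacheAllocator {
    // Pick a cache that can actually hold the object, accounting for both its
    // current contents and space already reserved by concurrent transfers.
    Optional<CacheStore> allocate(List<CacheStore> caches, long objectSizeBytes) {
        for (CacheStore cache : caches) {
            if (cache.tryReserve(objectSizeBytes)) {
                return Optional.of(cache);
            }
        }
        return Optional.empty();  // caller can fail deterministically or fall back
    }
}

The caller would then commit the reservation when the transfer completes and
release it on failure, so concurrent downloads can never collectively
over-commit a cache.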
> >   - There appears to be no mechanism to age out the contents of the
> > cache. Therefore, as implemented, it will grow unbounded. The only
> > workaround for this problem would be to have an NFS cache whose size
> > equals that of the object store.
>
> Admins can add multiple NFS cache storages to one zone, and it's the
> current behavior on the master branch. Again, we can add an aging policy
> later, as the cache storage has its own pom project and its own
> implementation.

That doesn't make much sense to me. A cache is expected to be, by its nature,
bounded by time / size / something... We need to implement some sort of
bounds here. It also shouldn't be a failure point if you have a cache miss or
cache failure. What behaviour occurs if there is only one and it's full?

> >   - The mechanism lacks robust error handling or retry logic. In
> > particular, the behavior if/when a cache exhausts available space appears
> > non-deterministic.
>
> Again, the master branch has the same issue, and fixing it here will be
> easier than fixing it on the master branch.

Why not fix it now, since the cache mechanism is being touched?

> >   - Generally, I see little consideration for alternative/exception
> > flows. For example, what happens if I attempt to use a
> > template/iso/snapshot in transit to the object store to/from the cache?
> > Since these files can be very large (multiple GBs), we have to assume
> > some transfer latency in all interactions.
>
> The management server mainly maintains the state of each object
> (volume/snapshot/template). We added a state machine for all the objects,
> and for all the objects on each storage, which is unavailable on the master
> branch. Take copying a template from the object store to the cache as an
> example: the state of the template entry on both the object storage and the
> cache storage is changed to "Copying"; if the copy fails or times out, the
> state of the template entry on the object storage is changed back to
> "Ready", while the state of the template entry on the cache storage is
> changed to "Failed".
> As long as the mgt server maintains the state correctly, the user/admin can
> issue the same operation again through the UI/API if an operation fails.
> That's one reason there is no retry logic in the storage subsystem code.
> Another reason is that adding retry logic on top of an async API would be a
> little bit difficult.
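For anyone following along, here is a simplified rendering of the state
handling Edison describes for a template copied from the object store to the
cache, including the failure/timeout path. The names are illustrative; the
real code tracks these states through CloudStack's state machine framework
rather than bare fields:

// Simplified sketch of the per-store object states described above.
enum ObjectState { ALLOCATED, COPYING, READY, FAILED }

class TemplateStoreRef {
    final String storeId;
    ObjectState state = ObjectState.ALLOCATED;

    TemplateStoreRef(String storeId) {
        this.storeId = storeId;
    }
}

class CacheCopyExample {
    // Copy a template from the object store to the NFS cache store.
    static void copyToCache(TemplateStoreRef onObjectStore, TemplateStoreRef onCache,
                            Runnable doCopy) {
        onObjectStore.state = ObjectState.COPYING;
        onCache.state = ObjectState.COPYING;
        try {
            doCopy.run();  // the actual, potentially long-running transfer
            onObjectStore.state = ObjectState.READY;
            onCache.state = ObjectState.READY;
        } catch (RuntimeException copyFailedOrTimedOut) {
            // The source entry goes back to READY and the cache entry is marked
            // FAILED; there is no automatic retry -- the user/admin re-issues the
            // operation through the UI/API.
            onObjectStore.state = ObjectState.READY;
            onCache.state = ObjectState.FAILED;
        }
    }
}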
> > 4. The Image Cache abstraction is too tightly coupled to the core
> > orchestration mechanism. I would recommend implementing it as a
>
> The only place that is coupled to the image cache abstraction is the
> DataMotionStrategy; in particular, AncientDataMotionStrategy->needCacheStorage,
> and AncientDataMotionStrategy will call StorageCacheManager->createCacheObject
> if cache storage is needed.
>
> > DataStore proxy that is applied as necessary. For example, the current
> > Xen hypervisor implementation must interact with a file system. It seems
> > most appropriate that the Xen hypervisor implementation would apply the
> > proxy to its secondary storage DataStore instances. Therefore, the
> > storage engine
>
> The issue that has troubled me for a while is how and where to configure
> cache storage. Take S3 as an example: the implementation of the S3 provider
> should not need to care about cache storage at all; the S3 implementation
> should only care about how to deal with data stored in S3. S3 is a
> general-purpose storage, so the same implementation can be used by different
> hypervisors and in different situations. For example, if both Ceph and S3
> are used by KVM, then definitely no NFS cache is needed, while for other
> cluster-wide primary storages supported by KVM, an NFS cache is needed.
> How and where do we present this kind of configuration, so that both the
> primary storage provider and the object storage provider implementations
> don't need to be aware of it? My current solution is to put the decision in
> the code: the DataMotionStrategy is the place that decides whether to use
> cache storage or not. In the default implementation, all the copy operations
> from S3 to primary storage need cache storage, while if people add a new
> storage and want to remove the cache storage for some reason, they can add a
> new DataMotionStrategy.
>
> I am not sure how the DataStore proxy will solve the issue I mentioned
> above; I would be glad to hear your solution.

This looks like a good opportunity for some collaboration on the design!

> > should only provide the facility, not attempt to determine when it is
> > needed. Additionally, a proxy model would greatly reduce the size and
> > complexity of the various classes (e.g. AncientDataMotionStrategy).
>
> The current AncientDataMotionStrategy is only about 400 LOC now; I removed
> some of the duplicated code. But I still want to know the details of your
> solution.
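As one possible starting point for that design discussion, the sketch below
shows the rough shape a caching DataStore proxy could take: the hypervisor
side that needs a file system wraps its store handle in the proxy, while the
orchestration and the storage providers stay unaware of the cache. The
interfaces are simplified stand-ins, not the actual CloudStack storage API:

// Illustrative sketch of a caching DataStore proxy -- simplified stand-in
// interfaces, not the real storage subsystem API.
import java.io.InputStream;

interface SimpleDataStore {
    InputStream open(String path) throws Exception;          // read an object
    void put(String path, InputStream content) throws Exception;
}

// Wraps a backing store (e.g. an S3-backed image store) and stages reads
// through a cache store. A hypervisor that needs a file system (e.g. Xen)
// would wrap its secondary store in this proxy; others would use the backing
// store directly.
class CachingDataStoreProxy implements SimpleDataStore {
    private final SimpleDataStore backingStore;
    private final SimpleDataStore cacheStore;

    CachingDataStoreProxy(SimpleDataStore backingStore, SimpleDataStore cacheStore) {
        this.backingStore = backingStore;
        this.cacheStore = cacheStore;
    }

    @Override
    public InputStream open(String path) throws Exception {
        try {
            return cacheStore.open(path);                     // cache hit
        } catch (Exception cacheMiss) {
            // Cache miss: stage the object into the cache, then serve it from there.
            cacheStore.put(path, backingStore.open(path));
            return cacheStore.open(path);
        }
    }

    @Override
    public void put(String path, InputStream content) throws Exception {
        backingStore.put(path, content);                      // writes pass straight through
    }
}

Under a model like this, the decision of when a cache is needed would live
with the hypervisor/resource layer that applies the proxy, rather than inside
DataMotionStrategy.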
> > 5. While I have only reviewed roughly 50-60% of the patch thus far, I see
> > little to no additional unit tests for the new code added. Integration
> > tests have
>
> If the code is changing rapidly, then writing/maintaining unit test cases
> would take huge effort; we focused on the integration tests and Marvin
> tests at this stage.

We shouldn't be merging code that is "still changing rapidly". What do you
mean by this comment?

> > been expanded, but many of the tests simply execute the service methods
> > and do not verify the form of the operation results. There are also a
> > tremendous number of ignored exceptions that should likely fail these
> > tests. Therefore, we could have tests passing due to an ignored exception
> > that should be failing.
>
> 1. Thanks for taking a look at the integration tests, which were very
> convenient for us during the implementation. They saved us a lot of time
> during testing.
> 2. Yes, the integration tests need to be improved as you suggested.

Let's do that please!

> > While I recognize the tremendous effort that has been expended to
> > implement this capability and the value of the feature, I see significant
> > design and operational issues that must be addressed before it can be
> > accepted into
>
> I would say the object_store branch is just better than the master branch;
> almost all the issues (related to cache storage) you pointed out already
> exist on the master branch, while fixing these issues on the master branch
> is harder than on the object_store branch. Frankly speaking, I don't want
> to read/understand/guess how it works on the master branch; it's a lot of
> hacks all over the place.
> Of course, these issues are still issues and we need to fix them, but when
> to fix them is arguable. I prefer fixing them after the merge. Here is the
> reason:
> If we agree that the issues you pointed out also exist on master, then do
> we want to fix them in 4.2 or not? If the answer is yes, then who will fix
> them, and how long will it take to fix them on master? If I offer to fix
> these issues after merging object_store into master, will that be
> acceptable? If the answer is no, then there is no technical reason not to
> merge object_store into master, as these issues were not introduced by the
> object_store branch.
>
> > master. In my opinion, the object_store branch represents a good first
> > step, but it needs a few more review/refinement iterations before it will
> > be ready for a master merge.
> >
> > Thanks,
> > -John