On Tue, May 28, 2013 at 11:42:49PM +0000, Edison Su wrote:
> > -----Original Message-----
> > From: John Burwell [mailto:jburw...@basho.com]
> > Sent: Tuesday, May 28, 2013 8:43 AM
> > To: dev@cloudstack.apache.org
> > Subject: Re: [MERGE]object_store branch into master
> >
> > All,
> >
> > I have gone through a large chunk of this patch, and published my review
> > thus far (https://reviews.apache.org/r/11277/). TL;DR is that this patch
> > has a number of significant issues which can be summarized as follows:
>
> Thanks for your time to review this patch.
>
> > 1. While it appears that the requirement of NFS for secondary storage has
> > largely been removed, it has basically been branched around with if blocks
> > instead of pushed down as an implementation detail in the physical layer.
> > Rather than exploiting polymorphism to vary behavior through a set of
> > higher-level abstractions, the orchestration performs instanceof NfsTO
> > checks. The concern is that future evolution of the secondary storage
> > layer will still have dependencies on NFS.
>
> As long as the mgt server doesn't explicitly depend on NFS secondary storage
> during orchestration, and NFS storage can be used based on different
> configurations, the resource-side "NfsTO checks" are not much of a concern
> to me at the current stage. We can add a "polymorphic high-level
> abstraction" at the resource side, as you suggested before, in the future.
> The current code change is already huge; adding a new level of abstraction
> would make it even bigger. But adding this kind of "high-level abstraction"
> is not hard any more, as all the storage-related commands now only deal
> with DataTO/DataStoreTO. Right now there are no read/write data operations
> on these *TOs; one possible way to add a "high-level abstraction" would be
> to add read/write data operations in each hypervisor resource, based on the
> DataTO and DataStoreTO.
> My point is that refactoring the mgt server code is much, much harder than
> the resource code. If we can get the mgt server right, then we can move on
> to refactoring the resource code in the future.
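To make that idea a bit more concrete, here is a rough sketch of what a
resource-side read/write abstraction keyed off DataTO/DataStoreTO could look
like, so that a hypervisor resource dispatches on the store type instead of
doing instanceof NfsTO checks. The types below are simplified, illustrative
stand-ins, not the actual CloudStack interfaces:

// Illustrative sketch only -- simplified stand-ins, not real CloudStack types.
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Minimal stand-ins for the transfer objects the storage commands already carry.
interface DataStoreTO { String getRole(); }                 // e.g. "nfs", "s3"
interface DataTO { String getPath(); DataStoreTO getDataStore(); }

// One adapter per physical store type, registered with the hypervisor resource.
interface DataStoreAdapter {
    InputStream read(DataTO data) throws Exception;
    void write(DataTO data, InputStream content) throws Exception;
}

class StoreAdapterRegistry {
    private final Map<String, DataStoreAdapter> adapters = new HashMap<>();

    void register(String role, DataStoreAdapter adapter) {
        adapters.put(role, adapter);
    }

    // The resource looks up an adapter instead of doing instanceof checks.
    DataStoreAdapter adapterFor(DataTO data) {
        DataStoreAdapter adapter = adapters.get(data.getDataStore().getRole());
        if (adapter == null) {
            throw new IllegalArgumentException(
                "No adapter registered for store role " + data.getDataStore().getRole());
        }
        return adapter;
    }
}

With something like this, each hypervisor resource would register one adapter
per store type it supports, and a new store type would mean a new adapter
rather than new conditionals in the orchestration path.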
> > 2. In some scenarios, NFS is still a requirement for secondary storage.
> > In particular, Xen users will still have to maintain an "NFS cache".
> > Given the degradation of capability, I think it would be helpful to put a
> > few more eyeballs on the Xen implementation to determine if we could
> > avoid using an intermediate file system.
>
> NFS will not disappear soon for some hypervisors and some types of storage.
> But as long as the mgt server doesn't have this requirement, people can add
> new types of hypervisors (e.g. Hyper-V) and new types of storage (Ceph,
> SolidFire, etc.) which don't need NFS. The optimization for a particular
> hypervisor can be done by other people, who may be familiar with a certain
> type of storage or a certain hypervisor.
>
> > 3. I have the following concerns regarding potential race conditions and
> > resource exhaustion in the NFS cache implementation.
> >
> >   - The current random allocator algorithm does not account for the
> > amount of space that will be required for the operation (i.e. checking to
> > ensure that the cache it picks has enough space to hold the object being
> > downloaded), nor does it reserve the space. Given the long-running (in
> > compute time) nature of these processes, the available space in a cache
> > could be exhausted by a number of concurrently transferring templates
> > and/or snapshots. By reserving space before the transfer, the allocator
> > would be able to account for both pending operations and the current
> > contents of the cache.
>
> 1. It's the current behavior of the 4.1 and master secondary storage code.
> We didn't introduce a new issue here.
> 2. Adding a new cache storage allocator here is much easier than adding it
> on the master branch.
> 3. I treat it as a bug, and plan to fix it after the merge.

We should not be merging in major changes with *known* bugs like this.
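For reference, the kind of reservation John describes could look roughly like
the sketch below; the class and method names are made up for illustration and
are not existing CloudStack code:

// Sketch of a cache allocator that reserves space before a transfer begins.
import java.util.List;
import java.util.Optional;

class CacheStore {
    final long capacityBytes;
    private long usedBytes;      // current contents of the cache
    private long reservedBytes;  // space claimed by in-flight transfers

    CacheStore(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Reserve space for a pending transfer; refuse if it would not fit.
    synchronized boolean tryReserve(long bytes) {
        if (usedBytes + reservedBytes + bytes > capacityBytes) {
            return false;
        }
        reservedBytes += bytes;
        return true;
    }

    // Called when the transfer succeeds: the reservation becomes real usage.
    synchronized void commit(long bytes) {
        reservedBytes -= bytes;
        usedBytes += bytes;
    }

    // Called when the transfer fails or is aborted: give the space back.
    synchronized void release(long bytes) {
        reservedBytes -= bytes;
    }
}

class ReservingCacheAllocator {
    // Pick a cache that can actually hold the object, accounting for both its
    // current contents and space already reserved by concurrent transfers.
    Optional<CacheStore> allocate(List<CacheStore> caches, long objectSizeBytes) {
        for (CacheStore cache : caches) {
            if (cache.tryReserve(objectSizeBytes)) {
                return Optional.of(cache);
            }
        }
        return Optional.empty();  // caller can fail deterministically or fall back
    }
}

The caller would then commit the reservation when the transfer completes and
release it on failure, so concurrent downloads can never collectively
over-commit a cache.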
> >   - There appears to be no mechanism to age out the contents of the
> > cache. Therefore, as implemented, it will grow unbounded. The only
> > workaround for this problem would be to have an NFS cache whose size
> > equals that of the object store.
>
> Admins can add multiple NFS cache storages to one zone, and it's the
> current behavior on the master branch. Again, we can add an aging policy
> later, as the cache storage has its own pom project and its own
> implementation.

That doesn't make much sense to me. A cache is expected to be, by its nature,
bounded by time / size / something... We need to implement some sort of
bounds here. It also shouldn't be a failure point if you have a cache miss or
cache failure. What behaviour occurs if there is only one and it's full?

> >   - The mechanism lacks robust error handling or retry logic. In
> > particular, the behavior if/when a cache exhausts available space appears
> > non-deterministic.
>
> Again, the master branch has the same issue, and fixing it here will be
> easier than fixing it on the master branch.

Why not fix it now, since the cache mechanism is being touched?

> >   - Generally, I see little consideration for alternative/exception
> > flows. For example, what happens if I attempt to use a
> > template/iso/snapshot in transit to the object store to/from the cache?
> > Since these files can be very large (multiple GBs), we have to assume
> > some transfer latency in all interactions.
>
> The management server mainly maintains the state of each object
> (volume/snapshot/template). We added a state machine for all the objects,
> and for all the objects on each storage, which is unavailable on the master
> branch. Take copying a template from the object store to the cache as an
> example: the state of the template entry on both the object storage and the
> cache storage is changed to "Copying"; if the copy fails or times out, the
> state of the template entry on the object storage is changed back to
> "Ready", while the state of the template entry on the cache storage is
> changed to "Failed".
> As long as the mgt server maintains the state correctly, the user/admin can
> issue the same operation again through the UI/API if an operation fails.
> That's one reason there is no retry logic in the storage subsystem code.
> Another reason is that adding retry logic on top of an async API would be a
> little bit difficult.
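For anyone following along, here is a simplified rendering of the state
handling Edison describes for a template copied from the object store to the
cache, including the failure/timeout path. The names are illustrative; the
real code tracks these states through CloudStack's state machine framework
rather than bare fields:

// Simplified sketch of the per-store object states described above.
enum ObjectState { ALLOCATED, COPYING, READY, FAILED }

class TemplateStoreRef {
    final String storeId;
    ObjectState state = ObjectState.ALLOCATED;

    TemplateStoreRef(String storeId) {
        this.storeId = storeId;
    }
}

class CacheCopyExample {
    // Copy a template from the object store to the NFS cache store.
    static void copyToCache(TemplateStoreRef onObjectStore, TemplateStoreRef onCache,
                            Runnable doCopy) {
        onObjectStore.state = ObjectState.COPYING;
        onCache.state = ObjectState.COPYING;
        try {
            doCopy.run();  // the actual, potentially long-running transfer
            onObjectStore.state = ObjectState.READY;
            onCache.state = ObjectState.READY;
        } catch (RuntimeException copyFailedOrTimedOut) {
            // The source entry goes back to READY and the cache entry is marked
            // FAILED; there is no automatic retry -- the user/admin re-issues the
            // operation through the UI/API.
            onObjectStore.state = ObjectState.READY;
            onCache.state = ObjectState.FAILED;
        }
    }
}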
> > 4. The Image Cache abstraction is too tightly coupled to the core
> > orchestration mechanism. I would recommend implementing it as a
>
> The only place that is coupled to the image cache abstraction is the
> DataMotionStrategy; in particular, AncientDataMotionStrategy->needCacheStorage,
> and AncientDataMotionStrategy will call StorageCacheManager->createCacheObject
> if cache storage is needed.
>
> > DataStore proxy that is applied as necessary. For example, the current
> > Xen hypervisor implementation must interact with a file system. It seems
> > most appropriate that the Xen hypervisor implementation would apply the
> > proxy to its secondary storage DataStore instances. Therefore, the
> > storage engine
>
> The issue that has troubled me for a while is how and where to configure
> cache storage. Take S3 as an example: the implementation of the S3 provider
> should not need to care about cache storage at all; the S3 implementation
> should only care about how to deal with data stored in S3. S3 is a
> general-purpose storage, so the same implementation can be used by different
> hypervisors and in different situations. For example, if both Ceph and S3
> are used by KVM, then definitely no NFS cache is needed, while for other
> cluster-wide primary storages supported by KVM, an NFS cache is needed.
> How and where do we present this kind of configuration, so that both the
> primary storage provider and the object storage provider implementations
> don't need to be aware of it? My current solution is to put the decision in
> the code: the DataMotionStrategy is the place that decides whether to use
> cache storage or not. In the default implementation, all the copy operations
> from S3 to primary storage need cache storage, while if people add a new
> storage and want to remove the cache storage for some reason, they can add a
> new DataMotionStrategy.
>
> I am not sure how the DataStore proxy will solve the issue I mentioned
> above; I would be glad to hear your solution.

This looks like a good opportunity for some collaboration on the design!

> > should only provide the facility, not attempt to determine when it is
> > needed. Additionally, a proxy model would greatly reduce the size and
> > complexity of the various classes (e.g. AncientDataMotionStrategy).
>
> The current AncientDataMotionStrategy is only about 400 LOC now; I removed
> some of the duplicated code. But I still want to know the details of your
> solution.
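As one possible starting point for that design discussion, the sketch below
shows the rough shape a caching DataStore proxy could take: the hypervisor
side that needs a file system wraps its store handle in the proxy, while the
orchestration and the storage providers stay unaware of the cache. The
interfaces are simplified stand-ins, not the actual CloudStack storage API:

// Illustrative sketch of a caching DataStore proxy -- simplified stand-in
// interfaces, not the real storage subsystem API.
import java.io.InputStream;

interface SimpleDataStore {
    InputStream open(String path) throws Exception;          // read an object
    void put(String path, InputStream content) throws Exception;
}

// Wraps a backing store (e.g. an S3-backed image store) and stages reads
// through a cache store. A hypervisor that needs a file system (e.g. Xen)
// would wrap its secondary store in this proxy; others would use the backing
// store directly.
class CachingDataStoreProxy implements SimpleDataStore {
    private final SimpleDataStore backingStore;
    private final SimpleDataStore cacheStore;

    CachingDataStoreProxy(SimpleDataStore backingStore, SimpleDataStore cacheStore) {
        this.backingStore = backingStore;
        this.cacheStore = cacheStore;
    }

    @Override
    public InputStream open(String path) throws Exception {
        try {
            return cacheStore.open(path);                     // cache hit
        } catch (Exception cacheMiss) {
            // Cache miss: stage the object into the cache, then serve it from there.
            cacheStore.put(path, backingStore.open(path));
            return cacheStore.open(path);
        }
    }

    @Override
    public void put(String path, InputStream content) throws Exception {
        backingStore.put(path, content);                      // writes pass straight through
    }
}

Under a model like this, the decision of when a cache is needed would live
with the hypervisor/resource layer that applies the proxy, rather than inside
DataMotionStrategy.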
> > 5. While I have only reviewed roughly 50-60% of the patch thus far, I see
> > little to no additional unit tests for the new code added. Integration
> > tests have
>
> If the code is changing rapidly, then writing/maintaining unit test cases
> would take huge effort; we focused on the integration tests and Marvin
> tests at this stage.

We shouldn't be merging code that is "still changing rapidly". What do you
mean by this comment?

> > been expanded, but many of the tests simply execute the service methods
> > and do not verify the form of the operation results. There are also a
> > tremendous number of ignored exceptions that should likely fail these
> > tests. Therefore, we could have tests passing due to an ignored exception
> > that should be failing.
>
> 1. Thanks for taking a look at the integration tests, which were very
> convenient for us during the implementation. They saved us a lot of time
> during testing.
> 2. Yes, the integration tests need to be improved as you suggested.

Let's do that please!

> > While I recognize the tremendous effort that has been expended to
> > implement this capability and the value of the feature, I see significant
> > design and operational issues that must be addressed before it can be
> > accepted into
>
> I would say the object_store branch is just better than the master branch;
> almost all the issues (related to cache storage) you pointed out already
> exist on the master branch, while fixing these issues on the master branch
> is harder than on the object_store branch. Frankly speaking, I don't want
> to read/understand/guess how it works on the master branch; it's a lot of
> hacks all over the place.
> Of course, these issues are still issues and we need to fix them, but when
> to fix them is arguable. I prefer fixing them after the merge. Here is the
> reason:
> If we agree that the issues you pointed out also exist on master, then do
> we want to fix them in 4.2 or not? If the answer is yes, then who will fix
> them, and how long will it take to fix them on master? If I offer to fix
> these issues after merging object_store into master, will that be
> acceptable? If the answer is no, then there is no technical reason not to
> merge object_store into master, as these issues were not introduced by the
> object_store branch.
>
> > master. In my opinion, the object_store branch represents a good first
> > step, but it needs a few more review/refinement iterations before it will
> > be ready for a master merge.
> >
> > Thanks,
> > -John