On Fri, Nov 9, 2012 at 8:44 PM, Edison Su <[email protected]> wrote:
>
>> -----Original Message-----
>> From: Wido den Hollander [mailto:[email protected]]
>> Sent: Friday, November 09, 2012 6:33 AM
>> To: [email protected]
>> Cc: Edison Su; Marcus Sorensen ([email protected]); 'Umasankar Mukkara'; 'John Burwell'
>> Subject: Re: Update on the storage refactor
>>
>> Hi Edison,
>>
>> Thank you for the update!
>>
>> On 11/08/2012 10:05 PM, Edison Su wrote:
>> > Hi All,
>> > I sent out the storage refactor RFC a few months ago
>> > (http://markmail.org/message/6sdig3hwt5puyvsc), but only started
>> > coding in the last few weeks.
>>
>> I just checked out the Javelin branch. Could you try to make the commits
>> a bit more descriptive, or maybe rebase locally to squash multiple
>> commits into one before pushing? It's kind of hard to follow.
>>
>> Nevertheless, great work!
>>
>> > Here is my latest proposal
>> > (https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+subsystem+2.0),
>> > and the sample code is at
>> > engine/storage/src/org/apache/cloudstack/storage/volume/,
>> > engine/storage/src/org/apache/cloudstack/storage/datastore/ and
>> > engine/storage/src/org/apache/cloudstack/storage/image/ on the
>> > javelin branch.
>> > The idea is very simple: delegate the logic to different components
>> > with different identities. I'll try my best to get rid of the
>> > monolithic managers, which are hard to extend.
>> > Feedback and comments are welcome.
>>
>> Is the API on the wiki definitive? It seems the SnapshotService is
>> missing:
>> - List
>> - Destroy
>
> Thanks, I'll add them to the code.
>
>> Apart from that:
>>
>> "BackupService will manage the snapshot on backup storage (s3/nfs etc).
>>
>> BackupMigrationService will handle how to move a snapshot from primary
>> storage to secondary storage, or vice versa."
>>
>> Does that mean we will have "Backup Storage" AND "Secondary Storage"?
>
> To me, the term "secondary storage" is too generalized. It has too many
> responsibilities, which makes the code more and more complicated.
> In general, there are three main services:
> Image service: where to get and create templates and ISOs
> Volume service: where to get and create volumes
> Backup service: where to get and put backups
> The three "where"s can operate on different physical storages, or they
> can all share the same physical storage. The services themselves should
> not know about each other, even when they are using the same physical
> storage.

This is a really smart way of looking at it. IMO, it actually opens up
more options for advanced storage features. Take backup as an example:
one of the main issues with snapshots that are not app- or guest-OS-aware
is the inability to work with the impacted system to quiesce critical
applications. I'm not suggesting a change in the scope of your work, just
mentioning a potential avenue for the future. App-level consistency is
usually better for end users than OS crash-level consistency.
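To make sure I'm reading the split correctly, here is a minimal sketch of
the three services as fully independent interfaces. Every name and
signature below is a hypothetical illustration, not code from the javelin
branch:

    // Minimal sketch, assuming hypothetical names -- nothing here is the
    // actual javelin-branch API. The point is that each service has its
    // own "where", and none of them reference the others, even when the
    // admin backs them with the same physical storage.

    class TemplateInfo { long id; String installPath; }
    class VolumeInfo   { long id; String path; }
    class BackupInfo   { long id; String location; }

    interface ImageService {             // where to get/create templates and ISOs
        TemplateInfo registerTemplate(String url);
        TemplateInfo getTemplate(long templateId);
    }

    interface VolumeService {            // where to get/create volumes
        VolumeInfo createVolume(long sizeInGb);
        VolumeInfo getVolume(long volumeId);
    }

    interface BackupService {            // where to get/put backups
        BackupInfo backupSnapshot(long snapshotId);
        BackupInfo restoreBackup(long backupId);
    }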
> Take Ceph as an example: I think it can be used as both backup storage
> and volume storage, while swift/s3/nfs can be used as both backup
> storage and image storage.
> The point here is that one storage can have different roles (backup,
> image or volume); how CloudStack uses the storage depends on how the
> admin configures it.
> The model I am trying to build is: datastore provider, datastore, and
> driver.
> The datastore provider is responsible for the life cycle of a particular
> storage type, or a storage vendor.
> A datastore represents a physical storage; it has three subclasses:
> primarydatastore, imagedatastore and backupdatastore. Each subclass has
> different APIs.
> A driver represents the code dealing with the actual storage device; it
> may talk to the storage device directly, or just send a command to the
> resource.
> In a system there are many data store providers. One physical storage
> can be added into the system under one data store provider; the storage
> can be configured with multiple roles, and can be attached to one scope
> (either per zone, per pod, per cluster or per host).
> When a service wants to use a data store, it first asks the data store
> provider to give it a datastore object (either a primarydatastore,
> imagedatastore, or backupdatastore), then calls the object's API to do
> the actual work. The sample code is in the *.datastore* package.
>
> For backward compatibility, "add secondary storage" can be implemented
> as "add image storage" plus "add backup storage", which is the case
> where one storage has two roles.
> By separating the roles from the underlying storage, we can give admins
> more flexibility to build their systems.
> What do you think?
>
>> Imho it should be enough to have just Secondary Storage, but we should
>> support NFS, CIFS and object storage (S3, Swift, Ceph, etc) as
>> secondary storage.
>>
>> Since we are actually only storing some objects with metadata, that
>> shouldn't be a problem.
>>
>> On the KVM platform we are now using Qemu to convert images, and that
>> requires the destination to be a file, but I think we can always work
>> around that.
>>
>> Imho the requirement to have an actual filesystem as Secondary Storage
>> should be gone.
>>
>> NFS has its issues; think about an NFS server dying while a qemu-img
>> process is running. That process will go into status D and will block
>> until the NFS comes back.
>>
>> Wido
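Coming back to the provider/datastore/driver model Edison describes
above: to check my understanding, here is a rough sketch of how the
pieces could fit together. Again, all names and signatures are
hypothetical illustrations, not the actual javelin-branch API:

    // Rough sketch under assumed, hypothetical names -- not the real
    // javelin-branch API.

    enum DataStoreRole { VOLUME, IMAGE, BACKUP }
    enum StoreScope { ZONE, POD, CLUSTER, HOST }

    // Driver: talks to the actual storage device, or sends a command to
    // the resource (agent) on its behalf.
    interface DataStoreDriver {
        void createPhysicalVolume(String path, long sizeInGb);
    }

    // Datastore: one physical storage, seen through one of its roles.
    interface DataStore {
        long getId();
        StoreScope getScope();
        DataStoreDriver getDriver();
    }

    // Role-specific subclasses expose different APIs.
    interface PrimaryDataStore extends DataStore {
        void createVolume(long volumeId, long sizeInGb);
    }
    interface ImageDataStore extends DataStore {
        String getTemplateInstallPath(long templateId);
    }
    interface BackupDataStore extends DataStore {
        void putBackup(long snapshotId);
    }

    // Provider: owns the life cycle of one storage type (or vendor) and
    // hands out role-specific datastore objects to the services.
    interface DataStoreProvider {
        DataStore getDataStore(long storeId, DataStoreRole role);
    }

    // The flow described above: a service asks the provider for a
    // datastore object in the role it needs, then calls that object's API.
    class VolumeServiceImpl {
        private final DataStoreProvider provider;

        VolumeServiceImpl(DataStoreProvider provider) {
            this.provider = provider;
        }

        void createVolume(long storeId, long volumeId, long sizeInGb) {
            PrimaryDataStore store =
                (PrimaryDataStore) provider.getDataStore(storeId, DataStoreRole.VOLUME);
            store.createVolume(volumeId, sizeInGb);  // delegates to the driver internally
        }
    }

If that reading is right, the backward-compatible "add secondary storage"
is then just registering the same physical store under both the IMAGE and
BACKUP roles.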
