On Fri, Nov 9, 2012 at 8:44 PM, Edison Su <[email protected]> wrote:
>
>> -----Original Message-----
>> From: Wido den Hollander [mailto:[email protected]]
>> Sent: Friday, November 09, 2012 6:33 AM
>> To: [email protected]
>> Cc: Edison Su; Marcus Sorensen ([email protected]); 'Umasankar Mukkara'; 'John Burwell'
>> Subject: Re: Update on the storage refactor
>>
>> Hi Edison,
>>
>> Thank you for the update!
>>
>> On 11/08/2012 10:05 PM, Edison Su wrote:
>> > Hi All,
>> > I sent out the storage refactor RFC a few months ago
>> > (http://markmail.org/message/6sdig3hwt5puyvsc), but only started
>> > coding in the last few weeks.
>>
>> I just checked out the Javelin branch. Could you try to make the commits
>> a bit more descriptive, or maybe rebase locally to squash multiple
>> commits into one before pushing? It's kind of hard to follow.
>>
>> Nevertheless, great work!
>>
>> > Here is my latest proposal
>> > (https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+subsystem+2.0),
>> > and the sample code is at
>> > engine/storage/src/org/apache/cloudstack/storage/volume/,
>> > engine/storage/src/org/apache/cloudstack/storage/datastore/ and
>> > engine/storage/src/org/apache/cloudstack/storage/image/ on the
>> > javelin branch.
>> > The idea is very simple: delegate the logic to different components
>> > with different identities. I'll try my best to get rid of the
>> > monolithic managers, which are hard to extend.
>> > Feedback and comments are welcome.
>>
>> Is the API on the wiki definitive? It seems the SnapshotService is
>> missing:
>> - List
>> - Destroy
>
> Thanks, I'll add them to the code.
>
>> Apart from that:
>>
>> "BackupService will manage the snapshot on backup storage (s3/nfs etc).
>>
>> BackupMigrationService will handle how to move a snapshot from primary
>> storage to secondary storage, or vice versa."
>>
>> Does that mean we will have "Backup Storage" AND "Secondary Storage"?
>
> To me, the term "secondary storage" is too generalized. It has too many
> responsibilities, which makes the code more and more complicated.
> In general, there are three main services:
> Image service: where to get and create templates and ISOs
> Volume service: where to get and create volumes
> Backup service: where to get and put backups
> The three "where"s can operate on different physical storages, or they
> can all share the same physical storage. The services themselves should
> not know about each other, even when they are using the same physical
> storage.

This is a really smart way of looking at it. IMO, it actually opens up
more options for advanced storage features. Take backup as an example:
one of the main issues with snapshots that are not app- or guest-OS-aware
is the inability to work with the impacted system to quiesce critical
applications. I'm not suggesting a change in the scope of your work, just
mentioning a potential avenue for the future. App-level consistency is
usually better for end users than OS crash-level consistency.
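To make sure I'm reading the split correctly, here is a minimal sketch of
the three services as fully independent interfaces. Every name and
signature below is a hypothetical illustration, not code from the javelin
branch:

    // Minimal sketch, assuming hypothetical names -- nothing here is the
    // actual javelin-branch API. The point is that each service has its
    // own "where", and none of them reference the others, even when the
    // admin backs them with the same physical storage.

    class TemplateInfo { long id; String installPath; }
    class VolumeInfo   { long id; String path; }
    class BackupInfo   { long id; String location; }

    interface ImageService {             // where to get/create templates and ISOs
        TemplateInfo registerTemplate(String url);
        TemplateInfo getTemplate(long templateId);
    }

    interface VolumeService {            // where to get/create volumes
        VolumeInfo createVolume(long sizeInGb);
        VolumeInfo getVolume(long volumeId);
    }

    interface BackupService {            // where to get/put backups
        BackupInfo backupSnapshot(long snapshotId);
        BackupInfo restoreBackup(long backupId);
    }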
> Take Ceph as an example: I think it can be used as both backup storage
> and volume storage, while swift/s3/nfs can be used as both backup
> storage and image storage.
> The point here is that one storage can have different roles (backup,
> image or volume); how CloudStack uses the storage depends on how the
> admin configures it.
> The model I am trying to build is: datastore provider, datastore, and
> driver.
> The datastore provider is responsible for the life cycle of a particular
> storage type, or a storage vendor.
> A datastore represents a physical storage; it has three subclasses:
> primarydatastore, imagedatastore and backupdatastore. Each subclass has
> different APIs.
> A driver represents the code dealing with the actual storage device; it
> may talk to the storage device directly, or just send a command to the
> resource.
> In a system there are many data store providers. One physical storage
> can be added into the system under one data store provider; the storage
> can be configured with multiple roles, and can be attached to one scope
> (either per zone, per pod, per cluster or per host).
> When a service wants to use a data store, it first asks the data store
> provider to give it a datastore object (either a primarydatastore,
> imagedatastore, or backupdatastore), then calls the object's API to do
> the actual work. The sample code is in the *.datastore* package.
>
> For backward compatibility, "add secondary storage" can be implemented
> as "add image storage" plus "add backup storage", which is the case
> where one storage has two roles.
> By separating the roles from the underlying storage, we can give admins
> more flexibility to build their systems.
> What do you think?
>
>> Imho it should be enough to have just Secondary Storage, but we should
>> support NFS, CIFS and object storage (S3, Swift, Ceph, etc) as
>> secondary storage.
>>
>> Since we are actually only storing some objects with metadata, that
>> shouldn't be a problem.
>>
>> On the KVM platform we are now using Qemu to convert images, and that
>> requires the destination to be a file, but I think we can always work
>> around that.
>>
>> Imho the requirement to have an actual filesystem as Secondary Storage
>> should be gone.
>>
>> NFS has its issues; think about an NFS server dying while a qemu-img
>> process is running. That process will go into status D and will block
>> until the NFS comes back.
>>
>> Wido
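Coming back to the provider/datastore/driver model Edison describes
above: to check my understanding, here is a rough sketch of how the
pieces could fit together. Again, all names and signatures are
hypothetical illustrations, not the actual javelin-branch API:

    // Rough sketch under assumed, hypothetical names -- not the real
    // javelin-branch API.

    enum DataStoreRole { VOLUME, IMAGE, BACKUP }
    enum StoreScope { ZONE, POD, CLUSTER, HOST }

    // Driver: talks to the actual storage device, or sends a command to
    // the resource (agent) on its behalf.
    interface DataStoreDriver {
        void createPhysicalVolume(String path, long sizeInGb);
    }

    // Datastore: one physical storage, seen through one of its roles.
    interface DataStore {
        long getId();
        StoreScope getScope();
        DataStoreDriver getDriver();
    }

    // Role-specific subclasses expose different APIs.
    interface PrimaryDataStore extends DataStore {
        void createVolume(long volumeId, long sizeInGb);
    }
    interface ImageDataStore extends DataStore {
        String getTemplateInstallPath(long templateId);
    }
    interface BackupDataStore extends DataStore {
        void putBackup(long snapshotId);
    }

    // Provider: owns the life cycle of one storage type (or vendor) and
    // hands out role-specific datastore objects to the services.
    interface DataStoreProvider {
        DataStore getDataStore(long storeId, DataStoreRole role);
    }

    // The flow described above: a service asks the provider for a
    // datastore object in the role it needs, then calls that object's API.
    class VolumeServiceImpl {
        private final DataStoreProvider provider;

        VolumeServiceImpl(DataStoreProvider provider) {
            this.provider = provider;
        }

        void createVolume(long storeId, long volumeId, long sizeInGb) {
            PrimaryDataStore store =
                (PrimaryDataStore) provider.getDataStore(storeId, DataStoreRole.VOLUME);
            store.createVolume(volumeId, sizeInGb);  // delegates to the driver internally
        }
    }

If that reading is right, the backward-compatible "add secondary storage"
is then just registering the same physical store under both the IMAGE and
BACKUP roles.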
