Comments inline.

On Tue, Oct 7, 2014 at 5:51 PM, Marcus White <[email protected]>
wrote:

> Hello,
> Some basic Ceph questions, would appreciate your help:) Sorry about
> the number and detail in advance!
>
> a. Ceph RADOS is strongly consistent and different from usual object,
> does that mean all metadata also, container and account etc is all
> consistent and everything is updated in the path of the client
> operation itself, for a single site?
>

Yes.  In a single site, it's CP out of CAP.


> b. If it is strongly consistent, is that the case across sites also?
> How can it be performant across geo sites if that is the case? If it's
> choosing consistency over partitioning and availability... For object,
> I read somewhere that it is now eventually consistent (local CP,
> remotely AP) via DR. Gets a bit confusing with all the literature out
> there. If it is DR, isn't that slightly different from the Swift case?
>

If you're referring to RadosGW Federation, no.  That replication is async.
The replication has several delays built in, so the fastest you could see
your data show up in the secondary is about a minute.  Longer if the file
takes a while to transfer, or you have a lot of activity to replicate.

Each site is still CP.  There is just a delay getting data from the primary
to the secondary.


If you want CP in multiple locations, that's doable by creating one cluster
that spans both locations, and tuning the CRUSH rules to make sure each
object is written to both.  You really want a low-latency connection
between the two sites.
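
As a rough sketch, a decompiled CRUSH map rule for that could look like the
following.  This assumes your CRUSH hierarchy defines a "datacenter" bucket
type with both sites under the default root; the rule name and ruleset
number are made up:

rule replicated_twosite {
        ruleset 1
        type replicated
        min_size 2
        max_size 4
        step take default
        # pick one subtree per datacenter, then a host inside each,
        # so every object gets a copy in both sites
        step choose firstn 2 type datacenter
        step chooseleaf firstn 2 type host
        step emit
}

You'd compile that back in with crushtool and point the pool at ruleset 1.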

I tested one cluster in two colos with 20ms of latency between them.  It
worked, but it was noticeably slow.  I went with two clusters and async
replication.



>
> c. For block, is it CP on a single site and then usual DR to another
> site using snapshotting?
>

Yes.
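
For example, the usual approach is to ship incremental RBD snapshots to the
DR site with export-diff/import-diff.  The pool, image, snapshot, and host
names below are made up:

```shell
# Take a new snapshot of the image on the primary cluster
rbd snap create rbd/vm-disk@backup1

# Ship only the changes since the previous snapshot to the DR cluster;
# export-diff writes to stdout, import-diff reads from stdin
rbd export-diff --from-snap backup0 rbd/vm-disk@backup1 - | \
    ssh dr-host rbd import-diff - rbd/vm-disk
```

Run that from cron and the secondary stays a snapshot behind the primary.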



>
> d. For block, is it just a linux block device or is it SCSI? Is it a
> custom device driver running within Linux which hooks into the block
> layer? Trying to understand the layering diagram.
>

I'm a bit out of my element here, but there is a kernel module and a FUSE
module.  The kernel module connects RBD images to a /dev/rbd/... block
device.  It can then be used however you would use a block device.  Most
people put a filesystem on it, but it's not required.  I'm really
unfamiliar with the FUSE module.

Several people are exporting RBD images via iSCSI and Fibre Channel.
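
The kernel path is just a couple of commands.  A sketch, assuming the
default "rbd" pool and an image name I've made up:

```shell
# Create a 10 GiB image in the default 'rbd' pool
rbd create test --size 10240

# Map it through the kernel module; a device shows up as /dev/rbd0
# (and /dev/rbd/rbd/test via udev)
sudo rbd map test

# From here it's an ordinary block device; a filesystem is optional
sudo mkfs.ext4 /dev/rbd/rbd/test
sudo mount /dev/rbd/rbd/test /mnt
```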


> e. Do the snapshot, compression features come from the underlying file
> system?
>

It depends on the filesystem.  Ceph will emulate any required features that
the FS doesn't support.  For example, ext4 and XFS have no snapshots, so
Ceph has to track them itself.  On BtrFS, Ceph uses the native snapshots,
and it's much quicker because of it.


>
> f. What is the plan for deduplication? If that comes from the local
> file system, how would it deduplicate across nodes to achieve the best
> dedup ratio?
>
>
I don't believe Ceph does anything with de-dup.  If the FS underneath has
it turned on, it can de-dup the stuff it sees, but there's no cluster-wide
de-dup.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com