Hi,

One way you can see exactly what is happening when you write an object
is with --debug_ms=1.

For example, I write a 100MB object to a test pool:

# rados --debug_ms=1 -p test put 100M.dat 100M.dat
I pasted the output of this here: https://pastebin.com/Zg8rjaTV
In this case, it first gets the cluster maps from a mon, then writes
the object to osd.58, which is the primary osd for PG 119.77:

# ceph pg 119.77 query | jq .up
[
  58,
  49,
  31
]
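
If you are driving the IO from librados rather than the CLI, you can get
the same messenger trace by setting debug_ms on the client before
connecting. A minimal sketch with the python-rados bindings (the conf
path, pool and file names are just placeholders):

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
# equivalent of --debug_ms=1 on the CLI; log to stderr so it's easy to see
cluster.conf_set('debug_ms', '1')
cluster.conf_set('log_to_stderr', 'true')
cluster.connect()

ioctx = cluster.open_ioctx('test')
with open('100M.dat', 'rb') as f:
    # traffic to the mons and OSDs shows up in the messenger log
    ioctx.write_full('100M.dat', f.read())
ioctx.close()
cluster.shutdown()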

I've answered your other questions inline below.

On Sun, Jun 17, 2018 at 8:29 PM Jialin Liu <jaln...@lbl.gov> wrote:
>
> Hello,
>
> I have a couple questions regarding the IO on OSD via librados.
>
>
> 1. How to check which osd is receiving data?
>

See `ceph osd map`.
For my example above:

# ceph osd map test 100M.dat
osdmap e236396 pool 'test' (119) object '100M.dat' -> pg 119.864b0b77
(119.77) -> up ([58,49,31], p58) acting ([58,49,31], p58)
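
The object name -> PG step above is deterministic: the name is hashed to
the raw pg id (119.864b0b77), then folded onto the pool's pg_num, and
CRUSH maps the resulting pg 119.77 to [58,49,31] with osd.58 as primary.
Assuming the pool has a power-of-two pg_num (256 would give exactly the
mask seen here), the fold is just a bitmask:

# hypothetical pg_num for pool 'test'; any power of two folds the same way
pg_num = 256
raw_hash = 0x864b0b77                        # from 'ceph osd map' above
print(format(raw_hash & (pg_num - 1), 'x'))  # -> 77, i.e. pg 119.77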

> 2. Can the write operation return immediately to the application once the 
> write to the primary OSD is done? or does it return only when the data is 
> replicated twice? (size=3)

The write returns only once the data is safe on *all* replicas (or EC
chunks) -- not after just the primary.
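
In librados terms a synchronous write simply blocks until that point,
and an async write's completion fires then. A rough sketch with the
python-rados bindings (pool and object names are placeholders):

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('test')

# blocking: returns only once the op is applied on every replica / EC shard
ioctx.write_full('obj-sync', b'hello')

# async: the completion fires at the same point, so you can overlap work
comp = ioctx.aio_write('obj-async', b'hello', offset=0)
# ... do other work ...
comp.wait_for_complete()   # blocks until the OSDs have acked the write
# (older releases also exposed wait_for_safe() for the on-disk ack; on
#  recent releases the two acks are the same thing)

ioctx.close()
cluster.shutdown()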

> 3. What is the I/O size in the lower level in librados, e.g., if I send a 
> 100MB request with 1 thread, does librados send the data by a fixed 
> transaction size?

This depends on the client. The `rados` CLI example I showed you broke
the 100MB object into 4MB parts.
Most use cases keep objects around 4MB or 8MB.
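
To illustrate, something like what the CLI does can be sketched with
python-rados: write one large object in fixed-size ops at increasing
offsets. The 4MB op size here mirrors the 4MB parts in the log above,
but it is purely a client-side choice:

import rados

OP_SIZE = 4 * 1024 * 1024   # 4MB per write op

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('test')

offset = 0
with open('100M.dat', 'rb') as f:
    while True:
        chunk = f.read(OP_SIZE)
        if not chunk:
            break
        ioctx.write('100M.dat', chunk, offset)   # one op per 4MB chunk
        offset += len(chunk)

ioctx.close()
cluster.shutdown()

Higher-level clients (RBD, CephFS, RGW) instead stripe data across many
objects of roughly that size -- RBD's default object size is 4MB, for
example -- which is why most objects end up in that 4MB-8MB range.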

> 4. I have 4 OSS, 48 OSDs, will the 4 OSS become the bottleneck? from the ceph 
> documentation, once the cluster map is received by the client, the client can 
> talk to OSD directly, so the assumption is the max parallelism depends on the 
> number of OSDs, is this correct?
>

That's more or less correct -- the IOPS and bandwidth capacity of the
cluster generally scales linearly with the number of OSDs.
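
To actually exploit that parallelism from librados, keep many async
writes in flight across many object names -- each name hashes to a
different PG and hence, in aggregate, to different primary OSDs. A
rough sketch (python-rados again, names are placeholders):

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('test')

data = b'x' * (4 * 1024 * 1024)
completions = []
for i in range(64):
    # distinct object names -> distinct PGs -> spread across the 48 OSDs
    completions.append(ioctx.aio_write_full('obj-%04d' % i, data))

for comp in completions:
    comp.wait_for_complete()

ioctx.close()
cluster.shutdown()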

Cheers,
Dan
CERN
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
