Hi,

One way you can see exactly what is happening when you write an object is
with --debug_ms=1.
For example, I write a 100MB object to a test pool:

  rados --debug_ms=1 -p test put 100M.dat 100M.dat

I pasted the output of this here: https://pastebin.com/Zg8rjaTV

In this case, it first gets the cluster maps from a mon, then writes the
object to osd.58, which is the primary OSD for PG 119.77:

  # ceph pg 119.77 query | jq .up
  [
    58,
    49,
    31
  ]

Otherwise I answered your questions below...

On Sun, Jun 17, 2018 at 8:29 PM Jialin Liu <jaln...@lbl.gov> wrote:
>
> Hello,
>
> I have a couple questions regarding the IO on OSD via librados.
>
> 1. How to check which osd is receiving data?

See `ceph osd map`. For my example above:

  # ceph osd map test 100M.dat
  osdmap e236396 pool 'test' (119) object '100M.dat' -> pg 119.864b0b77 (119.77) -> up ([58,49,31], p58) acting ([58,49,31], p58)

> 2. Can the write operation return immediately to the application once the
> write to the primary OSD is done? or does it return only when the data is
> replicated twice? (size=3)

The write returns only once the data is safe on *all* replicas (or all EC
chunks), not just on the primary.

> 3. What is the I/O size in the lower level in librados, e.g., if I send a
> 100MB request with 1 thread, does librados send the data by a fixed
> transaction size?

This depends on the client. The `rados` CLI example I showed you broke the
100MB object into 4MB parts. Most use-cases keep objects around 4MB or 8MB.

> 4. I have 4 OSS, 48 OSDs, will the 4 OSS become the bottleneck? from the
> ceph documentation, once the cluster map is received by the client, the
> client can talk to OSD directly, so the assumption is the max parallelism
> depends on the number of OSDs, is this correct?

That's more or less correct -- the IOPS and BW capacity of the cluster
generally scales linearly with the number of OSDs.

Cheers, Dan
CERN

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
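
For reference, here is a minimal sketch of doing the same write directly
through librados, using the Python rados bindings that ship with Ceph. The
conffile path and the 4MB chunk size are illustrative assumptions that mirror
the CLI example above, not values taken from the thread; the point is that the
client, not librados, chooses how large each write op is.

  import rados

  CHUNK = 4 * 1024 * 1024  # 4MB, the op size the rados CLI used above (assumed here)

  cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # assumed conf path
  cluster.connect()                    # fetches the cluster maps from a mon
  ioctx = cluster.open_ioctx('test')   # the 'test' pool from the example above
  try:
      offset = 0
      with open('100M.dat', 'rb') as f:
          while True:
              buf = f.read(CHUNK)
              if not buf:
                  break
              # Each write() completes only once that piece is safe on all
              # replicas of the PG that '100M.dat' maps to.
              ioctx.write('100M.dat', buf, offset)
              offset += len(buf)
  finally:
      ioctx.close()
      cluster.shutdown()

A single ioctx.write_full('100M.dat', data) call would also work; the chunked
loop above just mirrors how the rados CLI issued the 100MB put as a series of
4MB writes.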