I probably should have condensed my findings over the course of the day into
one post, but I guess that's just not how I'm built...
Another data point. I ran `ceph daemon mds.cephmds02 perf dump` in a while
loop with a 1-second sleep, grepping out the stats John mentioned, and at
times (roughly every 10-15 seconds) I see some large objecter.op_active
values. After the high values hit, there are 5-10 seconds of zero values.
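
The loop was roughly the following (the grep just matches the two counter
names in the JSON that perf dump prints):

  while true; do
    ceph daemon mds.cephmds02 perf dump | grep -E '"handle_client_request"|"op_active"'
    sleep 1
  done

A slice of the output it produces: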
"handle_client_request": 5785438,
"op_active": 2375,
"handle_client_request": 5785438,
"op_active": 2444,
"handle_client_request": 5785438,
"op_active": 2239,
"handle_client_request": 5785438,
"op_active": 1648,
"handle_client_request": 5785438,
"op_active": 1121,
"handle_client_request": 5785438,
"op_active": 709,
"handle_client_request": 5785438,
"op_active": 235,
"handle_client_request": 5785572,
"op_active": 0,
...............
Should I be concerned about these "op_active" values? I see that in my
narrow slice of output, "handle_client_request" does not increment. What
is happening there?
thanks,
Bob
On Wed, Aug 5, 2015 at 11:43 PM, Bob Ababurko <[email protected]> wrote:
> I found a way to get the stats you mentioned: mds_server.handle_client_request
> & objecter.op_active. I can see these values when I run:
>
> ceph daemon mds.<id> perf dump
>
> I recently restarted the mds server so my stats reset but I still have
> something to share:
>
> "mds_server.handle_client_request": 4406055
> "objecter.op_active": 0
>
> Should I assume that op_active might be read or write operations that are
> queued? I haven't been able to find anything describing what these stats
> actually mean, so if anyone knows where they are documented, please advise.
>
> On Wed, Aug 5, 2015 at 4:59 PM, Bob Ababurko <[email protected]> wrote:
>
>> I have installed diamond (ksingh's build, found at
>> https://github.com/ksingh7/ceph-calamari-packages) on the MDS node, and I
>> am not seeing the mds_server.handle_client_request or objecter.op_active
>> metrics being sent to graphite. Mind you, this is not the graphite that is
>> part of the calamari install but our own internal graphite cluster.
>> Perhaps that is the reason? I could not get calamari working correctly on
>> Hammer/CentOS 7.1, so I have put it on pause for now to concentrate on the
>> cluster itself.
>>
>> Ultimately, I need to find a way to get hold of these metrics to determine
>> the health of my MDS, so I can justify moving forward on an SSD-based
>> cephfs metadata pool.
>>
>> On Wed, Aug 5, 2015 at 4:05 PM, Bob Ababurko <[email protected]> wrote:
>>
>>> Hi John,
>>>
>>> You are correct in that my expectations may be incongruent with what is
>>> possible with ceph(fs). I'm currently copying many small files (images)
>>> from a NetApp to the cluster, roughly 35 KB per file, and the number of
>>> objects/files copied thus far is fairly significant (see the cephfs_data
>>> objects count below):
>>>
>>> [bababurko@cephmon01 ceph]$ sudo rados df
>>> pool name          KB           objects     clones  degraded  unfound  rd        rd KB       wr         wr KB
>>> cephfs_data        3289284749   163993660   0       0         0        0         0           328097038  3369847354
>>> cephfs_metadata    133364       524363      0       0         0        3600023   5264453980  95600004   1361554516
>>> rbd                0            0           0       0         0        0         0           0          0
>>> total used         9297615196   164518023
>>> total avail        19990923044
>>> total space        29288538240
>>>
>>> Yes, that looks like ~164 million objects copied to the cluster. I would
>>> assume this will potentially be a burden to the MDS, but I have yet to
>>> confirm with 'ceph daemonperf mds.<id>'. I cannot seem to run it on the
>>> MDS host, as it doesn't seem to know about that command:
>>>
>>> [bababurko@cephmds01]$ sudo ceph daemonperf mds.cephmds01
>>> no valid command found; 10 closest matches:
>>> osd lost <int[0-]> {--yes-i-really-mean-it}
>>> osd create {<uuid>}
>>> osd primary-temp <pgid> <id>
>>> osd primary-affinity <osdname (id|osd.id)> <float[0.0-1.0]>
>>> osd reweight <int[0-]> <float[0.0-1.0]>
>>> osd pg-temp <pgid> {<id> [<id>...]}
>>> osd in <ids> [<ids>...]
>>> osd rm <ids> [<ids>...]
>>> osd down <ids> [<ids>...]
>>> osd out <ids> [<ids>...]
>>> Error EINVAL: invalid command
>>>
>>> This fails in a similar manner on all the hosts in the cluster. I'm very
>>> green with ceph and I'm probably missing something obvious. Is there
>>> something I need to install to get access to the 'ceph daemonperf' command
>>> in Hammer?
>>>
>>> thanks,
>>> Bob
>>>
>>> On Wed, Aug 5, 2015 at 2:43 AM, John Spray <[email protected]> wrote:
>>>
>>>> On Tue, Aug 4, 2015 at 10:36 PM, Bob Ababurko <[email protected]> wrote:
>>>> > My writes are not going as I would expect wrt IOPS (50-1000 IOPS) and
>>>> > write throughput (~25 MB/s max). I'm interested in understanding what it
>>>> > takes to create an SSD pool that I can then migrate the current
>>>> > cephfs_metadata pool to. I suspect that the spinning-disk metadata pool
>>>> > is a bottleneck, and I want to try to get the max performance out of this
>>>> > cluster to prove that we would build out a larger version. One caveat is
>>>> > that I have copied about 4 TB of data to the cluster via cephfs and don't
>>>> > want to lose the data, so I obviously need to keep the metadata intact.
>>>>
>>>> I'm a bit suspicious of this: your IOPS expectations sort of imply
>>>> doing big files, but you're then suggesting that metadata is the
>>>> bottleneck (i.e. small file workload).
>>>>
>>>> There are lots of statistics that come out of the MDS; you may be
>>>> particularly interested in mds_server.handle_client_request and
>>>> objecter.op_active, to work out whether there really are lots of RADOS
>>>> operations getting backed up on the MDS (which would be the symptom of a
>>>> too-slow metadata pool). "ceph daemonperf mds.<id>" may be of some help
>>>> if you don't already have graphite or similar set up.
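>>>>
>>>> If daemonperf isn't available in your ceph CLI, a rough hand-rolled
>>>> equivalent (untested sketch, adjust the daemon name) is to sample the
>>>> dump every second and diff the counter:
>>>>
>>>>   prev=0
>>>>   while sleep 1; do
>>>>     dump=$(ceph daemon mds.<id> perf dump)
>>>>     cur=$(echo "$dump" | grep '"handle_client_request"' | grep -oE '[0-9]+')
>>>>     act=$(echo "$dump" | grep '"op_active"' | grep -oE '[0-9]+')
>>>>     echo "handle_client_request/s: $((cur - prev))  op_active: $act"
>>>>     prev=$cur
>>>>   done
>>>>
>>>> (handle_client_request is a cumulative counter, so it's the per-second
>>>> delta that's interesting; op_active you can read directly. The first
>>>> line printed will just show the absolute counter value.)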
>>>>
>>>> > If anyone has done this OR understands how this can be done, I would
>>>> > appreciate the advice.
>>>>
>>>> You could potentially do this in a two-phase process where you
>>>> initially set a crush rule that includes both SSDs and spinners, and
>>>> then finally set a crush rule that just points to SSDs. Obviously
>>>> that'll do lots of data movement, but your metadata is probably a fair
>>>> bit smaller than your data so that might be acceptable.
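>>>>
>>>> The mechanics of repointing the pool are just something like the
>>>> following (example rule name; this assumes the SSD OSDs already sit
>>>> under their own crush root, e.g. one called "ssd"):
>>>>
>>>>   # rule that only draws from the ssd root, one replica per host
>>>>   ceph osd crush rule create-simple ssd-only ssd host
>>>>   # switch the metadata pool over (the argument is the ruleset number)
>>>>   ceph osd pool set cephfs_metadata crush_ruleset <ruleset-number>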
>>>>
>>>> John
>>>>
>>>
>>>
>>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com