Thanks to this, I'm adding regular bandwidth tests. Is there, or should there be, a best-practices doc on ceph.com?
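For anyone wanting to script such a recurring bandwidth test without standing up iperf everywhere, here is a rough sketch that times a raw TCP transfer between two hosts. The 1 MiB probe size, port handling, and function names are illustrative placeholders, not anything Ceph-specific:

```python
import socket
import threading
import time

PAYLOAD = b"x" * (1 << 20)  # 1 MiB probe payload

def serve_once(sock):
    """Accept one connection and drain everything the peer sends."""
    conn, _ = sock.accept()
    with conn:
        while conn.recv(65536):
            pass

def measure_throughput(host, port):
    """Time one probe transfer and return the rate in MB/s."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=10) as s:
        s.sendall(PAYLOAD)
        s.shutdown(socket.SHUT_WR)   # signal end of probe
        s.recv(1)                    # wait for the receiver to finish draining
    return len(PAYLOAD) / (time.perf_counter() - start) / 1e6

if __name__ == "__main__":
    # Loopback demo; in practice serve_once would run on the remote host.
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    threading.Thread(target=serve_once, args=(srv,), daemon=True).start()
    print(f"{measure_throughput('127.0.0.1', srv.getsockname()[1]):.1f} MB/s")
```

Run the receiving half on each node from cron, graph the numbers, and anything persistently far below line rate is worth an iperf follow-up.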
On Sat, Aug 1, 2015 at 2:16 PM Josef Johansson <[email protected]> wrote:
> Hi,
>
> I did a "big-ping" test to verify the network after the last major network
> problem. If anyone wants to take a peek I could share.
>
> Cheers
>
> Josef
>
> On Sat, Aug 1, 2015 at 02:19, Ben Hines <[email protected]> wrote:
>
>> I encountered a similar problem. Incoming firewall ports were blocked
>> on one host, so the other OSDs kept marking that OSD as down. But it
>> could still talk out, so it kept saying "hey, I'm up, mark me up", and
>> then the other OSDs started trying to send it data again, causing
>> backed-up requests, ad infinitum. I had to figure out the connectivity
>> problem myself by looking in the OSD logs.
>>
>> After a while, the cluster should just say "no, you're not reachable,
>> stop putting yourself back into the cluster".
>>
>> -Ben
>>
>> On Fri, Jul 31, 2015 at 11:21 AM, Jan Schermer <[email protected]> wrote:
>> > I remember reading that ScaleIO (I think?) does something like this by
>> > regularly sending reports to a multicast group, so any node with issues
>> > (or just overload) is reweighted or avoided automatically on the
>> > client. The OSD map is the Ceph equivalent, I guess. It makes sense to
>> > gather metrics and prioritize better-performing OSDs over those with
>> > e.g. worse latencies, but it needs to update fast. But I believe that
>> > _network_ monitoring itself ought to be part of... a network monitoring
>> > system you should already have :-) and not of a storage system that
>> > just happens to use the network. I don't remember seeing anything but a
>> > simple ping/traceroute/DNS test in any SAN interface. If an OSD has
>> > issues it might be anything from a failing drive to a swapping OS, and
>> > a number like "commit latency" (= average response time from the
>> > clients' perspective) is maybe the ultimate metric of all for this
>> > purpose, irrespective of the root cause.
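The blocked-incoming-ports failure mode Ben describes is easy to probe for from a peer with plain TCP connects. A minimal sketch; 6800-7300 is the default `ms bind port min`/`ms bind port max` range OSDs pick from, while the narrowed default below and the host argument are placeholders for illustration:

```python
import socket

def port_open(host, port, timeout=1.0):
    """True if a TCP connect to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_osd_ports(host, ports=range(6800, 6804)):
    """Map each candidate OSD port on a peer to its reachability.

    Kept to a few ports here so a fully-filtered host doesn't make the
    scan crawl; widen towards 7300 as needed."""
    return {port: port_open(host, port) for port in ports}
```

Run from each OSD host against every other host; a node that can reach its peers but shows closed ports in the reverse direction is exactly the asymmetric firewall case described above.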
>> > A nice option would be to read data from all replicas at once - this
>> > would of course increase load and cause all sorts of issues if abused,
>> > but if you have an app that absolutely-always-without-fail-must-get-data-ASAP
>> > then you could enable this in the client (and I think that would be an
>> > easy option to add). This is actually used in some systems. The harder
>> > part is failing nicely when writing (like waiting only for the remote
>> > network buffers on 2 nodes to get the data, instead of waiting for a
>> > commit on all 3 replicas...)
>> >
>> > Jan
>> >
>> >> On 31 Jul 2015, at 19:45, Robert LeBlanc <[email protected]> wrote:
>> >>
>> >> Even just a ping at max MTU with the don't-fragment bit set could tell
>> >> a lot about connectivity issues and latency without a lot of traffic.
>> >> Using the Ceph messenger would be even better, to check firewall ports
>> >> too. I like the idea of incorporating simple network checks into Ceph.
>> >> The monitor can correlate failures and help determine from the CRUSH
>> >> map whether the problem is related to one host.
>> >> ----------------
>> >> Robert LeBlanc
>> >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>> >>
>> >> On Thu, Jul 30, 2015 at 11:27 PM, Stijn De Weirdt wrote:
>> >>> Wouldn't it be nice if Ceph did something like this in the background
>> >>> (some sort of network scrub)? Debugging the network like this is not
>> >>> that easy (you can't expect admins to install e.g. perfSONAR on all
>> >>> nodes and/or clients).
>> >>>
>> >>> Something like: every X minutes, each service X picks a service Y on
>> >>> another host (assuming X and Y will exchange some communication at
>> >>> some point, like an OSD with another OSD), sends 1MB of data, and
>> >>> makes the timing data available so we can monitor it and detect
>> >>> underperforming links over time.
>> >>>
>> >>> Ideally clients would also do this, but I'm not sure where they
>> >>> should report/store the data.
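The max-MTU ping is worth spelling out, because ping's `-s` argument is the ICMP payload size, not the frame size: with a 20-byte IPv4 header and an 8-byte ICMP header, an MTU of 9000 means a payload of 8972. A small helper, assuming Linux iputils ping (where `-M do` forbids fragmentation):

```python
def icmp_payload_for_mtu(mtu, ip_header=20, icmp_header=8):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - ip_header - icmp_header

def mtu_ping_command(host, mtu=9000):
    """Build an iputils ping that fails if the path can't carry this MTU."""
    return f"ping -c 3 -M do -s {icmp_payload_for_mtu(mtu)} {host}"
```

For example, `mtu_ping_command("10.0.0.2")` yields `ping -c 3 -M do -s 8972 10.0.0.2`; if that fails while a small ping succeeds, some hop in between has a smaller MTU than the endpoints believe.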
>> >>> Interpreting the data can be a bit tricky, but extreme outliers will
>> >>> be spotted easily, and the main issue with this sort of debugging is
>> >>> collecting the data.
>> >>>
>> >>> Simply reporting / keeping track of ongoing communications is already
>> >>> a big step forward, but then we need to know the size of the
>> >>> exchanged data to allow interpretation (and the timing should cover
>> >>> only the network part, not e.g. flushing data to disk in the case of
>> >>> an OSD). (And obviously sampling is enough; no need to record every
>> >>> bit sent.)
>> >>>
>> >>> stijn
>> >>>
>> >>> On 07/30/2015 08:04 PM, Mark Nelson wrote:
>> >>>>
>> >>>> Thanks for posting this! We see issues like this more often than
>> >>>> you'd think. It's really important too, because if you don't figure
>> >>>> it out, the natural inclination is to blame Ceph! :)
>> >>>>
>> >>>> Mark
>> >>>>
>> >>>> On 07/30/2015 12:50 PM, Quentin Hartman wrote:
>> >>>>>
>> >>>>> Just wanted to drop a note to the group that I had my cluster go
>> >>>>> sideways yesterday, and the root of the problem was networking
>> >>>>> again. Using iperf I discovered that one of my nodes was only
>> >>>>> moving data at 1.7 Mb/s. Moving that node to a different switch
>> >>>>> port with a different cable resolved the problem. It took a while
>> >>>>> to track down because none of the server-side error metrics for
>> >>>>> disk or network showed anything amiss, and I didn't think to test
>> >>>>> network performance (as suggested in another thread) until well
>> >>>>> into the process.
>> >>>>>
>> >>>>> Check networking first!
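The network-scrub proposal above (pick a peer, push a fixed-size blob, keep the timings) mostly needs a place to put the numbers and a rule for "extreme outlier". A sketch of that bookkeeping side; the class name, the peer labels, and the 5x-below-the-cluster-median threshold are all made up for illustration, not anything from Ceph:

```python
import random
import statistics
from collections import defaultdict

class NetworkScrub:
    """Record per-peer transfer rates and flag links that are extreme
    outliers versus the rest of the cluster."""

    def __init__(self, peers, outlier_factor=5.0):
        self.peers = list(peers)
        self.outlier_factor = outlier_factor
        self.samples = defaultdict(list)   # peer -> list of MB/s samples

    def pick_peer(self):
        """Each scrub round probes one randomly chosen peer."""
        return random.choice(self.peers)

    def record(self, peer, mb_per_s):
        self.samples[peer].append(mb_per_s)

    def underperforming(self):
        """Peers whose median rate is far below the cluster-wide median."""
        medians = {p: statistics.median(v)
                   for p, v in self.samples.items() if v}
        if len(medians) < 2:
            return []   # nothing to compare against yet
        overall = statistics.median(medians.values())
        return [p for p, m in medians.items()
                if m * self.outlier_factor < overall]
```

Feeding `record()` from timed transfers like the ones discussed above, a node moving 1.7 Mb/s while its peers run at line rate would surface immediately, without anyone having to think of running iperf first.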
>> >>>>>
>> >>>>> QH
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
