I know writing to min_size replicas as sync and the remaining (size - min_size) as async has been discussed before and would help here. From what I understand, it would require a lot of code changes and goes against Ceph's strong consistency model. I'm not sure it will ever be implemented, although I do love this idea as a way to fight tail latency.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
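The min_size idea above can be sketched as a toy model: acknowledge the client as soon as min_size replicas confirm the write, and let the remaining (size - min_size) replica writes finish in the background. This is purely illustrative of the latency argument; it is not how Ceph's OSD replication code actually works, and all names here are made up.

```python
import threading
import time

def replicated_write(replicas, payload, min_size):
    """Toy model: return to the caller once min_size replicas have
    acknowledged the write; the tail replicas complete asynchronously.
    (Illustrative only -- not Ceph's actual replication path.)"""
    acked = threading.Semaphore(0)
    done = []

    def write_one(replica):
        replica(payload)          # may be slow on a node with a flaky link
        done.append(replica)
        acked.release()

    threads = [threading.Thread(target=write_one, args=(r,), daemon=True)
               for r in replicas]
    for t in threads:
        t.start()
    # Block only until min_size replicas confirm; a single replica stuck
    # behind a lossy GBIC no longer adds to client-visible latency.
    for _ in range(min_size):
        acked.acquire()
    return len(done)  # >= min_size at this point
```

The catch the thread alludes to: once a client is acknowledged before all replicas are durable, reads and recovery must cope with replicas that temporarily diverge, which is exactly the consistency-model cost Robert mentions.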
On Thu, Aug 27, 2015 at 12:48 PM, Jan Schermer wrote:
> Don't kick out the node, just deal with it gracefully and without
> interruption... if the IO reached the quorum number of OSDs then there's no
> need to block anymore, just queue it. Reads can be mirrored or retried (much
> quicker, because making writes idempotent, ordered and async is pretty hard
> and expensive).
> If there's an easy way to detect an unreliable OSD that flaps - great, let's
> have a warning in ceph health.
>
> Jan
>
>> On 27 Aug 2015, at 20:43, Robert LeBlanc wrote:
>>
>> This has been discussed a few times. The consensus seems to be to make
>> sure error rates of NICs and other such metrics are included in your
>> monitoring solution. It would also be good to perform periodic network
>> tests, like a full-size ping with the don't-fragment bit set between all
>> nodes, and have your monitoring solution report that as well.
>>
>> Although I would like to see such a feature in Ceph, the concern is
>> that such a feature could quickly get out of hand, and that something
>> else that is really designed for it should do it. I can understand
>> where they are coming from in that regard, but having Ceph kick out a
>> misbehaving node quickly is appealing as well (there would have to be
>> a way to specify that only so many nodes could be kicked out).
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>>
>> On Thu, Aug 27, 2015 at 9:37 AM, Christoph Adomeit wrote:
>>> Hello Ceph Users,
>>>
>>> yesterday I had a defective GBIC in 1 node of my 10-node Ceph cluster.
>>>
>>> The GBIC was working somehow but had 50% packet loss. Some packets went
>>> through, some did not.
>>>
>>> What happened was that the whole cluster did not service requests in
>>> time; there were lots of timeouts and so on until the problem was
>>> isolated. Monitors and OSDs were asked for data but did not answer, or
>>> answered late.
>>>
>>> I am wondering: here we have a highly redundant network setup and a
>>> highly redundant piece of software, but a small network fault brings
>>> down the whole cluster.
>>>
>>> Is there anything that can be configured or changed in Ceph so that
>>> availability becomes better in the case of a flapping network?
>>>
>>> I understand it is not a Ceph problem but a network problem, but maybe
>>> something can be learned from such incidents?
>>>
>>> Thanks
>>> Christoph
>>> --
>>> Christoph Adomeit
>>> GATWORKS GmbH
>>> Reststrauch 191
>>> 41199 Moenchengladbach
>>> Sitz: Moenchengladbach
>>> Amtsgericht Moenchengladbach, HRB 6303
>>> Geschaeftsfuehrer:
>>> Christoph Adomeit, Hans Wilhelm Terstappen
>>>
>>> [email protected] Internetloesungen vom Feinsten
>>> Fon. +49 2166 9149-32 Fax. +49 2166 9149-10
>>> _______________________________________________
>>> ceph-users mailing list
>>> [email protected]
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
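Robert's monitoring suggestion above (watch NIC error rates, and run periodic full-size don't-fragment pings such as `ping -M do -s 1472 <host>` on a 1500-byte MTU) can be sketched as a small check against the standard Linux sysfs counters. The counter file names are real kernel statistics; the function names, threshold, and interface handling are my own assumptions for illustration.

```python
from pathlib import Path

def read_nic_errors(iface, sysfs="/sys/class/net"):
    """Snapshot cumulative error/drop counters for a NIC from sysfs.
    (Linux-only; these statistics files are standard kernel names.)"""
    base = Path(sysfs) / iface / "statistics"
    return {name: int((base / name).read_text())
            for name in ("rx_errors", "tx_errors",
                         "rx_dropped", "tx_dropped")}

def nic_is_flapping(before, after, threshold=100):
    """Compare two counter snapshots taken some interval apart and flag
    the NIC if any error counter grew by more than `threshold`.
    The threshold is an arbitrary example value, not a recommendation."""
    return any(after[k] - before[k] > threshold for k in before)
```

A monitoring agent would call `read_nic_errors` on each node every minute and alert via `nic_is_flapping`, which is roughly the "something else that is really designed for it" Robert describes, rather than building the check into Ceph itself.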
