Hello Oliver, sorry if I wasn't clear in my first post.
I agree with you that a network issue isn't desirable, but should it hang the
mounted clients? Shouldn't the client be smart enough to retry the
connection?
My point is that public cloud environments don't offer the same availability
as a local setup, so shouldn't we at least keep the clients from freezing?
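
For reference, here is how I have been checking the hang from the client side
(just a sketch; it assumes debugfs is mounted, and the exact directory name
depends on the cluster fsid and client id):

$ dmesg | grep -E 'libceph|rbd|blocked for more than'
$ cat /sys/kernel/debug/ceph/*/osdc   # in-flight OSD requests still waiting for replies

Requests that stay stuck in osdc long after the network recovers would suggest
the client isn't re-establishing the session on its own.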


---
Diego Castro / The CloudFather
GetupCloud.com - Eliminamos a Gravidade

2016-04-01 12:27 GMT-03:00 Oliver Dzombic <[email protected]>:

> Hi Diego,
>
> OK, so this is a new scenario.
>
> Before, you said it was "until I put some load on it".
>
> Now you say you can't reproduce it, and mention that it happened during a
> (known) network maintenance.
>
> So I agree with you: we can assume that your problems were based on
> network issues.
>
> That's also what your logs imply:
>
> "failed lossy con, dropping message"
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:[email protected]
>
> Address:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 at the Hanau local court
> Managing director: Oliver Dzombic
>
> Tax no.: 35 236 3622 1
> VAT ID: DE274086107
>
>
> On 2016-04-01 at 14:07, Diego Castro wrote:
> > Hello Oliver, this issue turned out to be very hard to reproduce; I
> > couldn't trigger it again.
> > My best guess is something in Azure's network, since last week (when it
> > happened a lot) there was an ongoing maintenance.
> >
> > Here are the outputs:
> >
> > $ ceph -s
> >     cluster 25736883-dbf1-4d7a-8796-50e36f9de7a6
> >      health HEALTH_OK
> >      monmap e1: 4 mons at
> > {osmbr0=10.0.3.4:6789/0,osmbr1=10.0.3.6:6789/0,osmbr2=10.0.3.14:6789/0,osmbr3=10.0.3.7:6789/0}
> >             election epoch 602, quorum 0,1,2,3 osmbr0,osmbr1,osmbr3,osmbr2
> >      osdmap e1816: 10 osds: 10 up, 10 in
> >       pgmap v3158931: 128 pgs, 1 pools, 11512 MB data, 3522 objects
> >             34959 MB used, 10195 GB / 10229 GB avail
> >                  128 active+clean
> >   client io 87723 B/s wr, 8 op/s
> >
> > $ ceph osd df
> > ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE VAR
> >  6 1.00000  1.00000  1022G  3224M  1019G 0.31 0.92
> >  1 1.00000  1.00000  1022G  3489M  1019G 0.33 1.00
> >  2 1.00000  1.00000  1022G  3945M  1019G 0.38 1.13
> >  4 1.00000  1.00000  1022G  3304M  1019G 0.32 0.95
> >  7 1.00000  1.00000  1022G  3427M  1019G 0.33 0.98
> >  3 1.00000  1.00000  1022G  4361M  1018G 0.42 1.25
> >  9 1.00000  1.00000  1022G  3650M  1019G 0.35 1.04
> >  0 1.00000  1.00000  1022G  3210M  1019G 0.31 0.92
> >  5 1.00000  1.00000  1022G  3577M  1019G 0.34 1.02
> >  8 1.00000  1.00000  1022G  2765M  1020G 0.26 0.79
> >               TOTAL 10229G 34957M 10195G 0.33
> > MIN/MAX VAR: 0.79/1.25  STDDEV: 0.04
> >
> >
> >
> > $ ceph osd perf
> > osd fs_commit_latency(ms) fs_apply_latency(ms)
> >   0                     1                    2
> >   1                     1                    2
> >   2                     2                    3
> >   3                     2                    3
> >   4                     1                    2
> >   5                     2                    3
> >   6                     1                    2
> >   7                     2                    3
> >   8                     1                    2
> >   9                     1                    1
> >
> >
> > ---
> > Diego Castro / The CloudFather
> > GetupCloud.com - Eliminamos a Gravidade
> >
> > 2016-03-31 18:00 GMT-03:00 Oliver Dzombic <[email protected]>:
> >
> >     Hi Diego,
> >
> >     let's start with the basics. Please give us the output of
> >
> >     ceph -s
> >     ceph osd df
> >     ceph osd perf
> >
> >     ideally before and after you provoke the iowait.
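> >
> >     Something like this should capture all three around the event (just
> >     a sketch; adjust the interval and how long you let it run):
> >
> >     $ while true; do date; ceph -s; ceph osd df; ceph osd perf; \
> >         sleep 30; done >> ceph-snapshots.log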
> >
> >     Thank you !
> >
> >
> >     On 2016-03-31 at 21:38, Diego Castro wrote:
> >     > Hello, everyone.
> >     > I have a pretty basic Ceph setup running on top of Azure Cloud (4 mons
> >     > and 10 OSDs) for RBD images.
> >     > Everything seems to work as expected until I put some load on it:
> >     > sometimes it doesn't complete the process (a MySQL restore, for
> >     > example) and sometimes it does without any issues.
> >     >
> >     >
> >     > Client Kernel: 3.10.0-327.10.1.el7.x86_64
> >     > OSD Kernel: 3.10.0-229.7.2.el7.x86_64
> >     >
> >     > Ceph: ceph-0.94.5-0.el7.x86_64
> >     >
> >     > On the client side, I see 100% iowait and a lot of "INFO: task blocked
> >     > for more than 120 seconds" messages.
> >     > On the OSD side, I have no evidence of faulty disks or read/write
> >     > latency, but I found the following messages:
> >     >
> >     >
> >     > 2016-03-28 17:04:03.425249 7f7329fc5700  0 bad crc in data 641367213 != exp 3107019767
> >     > 2016-03-28 17:04:03.440599 7f7329fc5700  0 -- 10.0.3.9:6800/2272 >> 10.0.2.5:0/1998047321 pipe(0x13cc4800 sd=54 :6800 s=0 pgs=0 cs=0 l=0 c=0x13883f40).accept peer addr is really 10.0.2.5:0/1998047321 (socket is 10.0.2.5:34702/0)
> >     > 2016-03-28 17:04:03.487497 7f7333e6a700  0 -- 10.0.3.9:6800/2272 submit_message osd_op_reply(20046 rb.0.6040.238e1f29.000000000074 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~524288] v1753'32512 uv32512 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping message 0x12b539c0
> >     > 2016-03-28 17:04:03.532302 7f733666f700  0 -- 10.0.3.9:6800/2272 submit_message osd_op_reply(20047 rb.0.6040.238e1f29.000000000074 [set-alloc-hint object_size 4194304 write_size 4194304,write 524288~524288] v1753'32513 uv32513 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping message 0x1667bc80
> >     > 2016-03-28 17:04:03.535143 7f7333e6a700  0 -- 10.0.3.9:6800/2272 submit_message osd_op_reply(20048 rb.0.6040.238e1f29.000000000074 [set-alloc-hint object_size 4194304 write_size 4194304,write 1048576~524288] v1753'32514 uv32514 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping message 0x12b56e00
> >     >
> >     > ---
> >     > Diego Castro / The CloudFather
> >     > GetupCloud.com - Eliminamos a Gravidade
> >     >
> >     >
> >
> >
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
