Hi Wido,
Thanks for your answer and your kind help.
I tried to give you all the useful information, but maybe something is missing.
Let me know if you want me to do more tests.
Please find the output of ceph -s below:
[root@node91 ~]# ceph -s
2011-09-26 22:48:08.048659 pg v297: 792 pgs: 792 active+clean; 24 KB data,
80512 KB used, 339 GB / 340 GB avail
2011-09-26 22:48:08.049742 mds e5: 1/1/1 up {0=alpha=up:active}, 1 up:standby
2011-09-26 22:48:08.049764 osd e5: 4 osds: 4 up, 4 in
2011-09-26 22:48:08.049800 log 2011-09-26 19:38:14.372125 osd3
138.96.126.95:6800/2973 242 : [INF] 2.1p3 scrub ok
2011-09-26 22:48:08.049847 mon e1: 3 mons at
{alpha=138.96.126.91:6789/0,beta=138.96.126.92:6789/0,gamma=138.96.126.93:6789/0}
The same command run ten minutes later, after cfuse hangs on the client node:
[root@node91 ~]# ceph -s
2011-09-26 23:07:49.403774 pg v335: 792 pgs: 101 active, 276 active+clean,
415 active+clean+degraded; 4806 KB data, 114 MB used, 339 GB / 340 GB avail;
24/56 degraded (42.857%)
2011-09-26 23:07:49.404847 mds e5: 1/1/1 up {0=alpha=up:active}, 1 up:standby
2011-09-26 23:07:49.404867 osd e13: 4 osds: 2 up, 4 in
2011-09-26 23:07:49.404929 log 2011-09-26 23:07:46.093670 mds0
138.96.126.91:6800/4682 2 : [INF] closing stale session client4124
138.96.126.91:0/5563 after 455.778957
2011-09-26 23:07:49.404966 mon e1: 3 mons at
{alpha=138.96.126.91:6789/0,beta=138.96.126.92:6789/0,gamma=138.96.126.93:6789/0}
[root@node91 ~]# /etc/init.d/ceph -a status
=== mon.alpha ===
running...
=== mon.beta ===
running...
=== mon.gamma ===
running...
=== mds.alpha ===
running...
=== mds.beta ===
running...
=== osd.0 ===
dead.
=== osd.1 ===
running...
=== osd.2 ===
running...
=== osd.3 ===
dead.
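To re-run the test I assume I can bring the dead daemons back with the init
script, using a single daemon name as argument (please correct me if there is
a better way):
[root@node91 ~]# /etc/init.d/ceph -a start osd.0
[root@node91 ~]# /etc/init.d/ceph -a start osd.3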
Finally, here are the last lines of the osd.0 log:
2011-09-26 22:57:06.822182 7faf6a6f8700 -- 138.96.126.92:6802/3157 >>
138.96.126.93:6801/3162 pipe(0x7faf50001320 sd=20 pgs=0 cs=0 l=0).accept
connect_seq 2 vs existing 1 state 3
2011-09-26 23:07:09.084901 7faf8e1b5700 FileStore: sync_entry timed out after
600 seconds.
ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
2011-09-26 23:07:09.084934 1: (SafeTimer::timer_thread()+0x323) [0x5c95a3]
2011-09-26 23:07:09.084943 2: (SafeTimerThread::entry()+0xd) [0x5cbc7d]
2011-09-26 23:07:09.084950 3: /lib64/libpthread.so.0() [0x31fec077e1]
2011-09-26 23:07:09.084957 4: (clone()+0x6d) [0x31fe4e18ed]
2011-09-26 23:07:09.084963 *** Caught signal (Aborted) **
in thread 0x7faf8e1b5700
ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
1: /usr/bin/cosd() [0x649ca9]
2: /lib64/libpthread.so.0() [0x31fec0f4c0]
3: (gsignal()+0x35) [0x31fe4329a5]
4: (abort()+0x175) [0x31fe434185]
5: (__assert_fail()+0xf5) [0x31fe42b935]
6: (SyncEntryTimeout::finish(int)+0x130) [0x683400]
7: (SafeTimer::timer_thread()+0x323) [0x5c95a3]
8: (SafeTimerThread::entry()+0xd) [0x5cbc7d]
9: /lib64/libpthread.so.0() [0x31fec077e1]
10: (clone()+0x6d) [0x31fe4e18ed]
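If it helps to narrow this down, I can try raising that sync timeout while we
debug; I am assuming the 600-second limit comes from the "filestore commit
timeout" option (please correct me if the name is wrong), so I would add
something like this to the [osd] section:
[osd]
; assumed option name, the default seems to be 600 seconds
filestore commit timeout = 1800
Of course that would only hide the symptom, so I am happy to leave the default
in place if the crash logs are more useful to you.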
ceph.conf:
[global]
max open files = 131072
log file = /var/log/ceph/$name.log
pid file = /var/run/ceph/$name.pid
[mon]
mon data = /data/$name
mon clock drift allowed = 1
[mon.alpha]
host = node91
mon addr = 138.96.126.91:6789
[mon.beta]
host = node92
mon addr = 138.96.126.92:6789
[mon.gamma]
host = node93
mon addr = 138.96.126.93:6789
[mds]
keyring = /data/keyring.$name
[mds.alpha]
host = node91
[mds.beta]
host = node92
[osd]
osd data = /data/$name
osd journal = /data/$name/journal
osd journal size = 1000
[osd.0]
host = node92
[osd.1]
host = node93
[osd.2]
host = node94
[osd.3]
host = node95
----
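If more detail from the OSDs would help, I can also turn up logging before
reproducing the hang; I am assuming these are the usual debug options (please
tell me if other subsystems are more useful):
[osd]
debug osd = 20
debug filestore = 20
debug journal = 20
debug ms = 1
With that in place I would reproduce the copy that hangs and send you the
resulting /var/log/ceph/osd.*.log files.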
Thanks again for your help.
Regards
Cédric
On 23 Sep 2011, at 19:20, Wido den Hollander wrote:
> Hi.
>
> Could you send us your ceph.conf and the output of "ceph -s"?
>
> Wido
>
> On Fri, 2011-09-23 at 17:58 +0200, Cedric Morandin wrote:
>> Hi everybody,
>>
>> I didn't find any ceph-users list, so I'm posting here. If this is not the right
>> place to do it, please let me know.
>> I'm currently trying to test ceph, but I'm probably doing something wrong
>> because I'm seeing really strange behavior.
>>
>> Context:
>> Ceph compiled and installed on five CentOS 6 machines.
>> A BTRFS partition is available on each machine.
>> This partition is mounted under /data/osd.[0-3].
>> Clients are using cfuse compiled for FC11 (2.6.29.4-167.fc11.x86_64).
>>
>> What happens:
>> I configured everything in ceph.conf and started the ceph daemons on all nodes.
>> When I issue "ceph health", I get a HEALTH_OK answer.
>> I can access the filesystem through cfuse and create some files on it, but
>> when I try to create files bigger than 2 or 3 MB, the filesystem hangs.
>> When I try to copy an entire directory (the ceph sources, for instance) I have
>> the same problem.
>> When the system is in this state, the cosd daemon dies on the OSD machines: [INF]
>> osd0 out (down for 304.836218)
>> Even killing it doesn't release the mount point:
>> cosd 9170 root 10uW REG 8,6 8
>> 2506754 /data/osd.0/fsid
>> cosd 9170 root 11r DIR 8,6 4096
>> 2506753 /data/osd.0
>> cosd 9170 root 12r DIR 8,6 24576
>> 2506755 /data/osd.0/current
>> cosd 9170 root 13u REG 8,6 4
>> 2506757 /data/osd.0/current/commit_op_seq
>>
>>
>> I tried to change some parameters, but it results in the same problem:
>> I tried both the 0.34 and 0.35 releases, and used both BTRFS and ext3
>> with the user_xattr option.
>> I also tried the cfuse client on one of the CentOS 6 machines.
>>
>> I read everything on http://ceph.newdream.net/wiki but I can't figure out
>> the problem.
>> Does somebody have any clue about the problem's origin?
>>
>> Regards,
>>
>> Cedric Morandin
>>
>>
>>
>
>