On Mon, Jul 21, 2008 at 8:05 PM, Sage Weil <[EMAIL PROTECTED]> wrote:
> On Mon, 21 Jul 2008, Padraig O'Sullivan wrote:
>> Everything goes fine until I attempt to create a few files and
>> directories using csyn as suggested at the end of the document on
>> creating a small test cluster. The csyn command just hangs and never
>> returns. The output I get is:
>>
>> # ./csyn --syn makedirs 1 1 1 --syn walk
>> starting csyn at 0.0.0.0:36175/7874/0
>> mounting and starting 1 syn client(s)
>> waiting for client(s) to finish
>>
>> When I do an strace of the csyn process, I can see that it is waiting
>> on the futex system call. Does anyone have ideas as to why this is
>> happening? I'd like to be able to debug this further and understand
>> exactly what's happening. Any suggestions on how to debug this or what
>> log files (I have debugging enabled for all of the Ceph daemons) etc.
>> to look at would be great.
>
> All of csyn's interactions with the system are over the network, so
> stracing it won't tell you too much. Try passing --debug_client 10 and
> --debug_ms 1 to see what messages it is passing and whether it
> successfully "mounts" the fs.
>
> My guess is the osds are having trouble getting initialized. You can do
>
> ./cmonctl mds stat
> ./cmonctl pg stat
>
> to tell you whether the MDS successfully started, and whether the
> placements groups came online (they should all be 'active+clean', but if
> the osds are having trouble initializing they'll be stuck at 'creating').
>
> Let me know how it goes, and feel free to ping me in #ceph on oftc.net.
>
> sage
>
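Thanks for the pointers. For the record, this is roughly how I'm now invoking csyn with the extra debugging enabled (my assumption being that the two debug options are simply appended to the command line alongside the --syn options):

# ./csyn --debug_client 10 --debug_ms 1 --syn makedirs 1 1 1 --syn walk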
You were right about the osds having trouble initializing. The output I'm getting from the cmonctl commands looks like what you expected:

# ./cmonctl mds stat
mon0 <- 'mds stat'
mon0 -> 'e2: 1 nodes: 1 up:creating' (0)
# ./cmonctl pg stat
mon0 <- 'pg stat'
mon0 -> 'v5: 1152 pgs: 1152 creating; 0 KB used, 0 KB / 0 KB free' (0)
#

So there is definitely something up with btrfs here (I guess btrfs is far from stable so I can't complain) since I have had no issues when setting up a cluster with block devices.

I have debugging enabled on the osd daemons and have been having a look through their log files. I have 2 storage nodes, and in the log file for the first one (after some initialization messages) I keep seeing the following lines repeated over and over:

080721.054648.450368 osd0 3 calc_priors_during 2x0.0 [2,3) =
080721.054648.450396 osd0 3 mkpg 2x0.0 e2 h ec=0 les=0 3/3/3 : querying priors
080721.054648.450412 osd0 3 try_create_pg 2x0.0 - creating now
080721.054648.450420 osd0 3 _create_lock_pg 2x0.0
080721.054648.450976 osd0 3 pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0] write_log
080721.054648.451080 osd0 3 pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0] write_log to [0,0)
080721.054648.451090 osd0 3 _create_lock_new_pg pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0]
080721.054648.451113 osd0 3 pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0] peer. acting is [0], prior_set is
080721.054648.451129 osd0 3 pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0] have prior_set info. peers_complete_thru 0'0
080721.054648.451146 osd0 3 pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0] newest_update 0'0 (me)
080721.054648.451153 osd0 3 pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0] oldest_update 0'0
080721.054648.451734 osd0 3 pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0] peers_complete_thru 0'0
080721.054648.451782 osd0 3 pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0 active] write_log
080721.054648.451846 osd0 3 pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0 active] write_log to [0,0)
080721.054648.451871 osd0 3 pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0 active] clean_up_local
080721.054648.451888 osd0 3 pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0 active] activate - complete
080721.054648.451927 osd0 3 pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0 active] finish_recovery
080721.054648.451943 osd0 3 pg[2x0.0( empty ec=3 les=3 3/3/3) r=0 pct 0'0 active clean] purge_strays
080721.054648.452038 osd0 3 calc_priors_during 2x0.1 [2,3) =
080721.054648.452046 osd0 3 mkpg 2x0.1 e2 h ec=0 les=0 3/3/3 : querying priors
080721.054648.452053 osd0 3 try_create_pg 2x0.1 - creating now
080721.054648.452058 osd0 3 _create_lock_pg 2x0.1

while in the log file for the second storage node, I see the following lines repeated for some time:

080721.053755.682731 osd1 4 calc_priors_during 2x0.0 [2,4) = 0
080721.053755.683295 osd1 4 mkpg 2x0.0 e2 h ec=0 les=0 4/4/4 : querying priors 0
080721.053755.683321 osd1 4 try_create_pg 2x0.0 - waiting for priors 0
080721.053755.683427 osd1 4 calc_priors_during 2x0.1 [2,4) = 0
080721.053755.683435 osd1 4 mkpg 2x0.1 e2 h ec=0 les=0 4/4/4 : querying priors 0
080721.053755.683442 osd1 4 try_create_pg 2x0.1 - waiting for priors 0

Then those messages stop and I begin seeing this repeated:

080721.053755.754045 osd1 4 try_create_pg 4x2.7f - waiting for priors 0
080721.053755.754275 osd1 4 do_queries querying osd0 on 585 PGs
080721.053755.754438 osd1 4 kick_pg_split_queue
080721.053755.754517 osd1 4 wait_for_no_ops - none
080721.053755.754525 osd1 4 handle_osd_map epochs [1,4], i have 4
080721.053755.754539 osd1 4 handle_osd_map already had incremental map epoch 1
080721.053755.754565 osd1 4 .. it is 196 bytes
080721.053755.754583 osd1 4 handle_osd_map already had incremental map epoch 2
080721.053755.754613 osd1 4 .. it is 587 bytes
080721.053755.754625 osd1 4 handle_osd_map already had incremental map epoch 3
080721.053755.754646 osd1 4 .. it is 120 bytes
080721.053755.754658 osd1 4 handle_osd_map already had incremental map epoch 4
080721.053755.754678 osd1 4 .. it is 120 bytes
080721.053755.754685 osd1 4 write_superblock sb(fsid 5657c7053faf5f22.5caaf2297d8eaf77 osd1 e4 [1,4])
080721.053756.601799 osd1 4 heartbeat: stat(080721.053756.601331 rdlat=0 / 0 fshedin=0)
080721.053757.702019 osd1 4 heartbeat: stat(080721.053757.701902 rdlat=0 / 0 fshedin=0)
080721.053758.402729 osd1 4 heartbeat: stat(080721.053758.402609 rdlat=0 / 0 fshedin=0)
080721.053759.598931 osd1 4 do_mon_report
080721.053759.598953 osd1 4 send_alive up_thru currently 0 want 0
080721.053759.598977 osd1 4 send_pg_stats
080721.053759.803331 osd1 4 heartbeat: stat(080721.053759.803197 rdlat=0 / 0 fshedin=0)
080721.053800.404611 osd1 4 heartbeat: stat(080721.053800.404157 rdlat=0 / 0 fshedin=0)
080721.053801.105346 osd1 4 heartbeat: stat(080721.053801.105230 rdlat=0 / 0 fshedin=0)
080721.053802.306048 osd1 4 heartbeat: stat(080721.053802.305933 rdlat=0 / 0 fshedin=0)
080721.053802.806434 osd1 4 heartbeat: stat(080721.053802.806326 rdlat=0 / 0 fshedin=0)
080721.053804.206704 osd1 4 heartbeat: stat(080721.053804.206541 rdlat=0 / 0 fshedin=0)
080721.053804.599602 osd1 4 do_mon_report
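For what it's worth, a quick-and-dirty way I've been summarizing the two logs (I'm just calling the files osd0.log and osd1.log here; substitute wherever your osd logs live, and note the counts are only rough since the messages repeat):

# grep -c 'creating now' osd0.log
# grep -c 'waiting for priors' osd1.log

At least that makes it easy to see that osd0 keeps reporting 'creating now' for its PGs while osd1 stays stuck at 'waiting for priors'.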
I'm not really sure what these messages in the log files mean, as I'm not that familiar with Ceph internals yet. Do you have any pointers or ideas (based on this limited information) about what's going on here, or any suggestions on other log files to look at?

I've read through the USENIX paper describing the architecture, and I'm starting to go through the source code to gain a better understanding of how everything works, which I hope will also help me understand what the messages in the log files mean (or at least where they are coming from). I'm taking a grad course in HPC in the fall which involves a significant project, and I was hoping to do a project with Ceph, as I find it pretty interesting and I'd like to be able to contribute to an open source project. I know this is quite off-topic now, but are the projects suggested for Google's Summer of Code still available for someone to work on?

-Padraig