Re: [ceph-users] Multi-site Implementation

2014-04-02 Thread Craig Lewis


I assume you're talking about Option Two: MULTI-SITE OBJECT STORAGE 
WITH FEDERATED GATEWAYS, from Inktank's 
http://info.inktank.com/multisite_options_with_inktank_ceph_enterprise


There are still some options.  Each region has a master zone and one (or more) 
replica zones.  You can only write to the master zone, but you can read from 
the master or the replicas.  Regions and zones live inside Ceph clusters; 
each Ceph cluster can have multiple zones, and each zone has its own URL 
and web servers.


Just like any replication strategy, this can be as simple or as complicated 
as you want to make it.  For example, you could set up a single master 
zone in site 1 that replicates to sites 2 and 3.  Or you could set up 3 
master zones that replicate in a ring: Site 1 master -> Site 2 replica, 
Site 2 master -> Site 3 replica, Site 3 master -> Site 1 replica.  It's 
more complicated, but it lets everybody read/write to their local 
cluster, as long as you're prepared to deal with 6 different URLs.
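
For concreteness, a heavily abridged sketch of what one region with a master 
zone and one replica zone looks like in the federated-gateway configuration 
(names and endpoints are placeholders; the per-zone pool definitions, system 
users, and radosgw-agent setup are omitted):

{ "name": "us",
  "api_name": "us",
  "is_master": "true",
  "endpoints": ["http://rgw-site1.example.com:80/"],
  "master_zone": "us-east",
  "zones": [
    { "name": "us-east", "endpoints": ["http://rgw-site1.example.com:80/"],
      "log_meta": "true", "log_data": "true" },
    { "name": "us-west", "endpoints": ["http://rgw-site2.example.com:80/"],
      "log_meta": "true", "log_data": "true" }
  ],
  "placement_targets": [ { "name": "default-placement", "tags": [] } ],
  "default_placement": "default-placement" }

This is loaded with 'radosgw-admin region set --infile region.json'; each zone 
then gets its own pools and gateway instances, and radosgw-agent pushes 
metadata and data from the master zone to the replicas.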


Which setup you choose really depends on your requirements, and it 
changes the answers to the rest of your questions.


*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com



On 4/1/14 10:06, Shang Wu wrote:

Hi all,

I have some questions about the Ceph multi-site implementation.

I am thinking of using Ceph as the storage solution across three internal 
sites. I think, with a good internet connection, multi-site object storage 
with RADOS (or RGW) might be a good fit here. Each site would have a MON node 
and many OSDs, and the sites would replicate data between each other. With this 
implementation, I hope it will allow users to READ/WRITE from/to the local 
office and let Ceph take care of the replication.

So my questions are:

1. How does Ceph know how to retrieve data from the nearest location? (Ceph 
usually calculates where the data is through CRUSH rather than picking the 
location nearest to the user.) Will the data be distributed evenly throughout 
the three sites? If not, how can we let users access the _local copy_?
2. Is multi-site object storage with RADOS a good fit for this
implementation, i.e. to READ/WRITE data to/from the local site? If not,
what is the best way to approach this?
3. Does Ceph use the same ID (object name?) for all of its replicas? Can we
access (read/write) these replicas directly?
4. In this multi-site scenario,
when a user writes data to Ceph, will it find the nearest OSD to put the data on? 
When a user reads data, does it always respond from the primary data
set (regardless of location), or from the nearest replica copy?

Thanks,

Shang Wu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[ceph-users] write speed issue on RBD image

2014-04-02 Thread Russell E. Glaue
Can someone recommend some testing I can do to further investigate why this 
slow-disk-write issue in the VM OS is occurring?
It seems the issue, details below, is perhaps related to the VM OS running on 
RADOS images in Ceph.


Issue:
I have a handful (about 10) of VMs running that, when tested, report slow disk 
write speeds of 8MB/s-30MB/s. All of the remaining VMs (about 40) are reporting 
fast disk write speeds averaging 800MB/s-1.0GB/s. There are no VMs reporting 
disk write speeds in between these numbers. Restarting the OS on any of the 
VMs does not resolve the issue.

After these tests, I took one of the VMs (image02host) with slow disk write 
speed and reinstalled the basic OS, including repartitioning the disk. I used 
the same RADOS image. After this, I retested this VM (image02host) and all the 
other VMs with slow disk write speed. The VM (image02host) I reinstalled the 
OS on no longer has slow disk write speeds. And, surprisingly, 
one of the other VMs (another-host) with slow disk write speed started having 
fast write speeds. All the other VMs with slow disk write speed stayed the same.

So, I do not necessarily believe the slow disk issue is directly related to any 
kind of bug or outstanding issue with Ceph/RADOS. I only have a couple of 
guesses at this point:
1. Perhaps my OS install (or possibly its configuration) is somehow having an 
issue. I don't see how this is possible, however. All the VMs I have tested 
were kick-started with the same disk and OS configuration, so they are 
virtually identical, yet they end up with either fast or slow disk write speeds.
2. Perhaps I have some bad sectors or a hard drive error at the hardware level 
that is causing the issue. Perhaps the RADOS images of this handful (about 10) 
of VMs are being written across a bad part of a hard drive. This seems more 
likely to me. However, all drives across all Ceph hosts are reporting good 
health.

So, now, I have come to the ceph-users list to ask for help. What are some 
things I can do to test whether there is, possibly, a bad sector or hardware 
error on one of the hard drives, or some issue with Ceph writing to part of one 
of the hard drives? Are there any other tests I can run to help determine 
possible issues?

And, secondly, if I wanted to move a RADOS image to new OSD blocks, is there a 
way to do that without exporting and importing the image? Perhaps, by 
re-splattering the image across different OSDs and testing again to see if the 
issue is resolved, this could help determine whether the existing slow disk 
write speed issue comes from how the image is splattered across OSDs - 
indicating a bad OSD hard drive, or bad parts of an OSD hard drive.


Ceph Configuration:
* Ceph Version 0.72.2
* Three Ceph hosts, CentOS 6.5 OS, using XFS
* All connected via 10GbE network
* KVM/QEMU Virtualization, with Ceph support
* Virtual Machines are all RHEL 5.9 32bit
* Our Ceph setup is very basic. One pool for all VM disks, all drives on all 
Ceph hosts are in that pool.
* Ceph Caching is on:
rbd cache = true
rbd cache size = 128
rbd cache max dirty = 64
rbd cache target dirty = 64
rbd cache max dirty age = 10.0
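
Note that the rbd cache settings are byte values, so 'rbd cache size = 128' is 
128 bytes, which is almost certainly smaller than intended. If those numbers 
were meant as megabytes, the equivalent would look roughly like this (a sketch, 
not a tuning recommendation):

rbd cache = true
# 128 MB cache and 64 MB dirty limits, expressed in bytes
rbd cache size = 134217728
rbd cache max dirty = 67108864
rbd cache target dirty = 67108864
rbd cache max dirty age = 10.0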


Test:
Here I provide the test results of two VMs that are running on the same Ceph 
host, using disk images from the same Ceph pool, and were cloned from the same 
RADOS snapshot. They both have the exact same KVM configuration. However, they 
report dramatically different write speeds. When I tested them both, they were 
running on the same Ceph host. In fact, for the VM reporting slow disk write 
speed, I even ran it on a different Ceph host to test, and it still gave 
the same disk write speed results.

[root@linux]# rbd -p images info osimage01
rbd image 'osimage01':
size 28672 MB in 7168 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.2bfb74b0dc51
format: 2
features: layering
[root@linux]# rbd -p images info osimage02
rbd image 'osimage02':
size 28672 MB in 7168 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.2c1a2ae8944a
format: 2
features: layering

None of the images used are cloned.

[root@linux]# ssh image01host
image01host [65]% dd if=/dev/zero of=disk-test bs=1048576 count=512; dd 
if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 0.760446 seconds, 706 MB/s
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 0.214783 seconds, 2.5 GB/s
image01host [66]% dd if=/dev/zero of=disk-test bs=1048576 count=512; dd 
if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 0.514886 seconds, 1.0 GB/s
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 0.198433 seconds, 2.7 GB/s
image01host [67]% dd if=/dev/zero of=disk-test bs=1048576 count=512; dd 
if=disk-test 
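
One note on the test method itself: without oflag=direct (or a final sync), dd 
is largely measuring the guest's page cache rather than the RBD-backed disk, 
which may be why the fast VMs report numbers like 2.5 GB/s on reads. A variant 
that pushes the writes through to the device (a sketch, same sizes as above):

dd if=/dev/zero of=disk-test bs=1048576 count=512 oflag=direct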

Re: [ceph-users] write speed issue on RBD image

2014-04-02 Thread Russell E. Glaue
Correction:
When I wrote "Here I provide the test results of two VMs that are running on 
the same Ceph host, using disk images from the same Ceph pool, and were cloned 
from the same RADOS snapshot,"
I really meant: "Here I provide the test results of two VMs that are running on 
the same Ceph host, using disk images from the same Ceph pool, and were NOT 
cloned from ANY RADOS snapshot."

-RG


- Original Message -
From: Russell E. Glaue rgl...@cait.org
To: ceph-users@lists.ceph.com
Sent: Wednesday, April 2, 2014 1:12:46 PM
Subject: [ceph-users] write speed issue on RBD image

Can someone recommend some testing I can do to further investigate why this 
issue with slow-disk-write in the VM OS is occurring?
It seems the issue, details below, are perhaps related to the VM OS running on 
the RADOS images in Ceph.


Issue:
I have a handful (like 10) of VM's running that, when tested, report slow disk 
write speed of 8MB/s-30MB/s. All of the remaining VM's (like 40) are reporting 
fast disk write speed of average 800MB/s-1.0GB/s. There are no VMs reporting 
any disk write speeds in-between these numbers. Restarting the OS on any of the 
VMs does not resolve the issue.

After these tests, I took one of the VMs (image02host) with slow disk write 
speed and reinstalled the basic OS, including repartitioning the disk. I used 
the same RADOS image. After this, I retested this VM (image02host) and all the 
other VMs with slow disk write speed. This VM (image02host) I reinstalled the 
OS on no longer has the slow disk write speeds any longer. And, surprisingly, 
one of the other VMs (another-host) with slow disk write speed started having 
fast write speeds. All other VMs with slow disk write speed continued the same.

So, I do not necessarily believe the slow disk issue is directly related to any 
kind of bug or outstanding issue with Ceph/RADOS. I only have a couple guesses 
at this point:
1. Perhaps my OS install (or possibly configuration), somehow is having issue. 
I don't see how this is possible, however. For all the VMs I have tested, they 
have all been kick-started with the same disk and OS configuration. So they are 
virtually identical, but are having either fast or slow disk write speed among 
them.
2. Perhaps I have some bad sectors or hard drive error at the hardware level 
that is causing the issue. Perhaps the RADOS images of these handful (like 10) 
of VMs is being written across a bad part of a hard drive. This seems more 
likely to me. However, all drives across all Ceph hosts are reporting good 
health.

So, now, I have come to the ceph-user list to ask for help. What are some 
things I can do to test if there is some, possibly, bad sector or hardware 
error on one of the hard drives, or some issue with Ceph writing to part of one 
of the hard drives? Or are there any other tests I can run to help determine 
possible issues.

And, secondly, if I wanted to move a RADOS image to new OSD blocks, is there a 
way to do that without exporting and importing the image? Perhaps, by 
resplattering the image and testing again to see if the issue is resolved, this 
can help determine if the existing slow disk write speed issue is how the image 
is splattered across OSDs - indicating a bad OSD hard drive, or bad parts of an 
OSD hard drive.


Ceph Configuration:
* Ceph Version 0.72.2
* Three Ceph hosts, CentOS 6.5 OS, using Xfs
* All connected via 10GbE network
* KVM/QEMU Virtualization, with Ceph support
* Virtual Machines are all RHEL 5.9 32bit
* Our Ceph setup is very basic. One pool for all VM disks, all drives on all 
Ceph hosts are in that pool.
* Ceph Caching is on:
rbd cache = true
rbd cache size = 128
rbd cache max dirty = 64
rbd cache target dirty = 64
rbd cache max dirty age = 10.0


Test:
Here I provide the test results of two VMs that are running on the same Ceph 
host, using disk images from the same ceph pool, and were cloned from the same 
RADOS snapshot. They both have the same exact KVM configuration. However, they 
report dramaticly different write speeds. When I tested them both, they were 
running on the same Ceph host. In fact, for the VM reporting slow disk write 
speed, I even had it run on a different Ceph host to test, and it still gave 
the same disk write speed results.

[root@linux]# rbd -p images info osimage01
rbd image 'osimage01':
size 28672 MB in 7168 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.2bfb74b0dc51
format: 2
features: layering
[root@linux]# rbd -p images info osimage02
rbd image 'osimage02':
size 28672 MB in 7168 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.2c1a2ae8944a
format: 2
features: layering

None of the images used are cloned.

[root@linux]# ssh image01host
image01host [65]% dd if=/dev/zero of=disk-test bs=1048576 count=512; dd 
if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
512+0 records in

Re: [ceph-users] Backup Restore?

2014-04-02 Thread Craig Lewis
The short answer is no.  The longer answer is "it depends."  The most 
concise discussion I've seen is Inktank's multi-site options whitepaper: 
http://info.inktank.com/multisite_options_with_inktank_ceph_enterprise


That whitepaper only addresses RBD backups (using snapshots) and 
RadosGW backups (using RadosGW replication).  The first option in the 
whitepaper, a single cluster in multiple locations, isn't a backup.
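
For the RBD side, the usual building blocks are snapshots plus 
export-diff/import-diff. A rough sketch of the incremental step, assuming a 
pool named rbd, an image named vm1, an initial full copy already present on the 
remote cluster, and a matching @backup1 snapshot on both sides (all names are 
placeholders):

# take a new snapshot and ship only the changes since the previous one
rbd snap create rbd/vm1@backup2
rbd export-diff --from-snap backup1 rbd/vm1@backup2 - | \
    ssh backuphost rbd import-diff - rbd/vm1

import-diff applies the delta and creates the @backup2 snapshot on the remote 
image, so the next run can use --from-snap backup2.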


I'm not aware of any backup or offsite capability for raw RADOS pools.


There really aren't any good options for backing up CephFS.  You could 
use rsync on CephFS, but it's not going to work well.  rsync to offsite 
locations begins to have problems around the TB scale, give or take an 
order of magnitude.  The exact point depends on your bandwidth, latency, 
file count, average file size, average file churn, and disk I/O on both 
sides.  It takes a lot of time and disk I/O to enumerate all the files 
on the filesystem and compare them to the offsite copy.  CephFS does 
have some nice features that could make for an efficient backup.  If 
rsync (or any backup client) were aware of the way CephFS handles 
recursive directory sizes and timestamps, it could prune the directory tree 
enumeration much more efficiently.  That should scale well to much 
larger filesystems, mostly limited by file churn and churn locality.  I 
don't know of anybody who's working on that.  I'm interested in the 
concept, but I have no plans (personal or professional) to use CephFS.
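
Those recursive directory statistics are exposed as virtual extended attributes 
on a mounted CephFS, so a backup tool could in principle read them to skip 
unchanged subtrees. A minimal sketch of what that looks like from the shell 
(the path is a placeholder):

getfattr -n ceph.dir.rbytes   /mnt/cephfs/some/dir   # recursive byte count
getfattr -n ceph.dir.rctime   /mnt/cephfs/some/dir   # most recent change in the subtree
getfattr -n ceph.dir.rentries /mnt/cephfs/some/dir   # recursive file/dir count

Comparing ceph.dir.rctime against the time of the last backup run is the kind 
of pruning hinted at above.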



I'm currently working on adding snapshot capabilities to RadosGW. 
Combined with replication, it can protect against disasters, PEBKAC, and 
application error.  Replication alone only protects against disasters, 
not against PEBKAC or application errors - just like RAID protects against 
disk failure, but not file deletion.



Replication + Snapshots (for both RadosGW and RBD) don't protect against 
a determined attacker.  Even tape is vulnerable to a determined attacker 
with a high security level in your organization.  The trick with both 
offline backups and remote snapshots is to set up enough barriers and 
checks that things get noticed before a determined attacker can finish 
the job.  It's easier to do with offline backups than online backups.





*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com



On 4/2/14 00:08, Robert Sander wrote:

Hi,

what are the options to consistently backup and restore
data out of a ceph cluster?

- RBDs can be snapshotted.
- Data on RBDs used inside VMs can be backed up using tools from the guest.
- CephFS data can be backed up using rsync or similar tools

What about object data in other pools?

There are two scenarios where a backup is needed:

- disaster recovery, i.e. the whole cluster goes nuts
- single item restore, because PEBKAC or application error

Is there any work in progress to cover these?

Regards


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[ceph-users] Cancel a scrub?

2014-04-02 Thread Craig Lewis

Is there any way to cancel a scrub on a PG?


I have an OSD that's recovering, and there's a single PG left waiting:
2014-04-02 13:15:39.868994 mon.0 [INF] pgmap v5322756: 2592 pgs: 2589 
active+clean, 1 active+recovery_wait, 2 active+clean+scrubbing+deep; 
15066 GB data, 30527 GB used, 29061 GB / 59588 GB avail; 1/3878 
objects degraded (0.000%)


The PG that is in recovery_wait is on the same OSD that is being deep 
scrubbed.  I don't have journals on SSD, so recovery and scrubbing are 
heavily throttled.  I want to cancel the scrub so the recovery can 
complete.  I'll manually restart the deep scrub when it's done.


Normally I'd just wait, but this OSD is flapping.  It keeps getting 
kicked out of the cluster for being unresponsive.  I'm hoping that if I 
cancel the scrub, it will allow the recovery to complete and the OSD 
will stop flapping.
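
For reference, the knob the list usually points at here is the cluster-wide 
scrub flags; they generally don't abort a scrub that is already running, but 
they do stop new ones from being scheduled (a sketch):

ceph osd set noscrub
ceph osd set nodeep-scrub
# ...let the recovery finish, then...
ceph osd unset noscrub
ceph osd unset nodeep-scrub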






--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cancel a scrub?

2014-04-02 Thread Craig Lewis

Thanks!

I knew about noscrub, but I didn't realize that the flapping would 
cancel a scrub in progress.



So the scrub doesn't appear to be the reason it wasn't recovering.  
After a flap, it goes into:
2014-04-02 14:11:09.776810 mon.0 [INF] pgmap v5323181: 2592 pgs: 2591 
active+clean, 1 active+recovery_wait; 15066 GB data, 30527 GB used, 
29060 GB / 59588 GB avail; 1/3878 objects degraded (0.000%); 0 B/s, 
11 keys/s, 2 objects/s recovering


It stays in that state until the OSD gets kicked out again.


The problem is the flapping OSD is spamming its logs with:
2014-04-02 14:12:01.242425 7f344a97d700  1 heartbeat_map is_healthy 
'OSD::op_tp thread 0x7f3447977700' had timed out after 15


None of the other OSDs are saying that.

Is there anything I can do to repair the health map on osd.11?




In case it helps, here are the osd.11 logs after a daemon restart:
2014-04-02 14:10:58.267556 7f3467ff6780  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-osd, pid 7791
2014-04-02 14:10:58.269782 7f3467ff6780  1 
filestore(/var/lib/ceph/osd/ceph-11) mount detected xfs
2014-04-02 14:10:58.269789 7f3467ff6780  1 
filestore(/var/lib/ceph/osd/ceph-11)  disabling 'filestore replica 
fadvise' due to known issues with fadvise(DONTNEED) on xfs
2014-04-02 14:10:58.306112 7f3467ff6780  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features: 
FIEMAP ioctl is supported and appears to work
2014-04-02 14:10:58.306135 7f3467ff6780  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features: 
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-04-02 14:10:58.308070 7f3467ff6780  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features: 
syncfs(2) syscall fully supported (by glibc and kernel)
2014-04-02 14:10:58.357102 7f3467ff6780  0 
filestore(/var/lib/ceph/osd/ceph-11) mount: enabling WRITEAHEAD journal 
mode: checkpoint is not enabled
2014-04-02 14:10:58.360837 7f3467ff6780 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use 
of aio anyway
2014-04-02 14:10:58.360851 7f3467ff6780  1 journal _open 
/var/lib/ceph/osd/ceph-11/journal fd 20: 6442450944 bytes, block size 
4096 bytes, directio = 1, aio = 0
2014-04-02 14:10:58.422842 7f3467ff6780  1 journal _open 
/var/lib/ceph/osd/ceph-11/journal fd 20: 6442450944 bytes, block size 
4096 bytes, directio = 1, aio = 0
2014-04-02 14:10:58.423241 7f3467ff6780  1 journal close 
/var/lib/ceph/osd/ceph-11/journal
2014-04-02 14:10:58.424433 7f3467ff6780  1 
filestore(/var/lib/ceph/osd/ceph-11) mount detected xfs
2014-04-02 14:10:58.442963 7f3467ff6780  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features: 
FIEMAP ioctl is supported and appears to work
2014-04-02 14:10:58.442974 7f3467ff6780  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features: 
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-04-02 14:10:58.445144 7f3467ff6780  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-11) detect_features: 
syncfs(2) syscall fully supported (by glibc and kernel)
2014-04-02 14:10:58.451977 7f3467ff6780  0 
filestore(/var/lib/ceph/osd/ceph-11) mount: enabling WRITEAHEAD journal 
mode: checkpoint is not enabled
2014-04-02 14:10:58.454481 7f3467ff6780 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use 
of aio anyway
2014-04-02 14:10:58.454495 7f3467ff6780  1 journal _open 
/var/lib/ceph/osd/ceph-11/journal fd 21: 6442450944 bytes, block size 
4096 bytes, directio = 1, aio = 0
2014-04-02 14:10:58.465211 7f3467ff6780  1 journal _open 
/var/lib/ceph/osd/ceph-11/journal fd 21: 6442450944 bytes, block size 
4096 bytes, directio = 1, aio = 0
2014-04-02 14:10:58.466825 7f3467ff6780  0 cls 
cls/hello/cls_hello.cc:271: loading cls_hello
2014-04-02 14:10:58.468745 7f3467ff6780  0 osd.11 11688 crush map has 
features 1073741824, adjusting msgr requires for clients
2014-04-02 14:10:58.468756 7f3467ff6780  0 osd.11 11688 crush map has 
features 1073741824, adjusting msgr requires for osds
2014-04-02 14:11:07.822045 7f343de58700  0 -- 10.194.0.7:6800/7791  
10.194.0.7:6822/14075 pipe(0x1c96e000 sd=177 :6800 s=0 pgs=0 cs=0 l=0 
c=0x1b7e3000).accept connect_seq 0 vs existing 0 state connecting
2014-04-02 14:11:07.822182 7f343f973700  0 -- 10.194.0.7:6800/7791  
10.194.0.7:6806/26942 pipe(0x1c96e280 sd=82 :6800 s=0 pgs=0 cs=0 l=0 
c=0x1b7e3160).accept connect_seq 0 vs existing 0 state connecting
2014-04-02 14:11:20.333163 7f344a97d700  1 heartbeat_map is_healthy 
'OSD::op_tp thread 0x7f3447977700' had timed out after 15

snip repeats
2014-04-02 14:13:35.310407 7f344a97d700 -1 common/HeartbeatMap.cc: In 
function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, 
const char*, time_t)' thread 7f344a97d700 time 2014-04-02 14:13:35.308718

common/HeartbeatMap.cc: 79: FAILED assert(0 == hit suicide timeout)

 ceph version 0.72.2 

[ceph-users] Cleaning up; data usage, snap-shots, auth users

2014-04-02 Thread Jonathan Gowar
Hi,

   I have a small 8TB testing cluster.  During testing I've used 94G.
But I have since removed the pools and images from Ceph, so I shouldn't be
using any space, yet the 94G of usage remains.  How can I reclaim the old
used space?

Also, this:-

ceph@ceph-admin:~$ rbd rm 6fa36869-4afe-485a-90a3-93fba1b5d15e
2014-04-03 01:02:23.304323 7f92e2ced760 -1 librbd::ImageCtx: error
finding header: (2) No such file or directory
Removing image: 2014-04-03 01:02:23.312212 7f92e2ced760 -1 librbd: error
removing img from new-style directory: (2) No such file or directory
0% complete...failed.
rbd: delete error: (2) No such file or directory
ceph@ceph-admin:~$ rbd rm 6fa36869-4afe-485a-90a3-93fba1b5d15e -p
cloudstack
2014-04-03 01:02:34.424626 7fd556d00760 -1 librbd: image has snapshots -
not removing
Removing image: 0% complete...failed.
rbd: image has snapshots - these must be deleted with 'rbd snap purge'
before the image can be removed.
ceph@ceph-admin:~$ rbd snap purge 6fa36869-4afe-485a-90a3-93fba1b5d15e
-p cloudstack
Removing all snapshots2014-04-03 01:02:46.863370 7f2949461760 -1 librbd:
removing snapshot from header failed: (16) Device or resource busy
: 0% complete...failed.
rbd: removing snaps failed: (16) Device or resource busy

Lastly, how can I remove a user from the auth list?
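
For anyone searching later, a rough sketch of the commands that would usually 
apply here, assuming the image lives in the cloudstack pool and the 'Device or 
resource busy' above comes from a protected snapshot (snapshot names are 
placeholders):

rbd snap ls cloudstack/6fa36869-4afe-485a-90a3-93fba1b5d15e
rbd snap unprotect cloudstack/6fa36869-4afe-485a-90a3-93fba1b5d15e@<snapname>
rbd snap purge cloudstack/6fa36869-4afe-485a-90a3-93fba1b5d15e
rbd rm cloudstack/6fa36869-4afe-485a-90a3-93fba1b5d15e

and for removing a user from the auth list:

ceph auth del client.<name>
ceph auth list

The usage reported by 'ceph df' can also lag for a while after deletions.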

Regards,
Jon

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] heartbeat_map is_healthy had timed out after 15

2014-04-02 Thread Craig Lewis

I'm seeing one OSD spamming its log with
2014-04-02 16:49:21.547339 7f5cc6c5d700  1 heartbeat_map is_healthy 
'OSD::op_tp thread 0x7f5cc3456700' had timed out after 15


It starts about 30 seconds after the OSD daemon is started.  It 
continues until
2014-04-02 16:48:57.526925 7f0e5a683700  1 heartbeat_map is_healthy 
'OSD::op_tp thread 0x7f0e3c857700' had suicide timed out after 150
2014-04-02 16:48:57.528008 7f0e5a683700 -1 common/HeartbeatMap.cc: In 
function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, 
const char*, time_t)' thread 7f0e5a683700 time 2014-04-02 16:48:57.526948

common/HeartbeatMap.cc: 79: FAILED assert(0 == hit suicide timeout)

I tried bumping up logging, and I don't see anything interesting.  I 
tried strace, and all I can really see is that the OSD spends a lot of 
time in FUTEX_WAIT.


This OSD has been flapping for several days now.  None of the other OSDs 
are having this issue.
I thought it might be similar to Quenten Grasso's post about 'OSD 
Restarts cause excessively high load average and requests are blocked > 
32 sec'.  At first it looks similar, but Quenten said his OSDs 
eventually settle down.  Mine never does.




Can I increase that 15 second timeout, to see if it just needs 
additional time?  I don't see anything in the ceph docs about this.
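
For what it's worth, the 15 appears to be 'osd op thread timeout' and the 
suicide at 150 'osd op thread suicide timeout'; both can be raised in ceph.conf 
under [osd] while experimenting (a sketch only - this hides the symptom rather 
than fixing whatever is stalling the op thread):

[osd]
osd op thread timeout = 60
osd op thread suicide timeout = 300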


Otherwise, I'm pretty close to removing the disk, zapping it, and adding it 
back to the cluster.  Any other suggestions?


--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Backup Restore?

2014-04-02 Thread Robert Sander
Hi,

what are the options to consistently backup and restore
data out of a ceph cluster?

- RBDs can be snapshotted.
- Data on RBDs used inside VMs can be backed up using tools from the guest.
- CephFS data can be backed up using rsync or similar tools

What about object data in other pools?

There are two scenarios where a backup is needed:

- disaster recovery, i.e. the whole cluster goes nuts
- single item restore, because PEBKAC or application error

Is there any work in progress to cover these?

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG: 
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd map error - numerical result out of range

2014-04-02 Thread Tom

Hi again Ilya,

No, no snapshots in this case.  It's a brand new RBD that I've created.

Cheers.  Tom.

On 01/04/14 16:08, Ilya Dryomov wrote:

On Tue, Apr 1, 2014 at 6:55 PM, Tom t...@t0mb.net wrote:

Thanks for the reply.

Ceph is version 0.73-1precise, and the kernel release is
3.11.9-031109-generic.

also rbd showmapped shows 16 lines of output.

Are there snapshots involved?

Thanks,

 Ilya


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Backup Restore?

2014-04-02 Thread Karan Singh
Hi Robert

Thanks for raising this question - backup and restore options have always been 
interesting to discuss. I too have a related question for Inktank:

- Is there any work going on to support backing up a Ceph cluster with the 
enterprise *proprietary* backup solutions available today?



Karan Singh 
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/


On 02 Apr 2014, at 10:08, Robert Sander r.san...@heinlein-support.de wrote:

 Hi,
 
 what are the options to consistently backup and restore
 data out of a ceph cluster?
 
 - RBDs can be snapshotted.
 - Data on RBDs used inside VMs can be backed up using tools from the guest.
 - CephFS data can be backed up using rsync or similar tools
 
 What about object data in other pools?
 
 There are two scenarios where a backup is needed:
 
 - disaster recovery, i.e. the whole cluster goes nuts
 - single item restore, because PEBKAC or application error
 
 Is there any work in progress to cover these?
 
 Regards
 -- 
 Robert Sander
 Heinlein Support GmbH
 Schwedter Str. 8/9b, 10119 Berlin
 
 http://www.heinlein-support.de
 
 Tel: 030 / 405051-43
 Fax: 030 / 405051-19
 
 Zwangsangaben lt. §35a GmbHG: 
 HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
 Geschäftsführer: Peer Heinlein -- Sitz: Berlin
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OpenStack + Ceph Integration

2014-04-02 Thread Tomokazu HIRAI
I integrated Ceph + OpenStack following this document:

https://ceph.com/docs/master/rbd/rbd-openstack/

I could upload an image to Glance on the Ceph cluster, but I cannot create any
volumes with Cinder.

The error messages are the same as the ones at this URL:

http://comments.gmane.org/gmane.comp.file-systems.ceph.user/7641

---

2014-04-02 17:31:57.799 22321 ERROR cinder.volume.drivers.rbd
[req-b18d0e8d-c818-4fb4-9dd8-dbdd938f919b None None] error connecting to
ceph cluster
2014-04-02 17:31:57.799 22321 TRACE cinder.volume.drivers.rbd Traceback
(most recent call last):
2014-04-02 17:31:57.799 22321 TRACE cinder.volume.drivers.rbd   File
/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py, line 262,
in check_for_setup_error
2014-04-02 17:31:57.799 22321 TRACE cinder.volume.drivers.rbd with
RADOSClient(self):
2014-04-02 17:31:57.799 22321 TRACE cinder.volume.drivers.rbd   File
/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py, line 234,
in __init__
2014-04-02 17:31:57.799 22321 TRACE cinder.volume.drivers.rbd
self.cluster, self.ioctx = driver._connect_to_rados(pool)
2014-04-02 17:31:57.799 22321 TRACE cinder.volume.drivers.rbd   File
/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py, line 282,
in _connect_to_rados
2014-04-02 17:31:57.799 22321 TRACE cinder.volume.drivers.rbd
client.connect()
2014-04-02 17:31:57.799 22321 TRACE cinder.volume.drivers.rbd   File
/usr/lib/python2.7/dist-packages/rados.py, line 408, in connect
2014-04-02 17:31:57.799 22321 TRACE cinder.volume.drivers.rbd raise
make_ex(ret, error calling connect)
2014-04-02 17:31:57.799 22321 TRACE cinder.volume.drivers.rbd
ObjectNotFound: error calling connect
2014-04-02 17:31:57.799 22321 TRACE cinder.volume.drivers.rbd
2014-04-02 17:31:57.800 22321 ERROR cinder.volume.manager
[req-b18d0e8d-c818-4fb4-9dd8-dbdd938f919b None None] Error encountered
during initialization of driver: RBDDriver
2014-04-02 17:31:57.801 22321 ERROR cinder.volume.manager
[req-b18d0e8d-c818-4fb4-9dd8-dbdd938f919b None None] Bad or unexpected
response from the storage volume backend API: error connecting to ceph
cluster
2014-04-02 17:31:57.801 22321 TRACE cinder.volume.manager Traceback (most
recent call last):
2014-04-02 17:31:57.801 22321 TRACE cinder.volume.manager   File
/usr/lib/python2.7/dist-packages/cinder/volume/manager.py, line 190, in
init_host
2014-04-02 17:31:57.801 22321 TRACE cinder.volume.manager
self.driver.check_for_setup_error()
2014-04-02 17:31:57.801 22321 TRACE cinder.volume.manager   File
/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py, line 267,
in check_for_setup_error
2014-04-02 17:31:57.801 22321 TRACE cinder.volume.manager raise
exception.VolumeBackendAPIException(data=msg)
2014-04-02 17:31:57.801 22321 TRACE cinder.volume.manager
VolumeBackendAPIException: Bad or unexpected response from the storage
volume backend API: error connecting to ceph cluster

So I added these lines to /etc/ceph/ceph.conf:

[client.cinder]
key = key_id

but I still could not create any volumes with Cinder.
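
In case it is useful for comparison, the rbd-openstack document referenced 
above has Cinder read the key from a keyring file rather than from ceph.conf, 
and points Cinder at the Ceph user in cinder.conf. A rough sketch of that 
layout (pool, user, and uuid values are placeholders):

# /etc/ceph/ceph.client.cinder.keyring  (readable by the cinder service user)
[client.cinder]
    key = <key from 'ceph auth get-or-create client.cinder'>

# cinder.conf
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = <libvirt secret uuid>

'ObjectNotFound: error calling connect' usually means the client cannot find 
its keyring or is connecting as the wrong user.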

Does anyone have an idea?

Thanks from cloudy Tokyo.

-- Tomokazu HIRAI (@jedipunkz)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw multipart-uploaded downloads fail

2014-04-02 Thread Benedikt Fraunhofer
Hi Yehuda,

I tried your patch and it feels fine, except you might need some special
handling for those already-corrupt uploads, as trying to delete them gets
radosgw into an endless loop and high CPU usage:

2014-04-02 11:03:15.045627 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045628 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045629 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045631 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045632 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045634 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045634 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045636 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045637 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045639 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045639 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045641 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045642 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045644 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045644 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045646 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045647 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045649 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045649 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045651 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045652 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045654 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045654 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045656 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045657 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045659 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045660 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045661 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045662 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045664 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045665 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045667 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045667 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045669 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1
2014-04-02 11:03:15.045670 7fbf157d2700  0
RGWObjManifest::operator++(): result: ofs=33554432 stripe_ofs=33554432
part_ofs=33554432 rule-part_size=0
2014-04-02 11:03:15.045672 7fbf157d2700 20
RGWObjManifest::operator++(): rule-part_size=0 rules.size()=1


Thx

 Benedikt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)

2014-04-02 Thread Kenneth Waegeman


- Message from Gregory Farnum g...@inktank.com -
   Date: Tue, 1 Apr 2014 09:03:17 -0700
   From: Gregory Farnum g...@inktank.com
Subject: Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)
 To: Yan, Zheng uker...@gmail.com
 Cc: Kenneth Waegeman kenneth.waege...@ugent.be, ceph-users  
ceph-users@lists.ceph.com




On Tue, Apr 1, 2014 at 7:12 AM, Yan, Zheng uker...@gmail.com wrote:

On Tue, Apr 1, 2014 at 10:02 PM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:
After some more searching, I've found that the source of the problem is with
the mds and not the mon.  The mds crashes, generates a core dump that eats
the local space, and in turn the monitor (because of leveldb) crashes.

The error in the mds log of one host:

2014-04-01 15:46:34.414615 7f870e319700  0 -- 10.141.8.180:6836/13152 
10.141.8.180:6789/0 pipe(0x517371180 sd=54 :42439 s=4 pgs=0 cs=0 l=1
c=0x147ac780).connect got RESETSESSION but no longer connecting
2014-04-01 15:46:34.438792 7f871194f700  0 -- 10.141.8.180:6836/13152 
10.141.8.180:6789/0 pipe(0x1b099f580 sd=8 :43150 s=4 pgs=0 cs=0 l=1
c=0x1fd44360).connect got RESETSESSION but no longer connecting
2014-04-01 15:46:34.439028 7f870e319700  0 -- 10.141.8.180:6836/13152 
10.141.8.182:6789/0 pipe(0x13aa64880 sd=54 :37085 s=4 pgs=0 cs=0 l=1
c=0x1fd43de0).connect got RESETSESSION but no longer connecting
2014-04-01 15:46:34.468257 7f871b7ae700 -1 mds/CDir.cc: In function 'void
CDir::_omap_fetched(ceph::bufferlist, std::mapstd::basic_stringchar,
std::char_traitschar, std::allocatorchar , ceph::buffer::list,
std::lessstd::basic_stringchar, std::char_traitschar,
std::allocatorchar  , std::allocatorstd::pairconst
std::basic_stringchar, std::char_traitschar, std::allocatorchar ,
ceph::buffer::list  , const std::string, int)' thread  
7f871b7ae700 time

2014-04-01 15:46:34.448320
mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)



Could you use gdb to check what the value of the variable 'r' is?
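
A minimal sketch of doing that against the core dump, assuming the matching 
debuginfo packages are installed (the core path below is a placeholder):

gdb /usr/bin/ceph-mds /path/to/core
(gdb) bt            # find the frame for CDir::_omap_fetched
(gdb) frame <N>     # select it by number
(gdb) print r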


If you look at the crash dump log you can see the return value in the
osd_op_reply message:
-1 2014-04-01 15:46:34.440860 7f871b7ae700  1 --
10.141.8.180:6836/13152 == osd.3 10.141.8.180:6827/4366 33077 
osd_op_reply(4179177 11f2ef1. [omap-get-header
0~0,omap-get-vals 0~16] v0'0 uv0 ack = -108 (Cannot send after
transport endpoint shutdown)) v6  229+0+0 (958358678 0 0)
0x2cff7aa80 con 0x37ea3c0

-108, which is ESHUTDOWN, but we also use it (via the 108 constant, I
think because ESHUTDOWN varies across platforms) as EBLACKLISTED.
So it looks like this is itself actually a symptom of another problem
that is causing the MDS to get timed out on the monitor. If a core
dump is eating the local space, maybe the MDS is stuck in an
infinite allocation loop of some kind? How big are your disks,
Kenneth? Do you have any information on how much CPU/memory the MDS
was using before this?


I monitored the mds process after restart:

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
19215 root  20   0 6070m 5.7g 5236 S 778.6 18.1   1:27.54 ceph-mds
19215 root  20   0 7926m 7.5g 5236 S 179.2 23.8   2:44.39 ceph-mds
19215 root  20   0 12.4g  12g 5236 S 157.2 38.8   3:43.47 ceph-mds
19215 root  20   0 16.6g  16g 5236 S 144.4 52.0   4:15.01 ceph-mds
19215 root  20   0 19.9g  19g 5236 S 137.2 62.5   4:35.83 ceph-mds
19215 root  20   0 24.5g  24g 5224 S 136.5 77.0   5:04.66 ceph-mds
19215 root  20   0 25.8g  25g 2944 S 33.7 81.2   5:13.74 ceph-mds
19215 root  20   0 26.0g  25g 2916 S 24.6 81.7   5:19.07 ceph-mds
19215 root  20   0 26.1g  25g 2916 S 13.0 82.1   5:22.16 ceph-mds
19215 root  20   0 27.7g  26g 1856 S 100.0 85.8   5:36.46 ceph-mds

Then it crashes. I changed the core dump location to be outside the root fs; 
the core dump is indeed about 26 GB.


My disks:

Filesystem  Size  Used Avail Use% Mounted on
/dev/sda2   9.9G  2.9G  6.5G  31% /
tmpfs16G 0   16G   0% /dev/shm
/dev/sda1   248M   53M  183M  23% /boot
/dev/sda4   172G   61G  112G  35% /var/lib/ceph/log/sda4
/dev/sdb187G   61G  127G  33% /var/lib/ceph/log/sdb
/dev/sdc3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdc
/dev/sdd3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdd
/dev/sde3.7T  1.4T  2.4T  37% /var/lib/ceph/osd/sde
/dev/sdf3.7T  1.5T  2.3T  39% /var/lib/ceph/osd/sdf
/dev/sdg3.7T  2.1T  1.7T  56% /var/lib/ceph/osd/sdg
/dev/sdh3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdh
/dev/sdi3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdi
/dev/sdj3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdj
/dev/sdk3.7T  2.1T  1.6T  58% /var/lib/ceph/osd/sdk
/dev/sdl3.7T  1.7T  2.0T  46% /var/lib/ceph/osd/sdl
/dev/sdm3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdm
/dev/sdn3.7T  1.4T  2.3T  38% /var/lib/ceph/osd/sdn




-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com



- End message from Gregory Farnum g...@inktank.com -


--

Met 

Re: [ceph-users] Setting root directory in fstab with Fuse

2014-04-02 Thread Gregory Farnum
It's been a while, but I think you need to use the long form
client_mountpoint config option here instead. If you search the list
archives it'll probably turn up; this is basically the only reason we
ever discuss -r. ;)
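
A sketch of what that long form might look like in fstab, based on the options 
already in the question (the mount helper turns each key=value into a 
--key=value argument for ceph-fuse):

id=mail01,conf=/etc/ceph/ceph.conf,client_mountpoint=/fs1-mail1 /fs1-mail1 fuse.ceph noatime 0 0

or, equivalently, set it in ceph.conf instead of in fstab:

[client.mail01]
    client mountpoint = /fs1-mail1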
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, Apr 2, 2014 at 5:10 AM, Florent B flor...@coppint.com wrote:
 Hi all,

 I am trying to set a fuse.ceph mount on a Debian 7 (kernel 3.2).

 I use Ceph Emperor version.

 How can I set a root directory in fstab using fuse.ceph ??

 I do :

 id=mail01,conf=/etc/ceph/ceph.conf,r=/fs1-mail1 /fs1-mail1 fuse.ceph noatime
 0 0


 But I get this error :

 ceph-fuse[23794]: starting ceph client
 fuse: unknown option `--r=/fs1-mail1'
 ceph-fuse[23794]: fuse failed to initialize
 2014-04-02 14:04:23.132664 7f7e4fd91760 -1 fuse_lowlevel_new failed
 ceph-fuse[23785]: mount failed: (33) Numerical argument out of domain


 Whereas when I do :

 ceph-fuse -d --id mail01 -m mon.mycompany.net -r /fs1-mail1 /fs1-mail1

 It works fine.

 How can I do that ?

 My configuration file only contains monitor address. Passing it as an option
 could be nice.

 Thank you a lot


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)

2014-04-02 Thread Stijn De Weirdt

hi gregory,

(i'm a colleague of kenneth)


1) How big and what shape the filesystem is. Do you have some
extremely large directory that the MDS keeps trying to load and then
dump?
Is there any way to extract this from the mds without having to start it? As it 
was an rsync operation, I can try to locate possible candidates on the 
source filesystem, but what would be considered large?



2) Use tcmalloc's heap analyzer to see where all the memory is being allocated.

We'll give that a try.


3) Look through the logs for when the beacon fails (the first of
mds.0.16 is_laggy 600.641332  15 since last acked beacon) and see
if there's anything tell-tale going on at the time.


anything in particular we should be looking for?

the log goes as follows:
mds starts around 11:43
...


2014-04-01 11:44:23.658583 7ffec89c6700  1 mds.0.server reconnect_clients -- 1 
sessions
2014-04-01 11:44:41.212488 7ffec89c6700  0 log [DBG] : reconnect by client.4585 
10.141.8.199:0/3551 after 17.553854
2014-04-01 11:44:45.692237 7ffec89c6700  1 mds.0.10 reconnect_done
2014-04-01 11:44:45.996384 7ffec89c6700  1 mds.0.10 handle_mds_map i am now 
mds.0.10
2014-04-01 11:44:45.996388 7ffec89c6700  1 mds.0.10 handle_mds_map state change 
up:reconnect -- up:rejoin
2014-04-01 11:44:45.996390 7ffec89c6700  1 mds.0.10 rejoin_start
2014-04-01 11:49:53.158471 7ffec89c6700  1 mds.0.10 rejoin_joint_start


then lots (4667 lines) of

2014-04-01 11:50:10.237035 7ffebc844700  0 -- 10.141.8.180:6837/55117  
10.141.8.180:6789/0 pipe(0x38a7da00 sd=104 :41115 s=4 pgs=0 cs=0 l=1 
c=0x6513e8840).connect got RESETSESSION but no longer connecting


with one intermediate

2014-04-01 11:51:50.181354 7ffebcf4b700  0 -- 10.141.8.180:6837/55117  
10.141.8.180:6789/0 pipe(0x10e282580 sd=103 :0 s=1 pgs=0 cs=0 l=1 c=0xc77d5ee0).fault



then sudden change

2014-04-01 11:57:30.440554 7ffebcd49700  0 -- 10.141.8.180:6837/55117  
10.141.8.182:6789/0 pipe(0xa1534100 sd=104 :48176 s=4 pgs=0 cs=0 l=1 
c=0xd99b11e0).connect got RESETSESSION but no longer connecting
2014-04-01 11:57:30.722607 7ffebec68700  0 -- 10.141.8.180:6837/55117  
10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1 
c=0xc48a3f40).connect got BADAUTHORIZER
2014-04-01 11:57:30.722669 7ffebec68700  0 -- 10.141.8.180:6837/55117  
10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1 
c=0xc48a3f40).connect got BADAUTHORIZER
2014-04-01 11:57:30.722885 7ffebec68700  0 -- 10.141.8.180:6837/55117  
10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1 
c=0xc48a3f40).connect got BADAUTHORIZER
2014-04-01 11:57:30.722945 7ffebec68700  0 -- 10.141.8.180:6837/55117  
10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1 
c=0xc48a3f40).connect got BADAUTHORIZER


followed by lots of

2014-04-01 11:57:30.738562 7ffebec68700  0 -- 10.141.8.180:6837/55117  
10.141.8.181:6789/0 pipe(0xead9fa80 sd=57 :0 s=1 pgs=0 cs=0 l=1 c=0x10e5d280).fault


with sporadic

2014-04-01 11:57:32.431219 7ffebeb67700  0 -- 10.141.8.180:6837/55117  
10.141.8.182:6789/0 pipe(0xef85cd80 sd=103 :41218 s=4 pgs=0 cs=0 l=1 
c=0x130590dc0).connect got RESETSESSION but no longer connecting



until the dump

2014-04-01 11:59:27.612850 7ffebea66700  0 -- 10.141.8.180:6837/55117  
10.141.8.181:6789/0 pipe(0xe3036400 sd=103 :0 s=1 pgs=0 cs=0 l=1 c=0xa7be300).fault
2014-04-01 11:59:27.639009 7ffec89c6700 -1 mds/CDir.cc: In function 'void 
CDir::_omap_fetched(ceph::bufferlist, std::mapstd::basic_stringchar, 
std::char_traitschar, std::allocator\
char , ceph::buffer::list, std::lessstd::basic_stringchar, std::char_traitschar, 
std::allocatorchar  , std::allocatorstd::pairconst std::basic_stringchar, std::char_trait\
schar, std::allocatorchar , ceph::buffer::list  , const std::string, 
int)' thread 7ffec89c6700 time 2014-04-01 11:59:27.620684
mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)

 ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
 1: (CDir::_omap_fetched(ceph::buffer::list, std::mapstd::string, ceph::buffer::list, 
std::lessstd::string, std::allocatorstd::pairstd::string const, ceph::buffer::list  
, st\
d::string const, int)+0x4d71) [0x77c3c1]
 2: (Context::complete(int)+0x9) [0x56bb79]
 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11a6) [0x806dd6]
 4: (MDS::handle_core_message(Message*)+0x9c7) [0x5901d7]
 5: (MDS::_dispatch(Message*)+0x2f) [0x59028f]
 6: (MDS::ms_dispatch(Message*)+0x1ab) [0x591d4b]
 7: (DispatchQueue::entry()+0x582) [0x902072]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x85ef4d]
 9: /lib64/libpthread.so.0() [0x34c36079d1]
 10: (clone()+0x6d) [0x34c32e8b6d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.

--- begin dump of recent events ---
-1 2014-04-01 11:59:27.137779 7ffec89c6700  5 mds.0.10 initiating monitor 
reconnect; maybe we're not the slow one
 - 2014-04-01 11:59:27.137787 7ffec89c6700 10 monclient(hunting): 
_reopen_session rank -1 name
 -9998 2014-04-01 11:59:27.137790 

Re: [ceph-users] cephx key for CephFS access only

2014-04-02 Thread Travis Rhoden
Thanks for the response Greg.

Unfortunately, I appear to be missing something.  If I use my cephfs key
with these perms:

client.cephfs
key: redacted
caps: [mds] allow rwx
caps: [mon] allow r
caps: [osd] allow rwx pool=data

This is what happens when I mount:

# ceph-fuse -k /etc/ceph/ceph.client.cephfs.keyring -m ceph0-10g /data
ceph-fuse[13533]: starting ceph client
ceph-fuse[13533]: ceph mount failed with (1) Operation not permitted
ceph-fuse[13531]: mount failed: (1) Operation not permitted

But using the admin key works just fine:

# ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m ceph0-10g /data
ceph-fuse[13548]: starting ceph client
ceph-fuse[13548]: starting fuse

The admin key has the following perms:

client.admin
key: redacted
caps: [mds] allow
caps: [mon] allow *
caps: [osd] allow *

Since the mds permissions are functionally equivalent, either I need extra
rights on the monitor, or the OSDs.  Does a client need to access the
metadata pool in order to do a CephFS mount?

I'll experiment a bit and report back.


On Mon, Mar 31, 2014 at 1:36 PM, Gregory Farnum g...@inktank.com wrote:

 At present, the only security permission on the MDS is allowed to do
 stuff, so rwx and * are synonymous. In general * means is an
 admin, though, so you'll be happier in the future if you use rwx.
 You may also want a more restrictive set of monitor capabilities as
 somebody else recently pointed out, but [3] will give you the
 filesystem access you're looking for.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Fri, Mar 28, 2014 at 9:40 AM, Travis Rhoden trho...@gmail.com wrote:
  Hi Folks,
 
  What would be the right set of capabilities to set for a new client key
 that
  has access to CephFS only?  I've seen a few different examples:
 
  [1] mds 'allow *' mon 'allow r' osd 'allow rwx pool=data'
  [2] mon 'allow r' osd 'allow rwx pool=data'
  [3] mds 'allow rwx' mon 'allow r' osd 'allow rwx pool=data'
 
  I'm inclined to go with [3]. [1] seems weird for using *, I like seeing
 rwx.
  Are these synonymous? [2] seems wrong because it doesn't include anything
  for MDS.
 
  - Travis
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephx key for CephFS access only

2014-04-02 Thread Gregory Farnum
Hrm, I don't remember. Let me know which permutation works and we can
dig into it.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, Apr 2, 2014 at 9:00 AM, Travis Rhoden trho...@gmail.com wrote:
 Thanks for the response Greg.

 Unfortunately, I appear to be missing something.  If I use my cephfs key
 with these perms:

 client.cephfs
 key: redacted
 caps: [mds] allow rwx
 caps: [mon] allow r
 caps: [osd] allow rwx pool=data

 This is what happens when I mount:

 # ceph-fuse -k /etc/ceph/ceph.client.cephfs.keyring -m ceph0-10g /data
 ceph-fuse[13533]: starting ceph client
 ceph-fuse[13533]: ceph mount failed with (1) Operation not permitted
 ceph-fuse[13531]: mount failed: (1) Operation not permitted

 But using the admin key works just fine:

 # ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m ceph0-10g /data
 ceph-fuse[13548]: starting ceph client
 ceph-fuse[13548]: starting fuse

 The admin key has the following perms:

 client.admin
 key: redacted
 caps: [mds] allow
 caps: [mon] allow *
 caps: [osd] allow *

 Since the mds permissions are functionally equivalent, either I need extra
 rights on the monitor, or the OSDs.  Does a client need to access the
 metadata pool in order to do a CephFS mount?

 I'll experiment a bit and report back.


 On Mon, Mar 31, 2014 at 1:36 PM, Gregory Farnum g...@inktank.com wrote:

 At present, the only security permission on the MDS is allowed to do
 stuff, so rwx and * are synonymous. In general * means is an
 admin, though, so you'll be happier in the future if you use rwx.
 You may also want a more restrictive set of monitor capabilities as
 somebody else recently pointed out, but [3] will give you the
 filesystem access you're looking for.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Fri, Mar 28, 2014 at 9:40 AM, Travis Rhoden trho...@gmail.com wrote:
  Hi Folks,
 
  What would be the right set of capabilities to set for a new client key
  that
  has access to CephFS only?  I've seen a few different examples:
 
  [1] mds 'allow *' mon 'allow r' osd 'allow rwx pool=data'
  [2] mon 'allow r' osd 'allow rwx pool=data'
  [3] mds 'allow rwx' mon 'allow r' osd 'allow rwx pool=data'
 
  I'm inclined to go with [3]. [1] seems weird for using *, I like seeing
  rwx.
  Are these synonymous? [2] seems wrong because it doesn't include
  anything
  for MDS.
 
  - Travis
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephx key for CephFS access only

2014-04-02 Thread Travis Rhoden
Ah, I figured it out.  My original key worked, but I needed to use the --id
option with ceph-fuse to tell it to use the cephfs user rather than the
admin user.  Tailing the log on my monitor pointed out that it was logging
in with client.admin, but providing the key for client.cephfs.

So, final working command is:

ceph-fuse -k /etc/ceph/ceph.client.cephfs.keyring --id cephfs -m ceph0-10g
/data
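
For completeness, a sketch of how a key with those caps can be created in the 
first place (same caps as listed earlier in the thread; the output path is just 
where I chose to write the keyring):

ceph auth get-or-create client.cephfs \
    mds 'allow rwx' mon 'allow r' osd 'allow rwx pool=data' \
    -o /etc/ceph/ceph.client.cephfs.keyring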

I will note that neither the -k nor the --id option is present in man
ceph-fuse, ceph-fuse --help, or in the Ceph docs, really.  An example
using -k is found here:
http://ceph.com/docs/master/start/quick-cephfs/#filesystem-in-user-space-fuse,
but there is never any mention of needing to change users if you are not
using client.admin.  In fact, searching the docs for ceph-fuse
returns zero results.

If I'm ambitious I'll submit changes for the docs...

Thanks for the help!

 - Travis


On Wed, Apr 2, 2014 at 12:00 PM, Travis Rhoden trho...@gmail.com wrote:

 Thanks for the response Greg.

 Unfortunately, I appear to be missing something.  If I use my cephfs key
 with these perms:

 client.cephfs
 key: redacted
 caps: [mds] allow rwx
 caps: [mon] allow r
 caps: [osd] allow rwx pool=data

 This is what happens when I mount:

 # ceph-fuse -k /etc/ceph/ceph.client.cephfs.keyring -m ceph0-10g /data
 ceph-fuse[13533]: starting ceph client
 ceph-fuse[13533]: ceph mount failed with (1) Operation not permitted
 ceph-fuse[13531]: mount failed: (1) Operation not permitted

 But using the admin key works just fine:

 # ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m ceph0-10g /data
 ceph-fuse[13548]: starting ceph client
 ceph-fuse[13548]: starting fuse

 The admin key has the following perms:

 client.admin
 key: redacted
 caps: [mds] allow
 caps: [mon] allow *
 caps: [osd] allow *

 Since the mds permissions are functionally equivalent, either I need extra
 rights on the monitor, or the OSDs.  Does a client need to access the
 metadata pool in order to do a CephFS mount?

 I'll experiment a bit and report back.


 On Mon, Mar 31, 2014 at 1:36 PM, Gregory Farnum g...@inktank.com wrote:

 At present, the only security permission on the MDS is allowed to do
 stuff, so rwx and * are synonymous. In general * means is an
 admin, though, so you'll be happier in the future if you use rwx.
 You may also want a more restrictive set of monitor capabilities as
 somebody else recently pointed out, but [3] will give you the
 filesystem access you're looking for.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Fri, Mar 28, 2014 at 9:40 AM, Travis Rhoden trho...@gmail.com wrote:
  Hi Folks,
 
  What would be the right set of capabilities to set for a new client key
 that
  has access to CephFS only?  I've seen a few different examples:
 
  [1] mds 'allow *' mon 'allow r' osd 'allow rwx pool=data'
  [2] mon 'allow r' osd 'allow rwx pool=data'
  [3] mds 'allow rwx' mon 'allow r' osd 'allow rwx pool=data'
 
  I'm inclined to go with [3]. [1] seems weird for using *, I like seeing
 rwx.
  Are these synonymous? [2] seems wrong because it doesn't include
 anything
  for MDS.
 
  - Travis
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)

2014-04-02 Thread Stijn De Weirdt

hi,


1) How big and what shape the filesystem is. Do you have some
extremely large directory that the MDS keeps trying to load and then
dump?

Is there any way to extract this from the mds without having to start it? As it
was an rsync operation, I can try to locate possible candidates on the
source filesystem, but what would be considered large?
The total number of files is 13M, spread over 800k directories, but it's 
unclear how far the sync was at the time of failing. I've not found a good 
way to look for directories with lots of files and/or subdirs.





2) Use tcmalloc's heap analyzer to see where all the memory is being
allocated.

We'll give that a try.
I run ceph-mds with HEAPCHECK=normal (via the init script), but how can we 
stop the mds without killing it? The heap checker only seems to dump at 
the end of a run; maybe there's a way to get an intermediate dump like 
valgrind's, but the documentation is not very helpful.
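
If it is available in this build, the tcmalloc heap profiler can also be driven 
at runtime through 'ceph tell', which avoids stopping the daemon - a sketch, 
assuming mds.0 is the active mds and that the mds accepts the heap command the 
way the osds do:

ceph tell mds.0 heap start_profiler
# ... let it grow for a while ...
ceph tell mds.0 heap dump
ceph tell mds.0 heap stats
ceph tell mds.0 heap stop_profiler

The resulting .heap files end up in the daemon's log directory and can be 
inspected with pprof / google-pprof.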


stijn




3) Look through the logs for when the beacon fails (the first of
mds.0.16 is_laggy 600.641332  15 since last acked beacon) and see
if there's anything tell-tale going on at the time.


anything in particular we should be looking for?

the log goes as follows:
mds starts around 11:43
...


2014-04-01 11:44:23.658583 7ffec89c6700  1 mds.0.server
reconnect_clients -- 1 sessions
2014-04-01 11:44:41.212488 7ffec89c6700  0 log [DBG] : reconnect by
client.4585 10.141.8.199:0/3551 after 17.553854
2014-04-01 11:44:45.692237 7ffec89c6700  1 mds.0.10 reconnect_done
2014-04-01 11:44:45.996384 7ffec89c6700  1 mds.0.10 handle_mds_map i
am now mds.0.10
2014-04-01 11:44:45.996388 7ffec89c6700  1 mds.0.10 handle_mds_map
state change up:reconnect -- up:rejoin
2014-04-01 11:44:45.996390 7ffec89c6700  1 mds.0.10 rejoin_start
2014-04-01 11:49:53.158471 7ffec89c6700  1 mds.0.10 rejoin_joint_start


then lots (4667 lines) of

2014-04-01 11:50:10.237035 7ffebc844700  0 -- 10.141.8.180:6837/55117
 10.141.8.180:6789/0 pipe(0x38a7da00 sd=104 :41115 s=4 pgs=0 cs=0
l=1 c=0x6513e8840).connect got RESETSESSION but no longer connecting


with one intermediate

2014-04-01 11:51:50.181354 7ffebcf4b700  0 -- 10.141.8.180:6837/55117
 10.141.8.180:6789/0 pipe(0x10e282580 sd=103 :0 s=1 pgs=0 cs=0 l=1
c=0xc77d5ee0).fault



then sudden change

2014-04-01 11:57:30.440554 7ffebcd49700  0 -- 10.141.8.180:6837/55117
 10.141.8.182:6789/0 pipe(0xa1534100 sd=104 :48176 s=4 pgs=0 cs=0
l=1 c=0xd99b11e0).connect got RESETSESSION but no longer connecting
2014-04-01 11:57:30.722607 7ffebec68700  0 -- 10.141.8.180:6837/55117
 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0
l=1 c=0xc48a3f40).connect got BADAUTHORIZER
2014-04-01 11:57:30.722669 7ffebec68700  0 -- 10.141.8.180:6837/55117
 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0
l=1 c=0xc48a3f40).connect got BADAUTHORIZER
2014-04-01 11:57:30.722885 7ffebec68700  0 -- 10.141.8.180:6837/55117
 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1
c=0xc48a3f40).connect got BADAUTHORIZER
2014-04-01 11:57:30.722945 7ffebec68700  0 -- 10.141.8.180:6837/55117
 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1
c=0xc48a3f40).connect got BADAUTHORIZER


followed by lots of

2014-04-01 11:57:30.738562 7ffebec68700  0 -- 10.141.8.180:6837/55117
 10.141.8.181:6789/0 pipe(0xead9fa80 sd=57 :0 s=1 pgs=0 cs=0 l=1
c=0x10e5d280).fault


with sporadic

2014-04-01 11:57:32.431219 7ffebeb67700  0 -- 10.141.8.180:6837/55117
 10.141.8.182:6789/0 pipe(0xef85cd80 sd=103 :41218 s=4 pgs=0 cs=0
l=1 c=0x130590dc0).connect got RESETSESSION but no longer connecting



until the dmup

2014-04-01 11:59:27.612850 7ffebea66700  0 -- 10.141.8.180:6837/55117
 10.141.8.181:6789/0 pipe(0xe3036400 sd=103 :0 s=1 pgs=0 cs=0 l=1
c=0xa7be300).fault
2014-04-01 11:59:27.639009 7ffec89c6700 -1 mds/CDir.cc: In function
'void CDir::_omap_fetched(ceph::bufferlist,
std::mapstd::basic_stringchar, std::char_traitschar, std::allocator\
char , ceph::buffer::list, std::lessstd::basic_stringchar,
std::char_traitschar, std::allocatorchar  ,
std::allocatorstd::pairconst std::basic_stringchar, std::char_trait\
schar, std::allocatorchar , ceph::buffer::list  , const
std::string, int)' thread 7ffec89c6700 time 2014-04-01 11:59:27.620684
mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)

 ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
 1: (CDir::_omap_fetched(ceph::buffer::list, std::mapstd::string,
ceph::buffer::list, std::lessstd::string,
std::allocatorstd::pairstd::string const, ceph::buffer::list  , st\
d::string const, int)+0x4d71) [0x77c3c1]
 2: (Context::complete(int)+0x9) [0x56bb79]
 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11a6) [0x806dd6]
 4: (MDS::handle_core_message(Message*)+0x9c7) [0x5901d7]
 5: (MDS::_dispatch(Message*)+0x2f) [0x59028f]
 6: (MDS::ms_dispatch(Message*)+0x1ab) [0x591d4b]
 7: (DispatchQueue::entry()+0x582) [0x902072]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x85ef4d]
 9: /lib64/libpthread.so.0() [0x34c36079d1]
 10: