Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-08 Thread Tom Christensen
We didn't go forward to 4.2 as it's a large production cluster, and we just
needed the problem fixed.  We'll probably test out 4.2 in the next couple of
months, but this one slipped past us as it didn't occur in our test cluster
until after we had upgraded production.  In our experience it takes about 2
weeks to start happening, but once it does, it's all hands on deck because
nodes are going to go down regularly.

All that being said, if/when we try 4.2, it's going to need to run rock solid
for 1-2 months in our test cluster before it gets to production.

On Tue, Dec 8, 2015 at 2:30 AM, Benedikt Fraunhofer 
wrote:

> Hi Tom,
>
> > We have been seeing this same behavior on a cluster that has been
> perfectly
> > happy until we upgraded to the ubuntu vivid 3.19 kernel.  We are in the
>
> I can't recall when we gave 3.19 a shot, but now that you say it... The
> cluster was happy for >9 months with 3.16.
> Did you try 4.2, or do you think the regression introduced somewhere
> between 3.16 and 3.19 is still in 4.2?
>
> Thx!
>Benedikt
>


[ceph-users] OSD error

2015-12-08 Thread Dan Nica
Hi guys,

Recently I installed a Ceph cluster, version 9.2.0, and in my OSD logs I see these
errors:

2015-12-08 04:49:12.931683 7f42ec266700 -1 lsb_release_parse - pclose failed: 
(13) Permission denied
2015-12-08 04:49:12.955264 7f42ec266700 -1 lsb_release_parse - pclose failed: 
(13) Permission denied

Do I have to worry about it? What is generating these errors?

Thanks
Dan
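
For context on the message itself: the daemon collects host metadata at startup by running the lsb_release tool through popen(), and the pclose() error above likely means that probe fails for the unprivileged user the Infernalis OSD runs as (this is a reading of the log, not a confirmed diagnosis). A minimal check along those lines, assuming the lsb_release package is installed:

  # run the same probe the daemon would, as the ceph user
  sudo -u ceph lsb_release -a
  # confirm the binary is present and executable
  which lsb_release && ls -l $(which lsb_release)

If that reproduces a permission error, the failing metadata probe is the likely source of the log noise.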


Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-08 Thread Mykola Dvornik
The same thing happens to my setup with CentOS 7.x + a non-stock kernel 
(kernel-ml from elrepo).


I was not happy with the IOPS I got out of the stock CentOS 7.x kernel, so I 
did the kernel upgrade, and crashes started to happen until some of the OSDs 
became completely non-bootable. The funny thing is that I was not able to 
downgrade back to stock, since the OSDs were then crashing with 'cannot 
decode' errors. I am doing a backup at the moment, and OSDs still crash from 
time to time due to the Ceph watchdog despite the 20x timeouts.
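
For reference, the internal watchdog that kills stuck OSD worker threads is driven by per-threadpool timeout/suicide options. The option names below are an assumption about which timeouts the "20x" tuning refers to, and the values are purely illustrative:

  [osd]
  # how long a work item may run before a warning / before the OSD aborts (seconds)
  osd op thread timeout = 300
  osd op thread suicide timeout = 1500
  filestore op thread timeout = 600
  filestore op thread suicide timeout = 1800

Raising these only hides the underlying stall, but it can keep OSDs alive long enough to finish a backup.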


I believe the version of kernel-ml I have started with was 3.19.


On Tue, Dec 8, 2015 at 10:34 AM, Tom Christensen  
wrote:
We didn't go forward to 4.2 as it's a large production cluster, and we 
just needed the problem fixed.  We'll probably test out 4.2 in the 
next couple of months, but this one slipped past us as it didn't occur 
in our test cluster until after we had upgraded production.  In our 
experience it takes about 2 weeks to start happening, but once it 
does, it's all hands on deck because nodes are going to go down regularly.


All that being said, if/when we try 4.2 it's going to need to run 
rock solid for 1-2 months in our test cluster before it gets to 
production.


On Tue, Dec 8, 2015 at 2:30 AM, Benedikt Fraunhofer 
 wrote:

Hi Tom,

> We have been seeing this same behavior on a cluster that has been perfectly
> happy until we upgraded to the ubuntu vivid 3.19 kernel.  We are in the

I can't recall when we gave 3.19 a shot, but now that you say it... The
cluster was happy for >9 months with 3.16.
Did you try 4.2, or do you think the regression introduced somewhere
between 3.16 and 3.19 is still in 4.2?

Thx!
   Benedikt




Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-08 Thread Tom Christensen
We aren't running NFS, but regularly use the kernel driver to map RBDs and
mount filesystems on them.  We see very similar behavior across nearly all
kernel versions we've tried.  In my experience only very few versions of
the kernel driver survive any sort of crush map change/update while
something is mapped.  In fact, in the last 2 years I think I've only seen
this work on one kernel version; unfortunately it's badly out of date and we
can't run it in our environment anymore.  I think it was a 3.0 kernel
running on ubuntu 12.04.  We have just recently started trying to
find a kernel that will survive OSD outages or changes to the cluster.
We're on ubuntu 14.04, and have tried 3.16, 3.19.0-25, 4.3, and 4.2 without
success in the last week.  We only map 1-3 RBDs per client machine at a
time, but we regularly get processes stuck in D state which are
accessing the filesystem inside the RBD, and we have to hard reboot the
RBD client machine.  This is always associated with a cluster change in
some way: reweighting OSDs, rebooting an OSD host, restarting an individual
OSD, adding OSDs, and removing OSDs all cause the kernel client to hang.
If no change is made to the cluster, the kernel client will be happy for
weeks.
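
When a client box gets into this state, it can help to record which tasks are stuck and where in the kernel they are blocked before hard rebooting. A small diagnostic sketch (the PID is a placeholder; /proc/PID/stack and sysrq need root and the corresponding kernel support):

  # list tasks in uninterruptible sleep and their kernel wait channel
  ps -eo pid,state,wchan:32,cmd | awk '$2 == "D"'
  # dump the kernel stack of one stuck task (PID 1234 is an example)
  cat /proc/1234/stack
  # or dump all blocked tasks to the kernel log in one go
  echo w > /proc/sysrq-trigger && dmesg | tail -n 200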

On Mon, Dec 7, 2015 at 2:55 PM, Blair Bethwaite 
wrote:

> Hi Matt,
>
> (CC'ing in ceph-users too - similar reports there:
> http://www.spinics.net/lists/ceph-users/msg23037.html)
>
> We've seen something similar for KVM [lib]RBD clients acting as NFS
> gateways within our OpenStack cloud, the NFS services were locking up
> and causing client timeouts whenever we started doing Ceph
> maintenance. We eventually realised we'd somehow set the pool min_size
> == size, so any single OSD outage was blocking client IO - *oops*.
> Your issue sounds like something different, but NFS does seem to be
> very touchy and lacking any graceful recovery from issues with the
> underlying FS.
>
>
> On 8 December 2015 at 07:56, Matt Conner 
> wrote:
> > Hi,
> >
> > We have a Ceph cluster in which we have been having issues with RBD
> > clients hanging when an OSD failure occurs. We are using a NAS gateway
> > server which maps RBD images to filesystems and serves the filesystems
> > out via NFS. The gateway server has close to 180 NFS clients and
> > almost every time even 1 OSD goes down during heavy load, the NFS
> > exports lock up and the clients are unable to access the NAS share via
> > NFS. When the OSD fails, Ceph recovers without issue, but the gateway
> > kernel RBD module appears to get stuck waiting on the now failed OSD.
> > Note that this works correctly when under lighter loads.
> >
> > From what we have been able to determine, the NFS server daemon hangs
> > waiting for I/O from the OSD that went out and never recovers.
> > Similarly, attempting to access files from the exported FS locally on
> > the gateway server will result in a similar hang. We also noticed that
> > Ceph health details will continue to report blocked I/O on the now
> > down OSD until either the OSD is recovered or the gateway server is
> > rebooted.  Based on a few kernel logs from NFS and PVS, we were able
> > to trace the problem to the RBD kernel module.
> >
> > Unfortunately, the only way we have been able to recover our gateway
> > is by hard rebooting the server.
> >
> > Has anyone else encountered this issue and/or have a possible solution?
> > Are there suggestions for getting more detailed debugging information
> > from the RBD kernel module?
> >
> >
> > Few notes on our setup:
> > We are using Kernel RBD on a gateway server that exports filesystems via
> NFS
> > The exported filesystems are XFS on LVMs which are each composed of 16
> > striped images (NFS->LVM->XFS->PVS->RBD)
> > There are currently 176 mapped RBD images on the server (11
> > filesystems, 16 mapped RBD images per FS)
> > Gateway Kernel: 3.18.6
> > Ceph version: 0.80.9
> > Note - We've tried using different kernels all the way up to 4.3.0 but
> > the problem persists.
> >
> > Thanks,
> > Matt Conner
> > Keeper Technology
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Cheers,
> ~Blairo
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
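
As an aside on the min_size == size situation described above: the current values can be checked and loosened per pool with the standard CLI. A minimal sketch (the pool name is a placeholder; min_size should normally stay below size):

  ceph osd pool get rbd size
  ceph osd pool get rbd min_size
  # e.g. for size=3, allow I/O to continue with one replica missing
  ceph osd pool set rbd min_size 2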


Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-08 Thread Benedikt Fraunhofer
Hi Tom,

> We have been seeing this same behavior on a cluster that has been perfectly
> happy until we upgraded to the ubuntu vivid 3.19 kernel.  We are in the

I can't recall when we gave 3.19 a shot, but now that you say it... The
cluster was happy for >9 months with 3.16.
Did you try 4.2, or do you think the regression introduced somewhere
between 3.16 and 3.19 is still in 4.2?

Thx!
   Benedikt


Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-08 Thread Jan Schermer

> On 08 Dec 2015, at 08:57, Benedikt Fraunhofer  wrote:
> 
> Hi Jan,
> 
>> Doesn't look near the limit currently (but I suppose you rebooted it in the 
>> meantime?).
> 
> the box this numbers came from has an uptime of 13 days
> so it's one of the boxes that did survive yesterdays half-cluster-wide-reboot.
> 

So this box had no issues? Keep an eye on the number of threads, but maybe 
others will have a better idea; this is just where I'd start. I have seen close 
to a million threads from OSDs on my boxes, not sure what the numbers are now.
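
A quick way to watch thread counts against the kernel ceilings (just a sketch of the kind of check meant here):

  # total threads on the box vs. the configured limits
  ps axH | wc -l
  sysctl kernel.pid_max kernel.threads-max
  # threads per ceph-osd process
  for p in $(pgrep -x ceph-osd); do echo "$p $(ls /proc/$p/task | wc -l)"; done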

>> Did iostat say anything about the drives? (btw dm-1 and dm-6 are what? Is 
>> that your data drives?) - were they overloaded really?
> 
> No, they didn't have any load and/or IOPS.
> Basically the whole box had nothing to do.
> 
> If I understand the load correctly, this just reports threads
> that are ready and willing to work but - in this case -
> don't get any data to work with.

Different unixes calculate this differently :-) By itself "load" is meaningless.
It should be something like an average number of processes that want to run at 
any given time but can't (because they are waiting for whatever they need - 
disks, CPU, blocking sockets...).

Jan
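
On Linux specifically, the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, which is why a box with idle disks and CPUs can still report load > 5000. A tiny sketch for confirming that:

  cat /proc/loadavg
  # count threads per state; a large 'D' count explains the high load
  ps -eLo state | sort | uniq -c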


> 
> Thx
> 
> Benedikt
> 
> 
> 2015-12-08 8:44 GMT+01:00 Jan Schermer :
>> 
>> Jan
>> 
>> 
>>> On 08 Dec 2015, at 08:41, Benedikt Fraunhofer  wrote:
>>> 
>>> Hi Jan,
>>> 
>>> we had 65k for pid_max, which made
>>> kernel.threads-max = 1030520.
>>> or
>>> kernel.threads-max = 256832
>>> (looks like it depends on the number of cpus?)
>>> 
>>> currently we've
>>> 
>>> root@ceph1-store209:~# sysctl -a | grep -e thread -e pid
>>> kernel.cad_pid = 1
>>> kernel.core_uses_pid = 0
>>> kernel.ns_last_pid = 60298
>>> kernel.pid_max = 65535
>>> kernel.threads-max = 256832
>>> vm.nr_pdflush_threads = 0
>>> root@ceph1-store209:~# ps axH |wc -l
>>> 17548
>>> 
>>> we'll see how it behaves once puppet has come by and adjusted it.
>>> 
>>> Thx!
>>> 
>>> Benedikt
>> 



Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-08 Thread Benedikt Fraunhofer
Hi Tom,

2015-12-08 10:34 GMT+01:00 Tom Christensen :

> We didn't go forward to 4.2 as it's a large production cluster, and we just
> needed the problem fixed.  We'll probably test out 4.2 in the next couple

Unfortunately we don't have the luxury of a test cluster,
and to add to that, we couldn't simulate the load, although it does not
seem to be load related.
Did you try running with nodeep-scrub as a short-term workaround?
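
For reference, that flag can be toggled cluster-wide with the standard CLI:

  # stop scheduling new deep scrubs (running ones still finish)
  ceph osd set nodeep-scrub
  # re-enable later
  ceph osd unset nodeep-scrub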

I'll give ~30% of the nodes 4.2 and see how it goes.

> In our experience it takes about 2 weeks to start happening

we're well below that, somewhere between 1 and 4 days.
And yes, once one node goes south, it affects the rest of the cluster.

Thx!

 Benedikt


Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-08 Thread Tom Christensen
We have been seeing this same behavior on a cluster that had been perfectly
happy until we upgraded to the ubuntu vivid 3.19 kernel.  We are in the
process of "upgrading" back to the 3.16 kernel across our cluster, as we've
not seen this behavior on that kernel for over 6 months and we're pretty
strongly of the opinion this is a regression in the kernel.  Please let the
list know if upping your threads fixes your issue (though I'm not
optimistic), as we have our max threads set to the value recommended here
(4194303) but we still see this issue regularly on the 3.19 ubuntu kernel;
we tried both 3.19.0-25 and 3.19.0-33 before giving up and reverting to
3.16.
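
For anyone wanting to try the same tuning, a sketch of how the limits discussed in this thread are usually raised (the value is the one mentioned above, not a recommendation):

  # apply at runtime
  sysctl -w kernel.pid_max=4194303
  sysctl -w kernel.threads-max=4194303
  # persist across reboots
  echo 'kernel.pid_max = 4194303' >> /etc/sysctl.d/90-ceph.conf
  echo 'kernel.threads-max = 4194303' >> /etc/sysctl.d/90-ceph.conf
  sysctl --system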


On Tue, Dec 8, 2015 at 1:03 AM, Jan Schermer  wrote:

>
> > On 08 Dec 2015, at 08:57, Benedikt Fraunhofer 
> wrote:
> >
> > Hi Jan,
> >
> >> Doesn't look near the limit currently (but I suppose you rebooted it in
> the meantime?).
> >
> > the box this numbers came from has an uptime of 13 days
> > so it's one of the boxes that did survive yesterdays
> half-cluster-wide-reboot.
> >
>
> So this box had no issues? Keep an eye on the number of threads, but
> maybe others will have a better idea; this is just where I'd start. I have
> seen close to a million threads from OSDs on my boxes, not sure what the
> numbers are now.
>
> >> Did iostat say anything about the drives? (btw dm-1 and dm-6 are what?
> Is that your data drives?) - were they overloaded really?
> >
> > No, they didn't have any load and/or IOPS.
> > Basically the whole box had nothing to do.
> >
> > If I understand the load correctly, this just reports threads
> > that are ready and willing to work but - in this case -
> > don't get any data to work with.
>
> Different unixes calculate this differently :-) By itself "load" is
> meaningless.
> It should be something like an average number of processes that want to
> run at any given time but can't (because they are waiting for whatever they
> need - disks, CPU, blocking sockets...).
>
> Jan
>
>
> >
> > Thx
> >
> > Benedikt
> >
> >
> > 2015-12-08 8:44 GMT+01:00 Jan Schermer :
> >>
> >> Jan
> >>
> >>
> >>> On 08 Dec 2015, at 08:41, Benedikt Fraunhofer 
> wrote:
> >>>
> >>> Hi Jan,
> >>>
> >>> we had 65k for pid_max, which made
> >>> kernel.threads-max = 1030520.
> >>> or
> >>> kernel.threads-max = 256832
> >>> (looks like it depends on the number of cpus?)
> >>>
> >>> currently we've
> >>>
> >>> root@ceph1-store209:~# sysctl -a | grep -e thread -e pid
> >>> kernel.cad_pid = 1
> >>> kernel.core_uses_pid = 0
> >>> kernel.ns_last_pid = 60298
> >>> kernel.pid_max = 65535
> >>> kernel.threads-max = 256832
> >>> vm.nr_pdflush_threads = 0
> >>> root@ceph1-store209:~# ps axH |wc -l
> >>> 17548
> >>>
> >>> we'll see how it behaves once puppet has come by and adjusted it.
> >>>
> >>> Thx!
> >>>
> >>> Benedikt
> >>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


[ceph-users] http://gitbuilder.ceph.com/

2015-12-08 Thread Xav Paice
Hi,

Just wondering if there's a known issue with http://gitbuilder.ceph.com/ -
if I go to several URLs, e.g.
http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-trusty-x86_64-basic, I
get a 403.  That's still the right place to get debs, right?


Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-08 Thread Ilya Dryomov
On Tue, Dec 8, 2015 at 11:53 AM, Tom Christensen  wrote:
> To be clear, we are also using format 2 RBDs, so we didn't really expect it
> to work until recently as it was listed as unsupported.  We are under the
> understanding that as of 3.19 RBD format 2 should be supported.  Are we
> incorrect in that understanding?

Format 2 images are supported starting with 3.10.

Thanks,

Ilya
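
If in doubt, the format of an image can be checked from the CLI before mapping it; a minimal sketch (pool and image names are placeholders):

  rbd info rbd/myimage
  # the output includes a line such as:  format: 2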


Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-08 Thread Tom Christensen
To be clear, we are also using format 2 RBDs, so we didn't really expect it
to work until recently as it was listed as unsupported.  We are under the
understanding that as of 3.19 RBD format 2 should be supported.  Are we
incorrect in that understanding?

On Tue, Dec 8, 2015 at 3:44 AM, Tom Christensen  wrote:

> We haven't submitted a ticket as we've just avoided using the kernel
> client.  We've periodically tried with various kernels and various versions
> of ceph over the last two years, but have just given up each time and
> reverted to using rbd-fuse, which although not super stable, at least
> doesn't hang the client box.  We find ourselves in the position now where
> for additional functionality we *need* an actual block device, so we have
> to find a kernel client that works.  I will certainly keep you posted and
> can produce the output you've requested.
>
> I'd also be willing to run an early 4.5 version in our test environment.
>
> On Tue, Dec 8, 2015 at 3:35 AM, Ilya Dryomov  wrote:
>
>> On Tue, Dec 8, 2015 at 10:57 AM, Tom Christensen 
>> wrote:
>> > We aren't running NFS, but regularly use the kernel driver to map RBDs
>> and
>> > mount filesystems in same.  We see very similar behavior across nearly
>> all
>> > kernel versions we've tried.  In my experience only very few versions
>> of the
>> > kernel driver survive any sort of crush map change/update while
>> something is
>> > mapped.  In fact in the last 2 years I think I've only seen this work
>> on 1
>> > kernel version unfortunately its badly out of date and we can't run it
>> in
>> > our environment anymore, I think it was a 3.0 kernel version running on
>> > ubuntu 12.04.  We have just recently started trying to find a kernel
>> that
>> > will survive OSD outages or changes to the cluster.  We're on ubuntu
>> 14.04,
>> > and have tried 3.16, 3.19.0-25, 4.3, and 4.2 without success in the last
>> > week.  We only map 1-3 RBDs per client machine at a time but we
>> regularly
>> > will get processes stuck in D state which are accessing the filesystem
>> > inside the RBD and will have to hard reboot the RBD client machine.
>> This is
>> > always associated with a cluster change in some way, reweighting OSDs,
>> > rebooting an OSD host, restarting an individual OSD, adding OSDs, and
>> > removing OSDs all cause the kernel client to hang.  If no change is
>> made to
>> > the cluster, the kernel client will be happy for weeks.
>>
>> There are a couple of known bugs in the remap/resubmit area, but those
>> are supposedly corner cases (like *all* the OSDs going down and then
>> back up, etc).  I had no idea it was that severe and goes back that far.
>> Apparently triggering it requires a heavier load, as we've never seen
>> anything like that in our tests.
>>
>> For unrelated reasons, remap/resubmit code is getting entirely
>> rewritten for kernel 4.5, so, if you've been dealing with this issue
>> for the last two years (I don't remember seeing any tickets listing
>> that many kernel versions and not mentioning NFS), I'm afraid the best
>> course of action for you would be to wait for 4.5 to come out and try
>> it.  If you'd be willing to test out an early version on one or more of
>> your client boxes, I can ping you when it's ready.
>>
>> I'll take a look at 3.0 vs 3.16 with an eye on remap code.  Did you
>> happen to try 3.10?
>>
>> It sounds like you can reproduce this pretty easily.  Can you get it to
>> lock up and do:
>>
>> # cat /sys/kernel/debug/ceph/*/osdmap
>> # cat /sys/kernel/debug/ceph/*/osdc
>> $ ceph status
>>
>> a bunch of times?  I have a hunch that the kernel client simply fails to
>> request enough new osdmaps after the cluster topology changes under
>> load.
>>
>> Thanks,
>>
>> Ilya
>>
>
>


Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-08 Thread Tom Christensen
We run deep scrubs via cron with a script so we know when deep scrubs are
happening, and we've seen nodes fail both during deep scrubbing and while
no deep scrubs are occurring, so I'm pretty sure it's not related.


On Tue, Dec 8, 2015 at 2:42 AM, Benedikt Fraunhofer 
wrote:

> Hi Tom,
>
> 2015-12-08 10:34 GMT+01:00 Tom Christensen :
>
> > We didn't go forward to 4.2 as it's a large production cluster, and we
> just
> > needed the problem fixed.  We'll probably test out 4.2 in the next couple
>
> unfortunately we don't have the luxury of a test cluster.
> and to add to that, we couldn't simulate the load, although it does not
> seem to be load related.
> Did you try running with nodeep-scrub as a short-term workaround?
>
> I'll give ~30% of the nodes 4.2 and see how it goes.
>
> > In our experience it takes about 2 weeks to start happening
>
> we're well below that, somewhere between 1 and 4 days.
> And yes, once one goes south, it affects the rest of the cluster.
>
> Thx!
>
>  Benedikt
>


Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-08 Thread Tom Christensen
We haven't submitted a ticket as we've just avoided using the kernel
client.  We've periodically tried with various kernels and various versions
of ceph over the last two years, but have just given up each time and
reverted to using rbd-fuse, which although not super stable, at least
doesn't hang the client box.  We find ourselves in the position now where
for additional functionality we *need* an actual block device, so we have
to find a kernel client that works.  I will certainly keep you posted and
can produce the output you've requested.

I'd also be willing to run an early 4.5 version in our test environment.

On Tue, Dec 8, 2015 at 3:35 AM, Ilya Dryomov  wrote:

> On Tue, Dec 8, 2015 at 10:57 AM, Tom Christensen  wrote:
> > We aren't running NFS, but regularly use the kernel driver to map RBDs
> and
> > mount filesystems in same.  We see very similar behavior across nearly
> all
> > kernel versions we've tried.  In my experience only very few versions of
> the
> > kernel driver survive any sort of crush map change/update while
> something is
> > mapped.  In fact in the last 2 years I think I've only seen this work on
> 1
> > kernel version unfortunately its badly out of date and we can't run it in
> > our environment anymore, I think it was a 3.0 kernel version running on
> > ubuntu 12.04.  We have just recently started trying to find a kernel that
> > will survive OSD outages or changes to the cluster.  We're on ubuntu
> 14.04,
> > and have tried 3.16, 3.19.0-25, 4.3, and 4.2 without success in the last
> > week.  We only map 1-3 RBDs per client machine at a time but we regularly
> > will get processes stuck in D state which are accessing the filesystem
> > inside the RBD and will have to hard reboot the RBD client machine.
> This is
> > always associated with a cluster change in some way, reweighting OSDs,
> > rebooting an OSD host, restarting an individual OSD, adding OSDs, and
> > removing OSDs all cause the kernel client to hang.  If no change is made
> to
> > the cluster, the kernel client will be happy for weeks.
>
> There are a couple of known bugs in the remap/resubmit area, but those
> are supposedly corner cases (like *all* the OSDs going down and then
> back up, etc).  I had no idea it was that severe and goes back that far.
> Apparently triggering it requires a heavier load, as we've never seen
> anything like that in our tests.
>
> For unrelated reasons, remap/resubmit code is getting entirely
> rewritten for kernel 4.5, so, if you've been dealing with this issue
> for the last two years (I don't remember seeing any tickets listing
> that many kernel versions and not mentioning NFS), I'm afraid the best
> course of action for you would be to wait for 4.5 to come out and try
> it.  If you'd be willing to test out an early version on one or more of
> your client boxes, I can ping you when it's ready.
>
> I'll take a look at 3.0 vs 3.16 with an eye on remap code.  Did you
> happen to try 3.10?
>
> It sounds like you can reproduce this pretty easily.  Can you get it to
> lock up and do:
>
> # cat /sys/kernel/debug/ceph/*/osdmap
> # cat /sys/kernel/debug/ceph/*/osdc
> $ ceph status
>
> a bunch of times?  I have a hunch that the kernel client simply fails to
> request enough new osdmaps after the cluster topology changes under
> load.
>
> Thanks,
>
> Ilya
>


[ceph-users] ceph new installation of ceph 0.9.2 issue and crashing osds

2015-12-08 Thread Kenneth Waegeman

Hi,

I installed Ceph 9.2.0 on a new cluster of 3 nodes, with 50 OSDs on each 
node (300GB disks, 96GB RAM).


While installing, I hit an issue where I could not even log in as the ceph 
user, so I increased some limits in /etc/security/limits.conf:

ceph    -    nproc     1048576
ceph    -    nofile    1048576

I could then install the other OSDs.
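
A quick way to confirm the new limits actually apply to the user the daemons run as (a generic sketch, not specific to this cluster):

  # what the ceph user would get for max processes and open files
  sudo -u ceph bash -c 'ulimit -u; ulimit -n'
  # kernel-wide ceilings that also cap thread creation
  sysctl kernel.pid_max kernel.threads-max

Note that limits.conf only applies to PAM sessions; services started by systemd take their limits from the unit file (LimitNPROC/LimitNOFILE), so depending on how the OSDs are launched the limits.conf entries may not be in effect for them at all.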

After the cluster was installed, I added some extra pools. When creating 
the PGs of these pools, the OSDs of the cluster started to fail with 
stacktraces. If I try to restart them, they keep on failing. I don't 
know if this is an actual bug in Infernalis, or a limit that is still 
not high enough. I've increased the nproc and nofile entries even 
more, but no luck. Does someone have a clue? Here are the stacktraces I see:


Mostly this one:

   -12> 2015-12-08 10:17:18.995243 7fa9063c5700  5 osd.12 pg_epoch: 904 pg[3.3b(unlocked)] enter Initial
   -11> 2015-12-08 10:17:18.995279 7fa9063c5700  5 write_log with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, dirty_divergent_priors: false, divergent_priors: 0, writeout_from: 4294967295'18446744073709551615, trimmed:
   -10> 2015-12-08 10:17:18.995292 7fa9063c5700  5 osd.12 pg_epoch: 904 pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) [12,80,111] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive] exit Initial 0.48 0 0.00
    -9> 2015-12-08 10:17:18.995301 7fa9063c5700  5 osd.12 pg_epoch: 904 pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) [12,80,111] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive] enter Reset
    -8> 2015-12-08 10:17:18.995310 7fa9063c5700  5 osd.12 pg_epoch: 904 pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] exit Reset 0.08 1 0.17
    -7> 2015-12-08 10:17:18.995326 7fa9063c5700  5 osd.12 pg_epoch: 904 pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] enter Started
    -6> 2015-12-08 10:17:18.995332 7fa9063c5700  5 osd.12 pg_epoch: 904 pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] enter Start
    -5> 2015-12-08 10:17:18.995338 7fa9063c5700  1 osd.12 pg_epoch: 904 pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] state: transitioning to Primary
    -4> 2015-12-08 10:17:18.995345 7fa9063c5700  5 osd.12 pg_epoch: 904 pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] exit Start 0.12 0 0.00
    -3> 2015-12-08 10:17:18.995352 7fa9063c5700  5 osd.12 pg_epoch: 904 pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] enter Started/Primary
    -2> 2015-12-08 10:17:18.995358 7fa9063c5700  5 osd.12 pg_epoch: 904 pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 creating] enter Started/Primary/Peering
    -1> 2015-12-08 10:17:18.995365 7fa9063c5700  5 osd.12 pg_epoch: 904 pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904) [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 creating+peering] enter Started/Primary/Peering/GetInfo
     0> 2015-12-08 10:17:18.998472 7fa9063c5700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fa9063c5700 time 2015-12-08 10:17:18.995438

common/Thread.cc: 154: FAILED assert(ret == 0)

 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7fa91924ebe5]
 2: (Thread::create(unsigned long)+0x8a) [0x7fa91923325a]
 3: (SimpleMessenger::connect_rank(entity_addr_t const&, int, PipeConnection*, Message*)+0x185) [0x7fa919229105]
 4: (SimpleMessenger::get_connection(entity_inst_t const&)+0x3ba) [0x7fa9192298ea]
 5: (OSDService::get_con_osd_cluster(int, unsigned int)+0x1ab) [0x7fa918c7318b]
 6: (OSD::do_queries(std::map >, std::less, std::allocator > > > > >&, std::shared_ptr)+0x1f1) [0x7fa918c9b061]
 7: (OSD::dispatch_context(PG::RecoveryCtx&, PG*, std::shared_ptr, ThreadPool::TPHandle*)+0x142) [0x7fa918cb5832]
 8: (OSD::handle_pg_create(std::shared_ptr)+0x133e) [0x7fa918cb820e]
 9: (OSD::dispatch_op(std::shared_ptr)+0x220) [0x7fa918cbc0c0]
 10: (OSD::do_waiters()+0x1c2) [0x7fa918cbc382]
 11: (OSD::ms_dispatch(Message*)+0x227) [0x7fa918cbd727]
 12: (DispatchQueue::entry()+0x649) [0x7fa91930a939]
 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fa91922eb1d]
 14: (()+0x7df5) [0x7fa9172e3df5]
 15: (clone()+0x6d) [0x7fa915b8c1ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Also these:

--- begin dump 

Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-08 Thread Ilya Dryomov
On Tue, Dec 8, 2015 at 10:57 AM, Tom Christensen  wrote:
> We aren't running NFS, but regularly use the kernel driver to map RBDs and
> mount filesystems in same.  We see very similar behavior across nearly all
> kernel versions we've tried.  In my experience only very few versions of the
> kernel driver survive any sort of crush map change/update while something is
> mapped.  In fact in the last 2 years I think I've only seen this work on 1
> kernel version unfortunately its badly out of date and we can't run it in
> our environment anymore, I think it was a 3.0 kernel version running on
> ubuntu 12.04.  We have just recently started trying to find a kernel that
> will survive OSD outages or changes to the cluster.  We're on ubuntu 14.04,
> and have tried 3.16, 3.19.0-25, 4.3, and 4.2 without success in the last
> week.  We only map 1-3 RBDs per client machine at a time but we regularly
> will get processes stuck in D state which are accessing the filesystem
> inside the RBD and will have to hard reboot the RBD client machine.  This is
> always associated with a cluster change in some way, reweighting OSDs,
> rebooting an OSD host, restarting an individual OSD, adding OSDs, and
> removing OSDs all cause the kernel client to hang.  If no change is made to
> the cluster, the kernel client will be happy for weeks.

There are a couple of known bugs in the remap/resubmit area, but those
are supposedly corner cases (like *all* the OSDs going down and then
back up, etc).  I had no idea it was that severe and goes back that far.
Apparently triggering it requires a heavier load, as we've never seen
anything like that in our tests.

For unrelated reasons, remap/resubmit code is getting entirely
rewritten for kernel 4.5, so, if you've been dealing with this issue
for the last two years (I don't remember seeing any tickets listing
that many kernel versions and not mentioning NFS), I'm afraid the best
course of action for you would be to wait for 4.5 to come out and try
it.  If you'd be willing to test out an early version on one or more of
your client boxes, I can ping you when it's ready.

I'll take a look at 3.0 vs 3.16 with an eye on remap code.  Did you
happen to try 3.10?

It sounds like you can reproduce this pretty easily.  Can you get it to
lock up and do:

# cat /sys/kernel/debug/ceph/*/osdmap
# cat /sys/kernel/debug/ceph/*/osdc
$ ceph status

a bunch of times?  I have a hunch that the kernel client simply fails to
request enough new osdmaps after the cluster topology changes under
load.

Thanks,

Ilya
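
A small loop for collecting those a bunch of times during a hang (the debugfs paths are the ones given above; the interval and file name are arbitrary):

  for i in $(seq 1 20); do
      date
      cat /sys/kernel/debug/ceph/*/osdmap
      cat /sys/kernel/debug/ceph/*/osdc
      ceph status
      sleep 10
  done 2>&1 | tee rbd-hang-$(hostname).log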


[ceph-users] CephFS Path restriction

2015-12-08 Thread Dennis Kramer (DT)

Hi,

I'm trying to restrict clients to mount a specific path in CephFS.
I've been using the official doc for this:
http://docs.ceph.com/docs/master/cephfs/client-auth/

After setting these cap restrictions, the client can still mount and
use all directories in CephFS. Am I missing something?

I'm using the Hammer release version 0.94.5
(9764da52395923e0b32908d83a9f7304401fee43)


Re: [ceph-users] ceph-disk list crashes in infernalis

2015-12-08 Thread Loic Dachary
Hi Felix,

Could you please ls -l /dev/cciss /sys/block/cciss*/ ?

Thanks for being the cciss proxy in fixing this problem :-)

Cheers

On 07/12/2015 11:43, Loic Dachary wrote:
> Thanks !
> 
> On 06/12/2015 17:50, Stolte, Felix wrote:
>> Hi Loic,
>>
>> output is:
>>
>> /dev:
>> insgesamt 0
>> crw--- 1 root root 10, 235 Dez  2 17:02 autofs
>> drwxr-xr-x 2 root root1000 Dez  2 17:02 block
>> drwxr-xr-x 2 root root  60 Dez  2 17:02 bsg
>> crw--- 1 root root 10, 234 Dez  5 06:29 btrfs-control
>> drwxr-xr-x 3 root root  60 Dez  2 17:02 bus
>> crw-r--r-- 1 root root255, 171 Dez  2 17:02 casr
>> drwxr-xr-x 2 root root 500 Dez  2 17:02 cciss
>> crw-r--r-- 1 root root255, 173 Dez  2 17:02 ccsm
>> lrwxrwxrwx 1 root root   3 Dez  2 17:02 cdrom -> sr0
>> crw-r--r-- 1 root root255, 178 Dez  2 17:02 cdt
>> crw-r--r-- 1 root root255, 172 Dez  2 17:02 cecc
>> crw-r--r-- 1 root root255, 176 Dez  2 17:02 cevt
>> drwxr-xr-x 2 root root3820 Dez  5 06:29 char
>> crw--- 1 root root  5,   1 Dez  2 17:04 console
>> lrwxrwxrwx 1 root root  11 Dez  2 17:02 core -> /proc/kcore
>> drw-r--r-- 2 root root 200 Dez  2 17:02 cpqhealth
>> drwxr-xr-x 2 root root  60 Dez  2 17:02 cpu
>> crw--- 1 root root 10,  60 Dez  2 17:02 cpu_dma_latency
>> crw-r--r-- 1 root root255, 180 Dez  2 17:02 crom
>> crw--- 1 root root 10, 203 Dez  2 17:02 cuse
>> drwxr-xr-x 8 root root 160 Dez  2 17:02 disk
>> drwxr-xr-x 2 root root 100 Dez  2 17:02 dri
>> crw--- 1 root root 10,  61 Dez  2 17:02 ecryptfs
>> crw-rw 1 root video29,   0 Dez  2 17:02 fb0
>> lrwxrwxrwx 1 root root  13 Dez  2 17:02 fd -> /proc/self/fd
>> crw-rw-rw- 1 root root  1,   7 Dez  2 17:02 full
>> crw-rw-rw- 1 root root 10, 229 Dez  2 17:02 fuse
>> crw--- 1 root root251,   0 Dez  2 17:02 hidraw0
>> crw--- 1 root root251,   1 Dez  2 17:02 hidraw1
>> crw--- 1 root root 10, 228 Dez  2 17:02 hpet
>> drwxr-xr-x 2 root root 360 Dez  2 17:02 hpilo
>> crw--- 1 root root 89,   0 Dez  2 17:02 i2c-0
>> crw--- 1 root root 89,   1 Dez  2 17:02 i2c-1
>> crw--- 1 root root 89,   2 Dez  2 17:02 i2c-2
>> crw--- 1 root root 89,   3 Dez  2 17:02 i2c-3
>> crw-r--r-- 1 root root255, 184 Dez  2 17:02 indc
>> drwxr-xr-x 4 root root 200 Dez  2 17:02 input
>> crw--- 1 root root248,   0 Dez  2 17:02 ipmi0
>> crw--- 1 root root249,   0 Dez  2 17:02 kfd
>> crw-r--r-- 1 root root  1,  11 Dez  2 17:02 kmsg
>> srw-rw-rw- 1 root root   0 Dez  2 17:02 log
>> brw-rw 1 root disk  7,   0 Dez  2 17:02 loop0
>> brw-rw 1 root disk  7,   1 Dez  2 17:02 loop1
>> brw-rw 1 root disk  7,   2 Dez  2 17:02 loop2
>> brw-rw 1 root disk  7,   3 Dez  2 17:02 loop3
>> brw-rw 1 root disk  7,   4 Dez  2 17:02 loop4
>> brw-rw 1 root disk  7,   5 Dez  2 17:02 loop5
>> brw-rw 1 root disk  7,   6 Dez  2 17:02 loop6
>> brw-rw 1 root disk  7,   7 Dez  2 17:02 loop7
>> crw--- 1 root root 10, 237 Dez  2 17:02 loop-control
>> drwxr-xr-x 2 root root  60 Dez  2 17:02 mapper
>> crw--- 1 root root 10, 227 Dez  2 17:02 mcelog
>> crw-r- 1 root kmem  1,   1 Dez  2 17:02 mem
>> crw--- 1 root root 10,  57 Dez  2 17:02 memory_bandwidth
>> crw--- 1 root root 10, 220 Dez  2 17:02 mptctl
>> drwxr-xr-x 2 root root  60 Dez  2 17:02 net
>> crw--- 1 root root 10,  59 Dez  2 17:02 network_latency
>> crw--- 1 root root 10,  58 Dez  2 17:02 network_throughput
>> crw-rw-rw- 1 root root  1,   3 Dez  2 17:02 null
>> crw-r- 1 root kmem  1,   4 Dez  2 17:02 port
>> crw--- 1 root root108,   0 Dez  2 17:02 ppp
>> crw-r--r-- 1 root root255, 183 Dez  2 17:02 proc
>> crw--- 1 root root 10,   1 Dez  2 17:02 psaux
>> crw-rw-rw- 1 root tty   5,   2 Dez  6 17:47 ptmx
>> drwxr-xr-x 2 root root   0 Dez  2 17:02 pts
>> brw-rw 1 root disk  1,   0 Dez  2 17:02 ram0
>> brw-rw 1 root disk  1,   1 Dez  2 17:02 ram1
>> brw-rw 1 root disk  1,  10 Dez  2 17:02 ram10
>> brw-rw 1 root disk  1,  11 Dez  2 17:02 ram11
>> brw-rw 1 root disk  1,  12 Dez  2 17:02 ram12
>> brw-rw 1 root disk  1,  13 Dez  2 17:02 ram13
>> brw-rw 1 root disk  1,  14 Dez  2 17:02 ram14
>> brw-rw 1 root disk  1,  15 Dez  2 17:02 ram15
>> brw-rw 1 root disk  1,   2 Dez  2 17:02 ram2
>> brw-rw 1 root disk  1,   3 Dez  2 17:02 ram3
>> brw-rw 1 root disk  1,   4 Dez  2 17:02 ram4
>> brw-rw 1 root disk  1,   5 Dez  2 17:02 ram5
>> brw-rw 1 root disk  1,   6 Dez  2 17:02 ram6
>> brw-rw 1 root disk  1,   7 Dez  2 17:02 ram7
>> brw-rw 1 root disk  1,   8 Dez  2 17:02 ram8
>> brw-rw 1 root disk  1,   9 Dez  2 17:02 ram9
>> crw-rw-rw- 1 root root  

Re: [ceph-users] Infernalis for Debian 8 armhf

2015-12-08 Thread Daleep Singh Bais
Hi,

I tried following the steps you had mentioned and I am stuck while
building the package using dpkg-buildpackage -j4, with the error
message below:

Submodule path 'src/rocksdb': checked out 'dcdb0dd29232ece43f093c99220b0eea7ead51ff'
Unable to checkout 'b0d1137d31e4b36b72ccae9c0a9a13de2ec82faa' in submodule path 'ceph-erasure-code-corpus'
Unable to checkout '67383cc060dd9f90d398eed5a00e31eb70845dd8' in submodule path 'ceph-object-corpus'

I have attached the log file also for your reference.

Please suggest.
Thanks.

Daleep Singh Bais

On 12/03/2015 01:36 AM, ceph new wrote:
> For now, the process for me was:
> git clone https://github.com/ceph/ceph.git
> git checkout infernalis
> cd ceph
> apt-get install debhelper autoconf automake autotools-dev libbz2-dev
> cmake default-jdk gdisk javahelper junit4 libaio-dev libatomic-ops-dev
> libbabeltrace-ctf-dev libbabeltrace-dev libblkid-dev libboost-dev
> libboost-program-options-dev libboost-system-dev libboost-thread-dev
> libboost-regex-dev libboost-random-dev libcurl4-gnutls-dev libedit-dev
> libfcgi-dev libfuse-dev libkeyutils-dev libleveldb-dev libnss3-dev
> libsnappy-dev liblttng-ust-dev libtool libudev-dev libxml2-dev
> python-nose python-sphinx python-virtualenv uuid-runtime xfslibs-dev
> xfsprogs xmlstarlet libtcmalloc-minimal4 libgoogle-perftools-dev
> libgoogle-perftools4
> ./install-deps.sh
> dpkg-buildpackage -j3
>
> I get OOM-killed, so I will add swap space and run it again
>
> On Wed, Dec 2, 2015 at 12:30 PM, Swapnil Jain  > wrote:
>
> If you can point me to some documentation, I can do that.
>
> —
> *Swapnil Jain*
>
>> On 02-Dec-2015, at 7:31 pm, Alfredo Deza  wrote:
>>
>> On Tue, Dec 1, 2015 at 11:58 PM, Swapnil Jain > > wrote:
>>>
>>> Hi,
>>>
>>> Any plans to release Infernalis Debian 8 binary packages for
>>> armhf. As I only see it for amd64.
>>
>> This would be pretty simple to do but we don't have any ARM boxes
>> around and nothing is immediately available for us
>> to set up any.
>>
>>>
>>>
>>>
>>> —
>>>
>>> Swapnil Jain
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 dpkg-source -b ceph
dpkg-source: info: using source format `1.0'
dpkg-source: warning: source directory 'ceph' is not 
- 'ceph-9.2.0'
dpkg-source: info: building ceph in ceph_9.2.0-1.tar.gz
dpkg-source: info: building ceph in ceph_9.2.0-1.dsc
 debian/rules build
dh_testdir
./autogen.sh
+ set -e
+ test -f src/ceph.in
+ which libtoolize
+ [ /usr/bin/libtoolize ]
+ LIBTOOLIZE=libtoolize
+ test -d .git
+ git submodule usage
+ grep --quiet update.*--force
+ echo --force
+ force=--force
+ git submodule sync
Synchronizing submodule url for 'ceph-erasure-code-corpus'
Synchronizing submodule url for 'ceph-object-corpus'
Synchronizing submodule url for 'src/civetweb'
Synchronizing submodule url for 'src/erasure-code/jerasure/gf-complete'
Synchronizing submodule url for 'src/erasure-code/jerasure/jerasure'
Synchronizing submodule url for 'src/gmock'
Synchronizing submodule url for 'src/rocksdb'
+ git submodule update --force --init --recursive
fatal: Unable to create 
'/root/ceph/.git/modules/ceph-erasure-code-corpus/index.lock': File exists.

If no other git process is currently running, this probably means a
git process crashed in this repository earlier. Make sure no other git
process is running and remove the file manually to continue.
fatal: Unable to create '/root/ceph/ceph-object-corpus/.git/index.lock': File 
exists.

If no other git process is currently running, this probably means a
git process crashed in this repository earlier. Make sure no other git
process is running and remove the file manually to continue.
Submodule path 'src/civetweb': checked out 
'8d271315a541218caada366f84a2690fdbd474a2'
Submodule path 'src/erasure-code/jerasure/gf-complete': checked out 
'9caeefbf2860e56a75502f4d3342deed5b5ba265'
Submodule path 'src/erasure-code/jerasure/jerasure': checked out 
'02731df4c1eae1819c4453c9d3ab6d408cadd085'
Submodule path 'src/gmock': checked out 
'49beb3bdf05a728afb48dbfbeb1a693ce4c38027'
Submodule path 'src/gmock/gtest': checked out 
'258068668c61e6721007fe8bfd8a338ed8e6cc50'
Submodule path 'src/rocksdb': checked out 
'dcdb0dd29232ece43f093c99220b0eea7ead51ff'
Unable to checkout 'b0d1137d31e4b36b72ccae9c0a9a13de2ec82faa' in submodule 
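
The two "Unable to checkout" failures follow directly from the stale index.lock files reported a few lines earlier in the log. One way to recover, assuming no other git process is actually running (paths taken from the log above):

  rm /root/ceph/.git/modules/ceph-erasure-code-corpus/index.lock
  rm /root/ceph/ceph-object-corpus/.git/index.lock
  cd /root/ceph && git submodule update --force --init --recursive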

Re: [ceph-users] ceph-disk list crashes in infernalis

2015-12-08 Thread Stolte, Felix
Yes, they do contain a "!"

Forschungszentrum Juelich GmbH
52425 Juelich
Registered office: Juelich
Registered in the commercial register of the Amtsgericht Dueren, No. HR B 3498
Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
Management Board: Prof. Dr.-Ing. Wolfgang Marquardt (Chairman),
Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt


-----Original Message-----
From: Loic Dachary [mailto:l...@dachary.org]
Sent: Tuesday, December 8, 2015 15:17
To: Stolte, Felix; ceph-us...@ceph.com
Subject: Re: [ceph-users] ceph-disk list crashes in infernalis

I also need to confirm that the names that show up in /sys/block/*/holders
also contain a ! (it would not make sense to me if they did not, but ...)

On 08/12/2015 15:05, Loic Dachary wrote:
> Hi Felix,
> 
> Could you please ls -l /dev/cciss /sys/block/cciss*/ ?
> 
> Thanks for being the cciss proxy in fixing this problem :-)
> 
> Cheers
> 
> On 07/12/2015 11:43, Loic Dachary wrote:
>> Thanks !
>>
>> On 06/12/2015 17:50, Stolte, Felix wrote:
>>> Hi Loic,
>>>
>>> output is:
>>>
>>> /dev:
>>> insgesamt 0
>>> crw--- 1 root root 10, 235 Dez  2 17:02 autofs
>>> drwxr-xr-x 2 root root1000 Dez  2 17:02 block
>>> drwxr-xr-x 2 root root  60 Dez  2 17:02 bsg
>>> crw--- 1 root root 10, 234 Dez  5 06:29 btrfs-control
>>> drwxr-xr-x 3 root root  60 Dez  2 17:02 bus
>>> crw-r--r-- 1 root root255, 171 Dez  2 17:02 casr
>>> drwxr-xr-x 2 root root 500 Dez  2 17:02 cciss
>>> crw-r--r-- 1 root root255, 173 Dez  2 17:02 ccsm
>>> lrwxrwxrwx 1 root root   3 Dez  2 17:02 cdrom -> sr0
>>> crw-r--r-- 1 root root255, 178 Dez  2 17:02 cdt
>>> crw-r--r-- 1 root root255, 172 Dez  2 17:02 cecc
>>> crw-r--r-- 1 root root255, 176 Dez  2 17:02 cevt
>>> drwxr-xr-x 2 root root3820 Dez  5 06:29 char
>>> crw--- 1 root root  5,   1 Dez  2 17:04 console
>>> lrwxrwxrwx 1 root root  11 Dez  2 17:02 core -> /proc/kcore
>>> drw-r--r-- 2 root root 200 Dez  2 17:02 cpqhealth
>>> drwxr-xr-x 2 root root  60 Dez  2 17:02 cpu
>>> crw--- 1 root root 10,  60 Dez  2 17:02 cpu_dma_latency
>>> crw-r--r-- 1 root root255, 180 Dez  2 17:02 crom
>>> crw--- 1 root root 10, 203 Dez  2 17:02 cuse
>>> drwxr-xr-x 8 root root 160 Dez  2 17:02 disk
>>> drwxr-xr-x 2 root root 100 Dez  2 17:02 dri
>>> crw--- 1 root root 10,  61 Dez  2 17:02 ecryptfs
>>> crw-rw 1 root video29,   0 Dez  2 17:02 fb0
>>> lrwxrwxrwx 1 root root  13 Dez  2 17:02 fd -> /proc/self/fd
>>> crw-rw-rw- 1 root root  1,   7 Dez  2 17:02 full
>>> crw-rw-rw- 1 root root 10, 229 Dez  2 17:02 fuse
>>> crw--- 1 root root251,   0 Dez  2 17:02 hidraw0
>>> crw--- 1 root root251,   1 Dez  2 17:02 hidraw1
>>> crw--- 1 root root 10, 228 Dez  2 17:02 hpet
>>> drwxr-xr-x 2 root root 360 Dez  2 17:02 hpilo
>>> crw--- 1 root root 89,   0 Dez  2 17:02 i2c-0
>>> crw--- 1 root root 89,   1 Dez  2 17:02 i2c-1
>>> crw--- 1 root root 89,   2 Dez  2 17:02 i2c-2
>>> crw--- 1 root root 89,   3 Dez  2 17:02 i2c-3
>>> crw-r--r-- 1 root root255, 184 Dez  2 17:02 indc
>>> drwxr-xr-x 4 root root 200 Dez  2 17:02 input
>>> crw--- 1 root root248,   0 Dez  2 17:02 ipmi0
>>> crw--- 1 root root249,   0 Dez  2 17:02 kfd
>>> crw-r--r-- 1 root root  1,  11 Dez  2 17:02 kmsg
>>> srw-rw-rw- 1 root root   0 Dez  2 17:02 log
>>> brw-rw 1 root disk  7,   0 Dez  2 17:02 loop0
>>> brw-rw 1 root disk  7,   1 Dez  2 17:02 loop1
>>> brw-rw 1 root disk  7,   2 Dez  2 17:02 loop2
>>> brw-rw 1 root disk  7,   3 Dez  2 17:02 loop3
>>> brw-rw 1 root disk  7,   4 Dez  2 17:02 loop4
>>> brw-rw 1 root disk  7,   5 Dez  2 17:02 loop5
>>> brw-rw 1 root disk  7,   6 Dez  2 17:02 loop6
>>> brw-rw 1 root disk  7,   7 Dez  2 17:02 loop7
>>> crw--- 1 root root 10, 237 Dez  2 17:02 loop-control
>>> drwxr-xr-x 2 root root  60 Dez  2 17:02 mapper
>>> crw--- 1 root root 10, 227 Dez  2 17:02 mcelog
>>> crw-r- 1 root kmem  1,   1 Dez  2 17:02 mem
>>> crw--- 1 root root 10,  57 Dez  2 17:02 memory_bandwidth
>>> crw--- 1 root root 10, 220 Dez  2 17:02 mptctl
>>> drwxr-xr-x 2 root root  60 Dez  2 17:02 net
>>> crw--- 1 root root 10,  59 Dez  2 17:02 network_latency
>>> crw--- 1 root root 10,  58 Dez  2 17:02 network_throughput
>>> crw-rw-rw- 1 root root  1,   3 Dez  2 17:02 null
>>> crw-r- 1 root kmem  1,   4 Dez  2 17:02 port
>>> crw--- 1 root root108,   0 Dez  2 17:02 ppp
>>> crw-r--r-- 1 root root255, 183 Dez  2 17:02 proc
>>> crw--- 1 root root 10,   1 Dez  2 17:02 psaux
>>> crw-rw-rw- 1 root tty   5,   2 Dez  6 17:47 ptmx
>>> drwxr-xr-x 2 root root   0 Dez  2 17:02 pts
>>> brw-rw 1 root 

Re: [ceph-users] CephFS Path restriction

2015-12-08 Thread Dennis Kramer (DT)
Ah, that explains a lot. Thank you.
Yes, it was a bit confusing for which version it applied to.

Awesome addition by the way, I like the path parameter!

Cheers.

On 12/08/2015 03:15 PM, John Spray wrote:
> On Tue, Dec 8, 2015 at 1:43 PM, Dennis Kramer (DT)
>  wrote:
> 
> 
> Hi,
> 
> I'm trying to restrict clients to mount a specific path in CephFS. 
> I've been using the official doc for this: 
> http://docs.ceph.com/docs/master/cephfs/client-auth/
> 
> After setting these cap restrictions, the client can still mount
> and use all directories in CephFS. Am I missing something?
> 
>> You're looking at the master docs -- this functionality is newer
>> than Hammer.  It'll be in the Jewel release.
> 
>> I should have noted that on the page, because people do tend to
>> end up finding master docs no matter what version they're using.
> 
>> John
> 
> 
> I'm using the Hammer release version 0.94.5 
> (9764da52395923e0b32908d83a9f7304401fee43)
>> ___ ceph-users
>> mailing list ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 



Re: [ceph-users] CephFS Path restriction

2015-12-08 Thread John Spray
On Tue, Dec 8, 2015 at 1:43 PM, Dennis Kramer (DT)  wrote:
>
>
>
> Hi,
>
> I'm trying to restrict clients to mount a specific path in CephFS.
> I've been using the official doc for this:
> http://docs.ceph.com/docs/master/cephfs/client-auth/
>
> After setting these cap restrictions, the client can still mount and
> use all directories in CephFS. Am I missing something?

You're looking at the master docs -- this functionality is newer than
Hammer.  It'll be in the Jewel release.

I should have noted that on the page, because people do tend to end up
finding master docs no matter what version they're using.

John
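
For readers on Jewel or later, the linked page documents path-restricted caps roughly along these lines (syntax as in the master docs at the time; the client name, path, and pool are placeholders, so double-check against your release):

  ceph auth get-or-create client.foo \
      mon 'allow r' \
      mds 'allow r, allow rw path=/bar' \
      osd 'allow rw pool=cephfs_data'

On Hammer the path restriction is not enforced, which is consistent with the behaviour reported above.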

>
> I'm using the Hammer release version 0.94.5
> (9764da52395923e0b32908d83a9f7304401fee43)
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk list crashes in infernalis

2015-12-08 Thread Stolte, Felix
Hi Loic,

Glad to help. Thanks for fixing this problem :-)

Output is:

/dev/cciss:
insgesamt 0
brw-rw 1 root disk 104,   0 Dez  2 17:02 c0d0
brw-rw 1 root disk 104,   1 Dez  2 17:02 c0d0p1
brw-rw 1 root disk 104,   2 Dez  2 17:02 c0d0p2
brw-rw 1 root disk 104,   5 Dez  2 17:02 c0d0p5
brw-rw 1 root disk 104,  16 Dez  2 17:02 c0d1
brw-rw 1 ceph ceph 104,  17 Dez  2 17:02 c0d1p1
brw-rw 1 root disk 104,  32 Dez  2 17:02 c0d2
brw-rw 1 ceph ceph 104,  33 Dez  2 17:02 c0d2p1
brw-rw 1 root disk 104,  48 Dez  2 17:02 c0d3
brw-rw 1 ceph ceph 104,  49 Dez  2 17:02 c0d3p1
brw-rw 1 root disk 104,  64 Dez  2 17:02 c0d4
brw-rw 1 ceph ceph 104,  65 Dez  2 17:02 c0d4p1
brw-rw 1 root disk 104,  80 Dez  2 17:02 c0d5
brw-rw 1 ceph ceph 104,  81 Dez  2 17:02 c0d5p1
brw-rw 1 root disk 104,  96 Dez  2 17:02 c0d6
brw-rw 1 ceph ceph 104,  97 Dez  2 17:02 c0d6p1
brw-rw 1 root disk 104, 112 Dez  2 17:02 c0d7
brw-rw 1 ceph ceph 104, 114 Dez  8 15:13 c0d7p2
brw-rw 1 ceph ceph 104, 115 Dez  8 15:13 c0d7p3
brw-rw 1 ceph ceph 104, 116 Dez  8 15:13 c0d7p4
brw-rw 1 ceph ceph 104, 117 Dez  8 15:13 c0d7p5
brw-rw 1 ceph ceph 104, 118 Dez  8 15:13 c0d7p6
brw-rw 1 ceph ceph 104, 119 Dez  8 15:13 c0d7p7

/sys/block/cciss!c0d0/:
insgesamt 0
-r--r--r-- 1 root root 4096 Dez  8 15:07 alignment_offset
lrwxrwxrwx 1 root root0 Dez  8 15:07 bdi ->
../../../../../../../virtual/bdi/104:0
-r--r--r-- 1 root root 4096 Dez  8 15:07 capability
drwxr-xr-x 5 root root0 Dez  8 15:07 cciss!c0d0p1
drwxr-xr-x 5 root root0 Dez  8 15:07 cciss!c0d0p2
drwxr-xr-x 5 root root0 Dez  8 15:07 cciss!c0d0p5
-r--r--r-- 1 root root 4096 Dez  8 15:07 dev
lrwxrwxrwx 1 root root0 Dez  8 15:07 device -> ../../../c0d0
-r--r--r-- 1 root root 4096 Dez  8 15:07 discard_alignment
-r--r--r-- 1 root root 4096 Dez  8 15:07 ext_range
drwxr-xr-x 2 root root0 Dez  8 15:07 holders
-r--r--r-- 1 root root 4096 Dez  8 15:07 inflight
drwxr-xr-x 2 root root0 Dez  8 15:07 power
drwxr-xr-x 3 root root0 Dez  8 15:07 queue
-r--r--r-- 1 root root 4096 Dez  8 15:07 range
-r--r--r-- 1 root root 4096 Dez  8 15:07 removable
-r--r--r-- 1 root root 4096 Dez  8 15:07 ro
-r--r--r-- 1 root root 4096 Dez  8 15:07 size
drwxr-xr-x 2 root root0 Dez  8 15:07 slaves
-r--r--r-- 1 root root 4096 Dez  8 15:07 stat
lrwxrwxrwx 1 root root0 Dez  8 15:07 subsystem ->
../../../../../../../../class/block
drwxr-xr-x 2 root root0 Dez  8 15:07 trace
-rw-r--r-- 1 root root 4096 Dez  8 15:07 uevent

/sys/block/cciss!c0d1/:
insgesamt 0
-r--r--r-- 1 root root 4096 Dez  8 15:07 alignment_offset
lrwxrwxrwx 1 root root0 Dez  8 15:07 bdi ->
../../../../../../../virtual/bdi/104:16
-r--r--r-- 1 root root 4096 Dez  8 15:07 capability
drwxr-xr-x 5 root root0 Dez  8 15:07 cciss!c0d1p1
-r--r--r-- 1 root root 4096 Dez  8 15:07 dev
lrwxrwxrwx 1 root root0 Dez  8 15:07 device -> ../../../c0d1
-r--r--r-- 1 root root 4096 Dez  8 15:07 discard_alignment
-r--r--r-- 1 root root 4096 Dez  8 15:07 ext_range
drwxr-xr-x 2 root root0 Dez  8 15:07 holders
-r--r--r-- 1 root root 4096 Dez  8 15:07 inflight
drwxr-xr-x 2 root root0 Dez  8 15:07 power
drwxr-xr-x 3 root root0 Dez  8 15:07 queue
-r--r--r-- 1 root root 4096 Dez  8 15:07 range
-r--r--r-- 1 root root 4096 Dez  8 15:07 removable
-r--r--r-- 1 root root 4096 Dez  8 15:07 ro
-r--r--r-- 1 root root 4096 Dez  8 15:07 size
drwxr-xr-x 2 root root0 Dez  8 15:07 slaves
-r--r--r-- 1 root root 4096 Dez  8 15:07 stat
lrwxrwxrwx 1 root root0 Dez  8 15:07 subsystem ->
../../../../../../../../class/block
drwxr-xr-x 2 root root0 Dez  8 15:07 trace
-rw-r--r-- 1 root root 4096 Dez  8 15:07 uevent

/sys/block/cciss!c0d2/:
insgesamt 0
-r--r--r-- 1 root root 4096 Dez  8 15:07 alignment_offset
lrwxrwxrwx 1 root root0 Dez  8 15:07 bdi ->
../../../../../../../virtual/bdi/104:32
-r--r--r-- 1 root root 4096 Dez  8 15:07 capability
drwxr-xr-x 5 root root0 Dez  8 15:07 cciss!c0d2p1
-r--r--r-- 1 root root 4096 Dez  8 15:07 dev
lrwxrwxrwx 1 root root0 Dez  8 15:07 device -> ../../../c0d2
-r--r--r-- 1 root root 4096 Dez  8 15:07 discard_alignment
-r--r--r-- 1 root root 4096 Dez  8 15:07 ext_range
drwxr-xr-x 2 root root0 Dez  8 15:07 holders
-r--r--r-- 1 root root 4096 Dez  8 15:07 inflight
drwxr-xr-x 2 root root0 Dez  8 15:07 power
drwxr-xr-x 3 root root0 Dez  8 15:07 queue
-r--r--r-- 1 root root 4096 Dez  8 15:07 range
-r--r--r-- 1 root root 4096 Dez  8 15:07 removable
-r--r--r-- 1 root root 4096 Dez  8 15:07 ro
-r--r--r-- 1 root root 4096 Dez  8 15:07 size
drwxr-xr-x 2 root root0 Dez  8 15:07 slaves
-r--r--r-- 1 root root 4096 Dez  8 15:07 stat
lrwxrwxrwx 1 root root0 Dez  8 15:07 subsystem ->
../../../../../../../../class/block
drwxr-xr-x 2 root root0 Dez  8 15:07 trace
-rw-r--r-- 1 root root 4096 Dez  8 15:07 uevent

/sys/block/cciss!c0d3/:
insgesamt 0
-r--r--r-- 1 root root 4096 Dez  8 15:07 alignment_offset
lrwxrwxrwx 1 

Re: [ceph-users] ceph-disk list crashes in infernalis

2015-12-08 Thread Loic Dachary
I also need to confirm that the names that show up in /sys/block/*/holders also 
contain a ! (it would not make sense to me if they did not, but ...)

On 08/12/2015 15:05, Loic Dachary wrote:
> Hi Felix,
> 
> Could you please ls -l /dev/cciss /sys/block/cciss*/ ?
> 
> Thanks for being the cciss proxy in fixing this problem :-)
> 
> Cheers
> 
> On 07/12/2015 11:43, Loic Dachary wrote:
>> Thanks !
>>
>> On 06/12/2015 17:50, Stolte, Felix wrote:
>>> Hi Loic,
>>>
>>> output is:
>>>
>>> /dev:
>>> insgesamt 0
>>> crw--- 1 root root 10, 235 Dez  2 17:02 autofs
>>> drwxr-xr-x 2 root root1000 Dez  2 17:02 block
>>> drwxr-xr-x 2 root root  60 Dez  2 17:02 bsg
>>> crw--- 1 root root 10, 234 Dez  5 06:29 btrfs-control
>>> drwxr-xr-x 3 root root  60 Dez  2 17:02 bus
>>> crw-r--r-- 1 root root255, 171 Dez  2 17:02 casr
>>> drwxr-xr-x 2 root root 500 Dez  2 17:02 cciss
>>> crw-r--r-- 1 root root255, 173 Dez  2 17:02 ccsm
>>> lrwxrwxrwx 1 root root   3 Dez  2 17:02 cdrom -> sr0
>>> crw-r--r-- 1 root root255, 178 Dez  2 17:02 cdt
>>> crw-r--r-- 1 root root255, 172 Dez  2 17:02 cecc
>>> crw-r--r-- 1 root root255, 176 Dez  2 17:02 cevt
>>> drwxr-xr-x 2 root root3820 Dez  5 06:29 char
>>> crw--- 1 root root  5,   1 Dez  2 17:04 console
>>> lrwxrwxrwx 1 root root  11 Dez  2 17:02 core -> /proc/kcore
>>> drw-r--r-- 2 root root 200 Dez  2 17:02 cpqhealth
>>> drwxr-xr-x 2 root root  60 Dez  2 17:02 cpu
>>> crw--- 1 root root 10,  60 Dez  2 17:02 cpu_dma_latency
>>> crw-r--r-- 1 root root255, 180 Dez  2 17:02 crom
>>> crw--- 1 root root 10, 203 Dez  2 17:02 cuse
>>> drwxr-xr-x 8 root root 160 Dez  2 17:02 disk
>>> drwxr-xr-x 2 root root 100 Dez  2 17:02 dri
>>> crw--- 1 root root 10,  61 Dez  2 17:02 ecryptfs
>>> crw-rw 1 root video29,   0 Dez  2 17:02 fb0
>>> lrwxrwxrwx 1 root root  13 Dez  2 17:02 fd -> /proc/self/fd
>>> crw-rw-rw- 1 root root  1,   7 Dez  2 17:02 full
>>> crw-rw-rw- 1 root root 10, 229 Dez  2 17:02 fuse
>>> crw--- 1 root root251,   0 Dez  2 17:02 hidraw0
>>> crw--- 1 root root251,   1 Dez  2 17:02 hidraw1
>>> crw--- 1 root root 10, 228 Dez  2 17:02 hpet
>>> drwxr-xr-x 2 root root 360 Dez  2 17:02 hpilo
>>> crw--- 1 root root 89,   0 Dez  2 17:02 i2c-0
>>> crw--- 1 root root 89,   1 Dez  2 17:02 i2c-1
>>> crw--- 1 root root 89,   2 Dez  2 17:02 i2c-2
>>> crw--- 1 root root 89,   3 Dez  2 17:02 i2c-3
>>> crw-r--r-- 1 root root255, 184 Dez  2 17:02 indc
>>> drwxr-xr-x 4 root root 200 Dez  2 17:02 input
>>> crw--- 1 root root248,   0 Dez  2 17:02 ipmi0
>>> crw--- 1 root root249,   0 Dez  2 17:02 kfd
>>> crw-r--r-- 1 root root  1,  11 Dez  2 17:02 kmsg
>>> srw-rw-rw- 1 root root   0 Dez  2 17:02 log
>>> brw-rw 1 root disk  7,   0 Dez  2 17:02 loop0
>>> brw-rw 1 root disk  7,   1 Dez  2 17:02 loop1
>>> brw-rw 1 root disk  7,   2 Dez  2 17:02 loop2
>>> brw-rw 1 root disk  7,   3 Dez  2 17:02 loop3
>>> brw-rw 1 root disk  7,   4 Dez  2 17:02 loop4
>>> brw-rw 1 root disk  7,   5 Dez  2 17:02 loop5
>>> brw-rw 1 root disk  7,   6 Dez  2 17:02 loop6
>>> brw-rw 1 root disk  7,   7 Dez  2 17:02 loop7
>>> crw--- 1 root root 10, 237 Dez  2 17:02 loop-control
>>> drwxr-xr-x 2 root root  60 Dez  2 17:02 mapper
>>> crw--- 1 root root 10, 227 Dez  2 17:02 mcelog
>>> crw-r- 1 root kmem  1,   1 Dez  2 17:02 mem
>>> crw--- 1 root root 10,  57 Dez  2 17:02 memory_bandwidth
>>> crw--- 1 root root 10, 220 Dez  2 17:02 mptctl
>>> drwxr-xr-x 2 root root  60 Dez  2 17:02 net
>>> crw--- 1 root root 10,  59 Dez  2 17:02 network_latency
>>> crw--- 1 root root 10,  58 Dez  2 17:02 network_throughput
>>> crw-rw-rw- 1 root root  1,   3 Dez  2 17:02 null
>>> crw-r- 1 root kmem  1,   4 Dez  2 17:02 port
>>> crw--- 1 root root108,   0 Dez  2 17:02 ppp
>>> crw-r--r-- 1 root root255, 183 Dez  2 17:02 proc
>>> crw--- 1 root root 10,   1 Dez  2 17:02 psaux
>>> crw-rw-rw- 1 root tty   5,   2 Dez  6 17:47 ptmx
>>> drwxr-xr-x 2 root root   0 Dez  2 17:02 pts
>>> brw-rw 1 root disk  1,   0 Dez  2 17:02 ram0
>>> brw-rw 1 root disk  1,   1 Dez  2 17:02 ram1
>>> brw-rw 1 root disk  1,  10 Dez  2 17:02 ram10
>>> brw-rw 1 root disk  1,  11 Dez  2 17:02 ram11
>>> brw-rw 1 root disk  1,  12 Dez  2 17:02 ram12
>>> brw-rw 1 root disk  1,  13 Dez  2 17:02 ram13
>>> brw-rw 1 root disk  1,  14 Dez  2 17:02 ram14
>>> brw-rw 1 root disk  1,  15 Dez  2 17:02 ram15
>>> brw-rw 1 root disk  1,   2 Dez  2 17:02 ram2
>>> brw-rw 1 root disk  1,   3 Dez  2 17:02 ram3
>>> brw-rw 1 root disk  1,   4 Dez  2 17:02 ram4
>>> brw-rw 1 root 

[ceph-users] Fwd: scrub error with ceph

2015-12-08 Thread Erming Pei
(Found no response from the current list, so forwarded to 
ceph-us...@ceph.com. )


Sorry if it's duplicated.


 Original Message 
Subject:scrub error with ceph
Date:   Mon, 7 Dec 2015 14:15:07 -0700
From:   Erming Pei 
To: ceph-users@lists.ceph.com



Hi,

   I found there are 128 scrub errors in my ceph system. I checked with 
health detail and found many pgs with a stuck unclean issue. Should I 
repair all of them, or what should I do?


[root@gcloudnet ~]# ceph -s

cluster a4d0879f-abdc-4f9d-8a4b-53ce57d822f1

 health HEALTH_ERR 128 pgs inconsistent; 128 scrub errors; mds1: 
Client HTRC:cephfs_data failing to respond to cache pressure; mds0: 
Client physics-007:cephfs_data failing to respond to cache pressure; 
pool 'cephfs_data' is full


 monmap e3: 3 mons at 
{gcloudnet=xxx.xxx.xxx.xxx:6789/0,gcloudsrv1=xxx.xxx.xxx.xxx:6789/0,gcloudsrv2=xxx.xxx.xxx.xxx:6789/0}, 
election epoch 178, quorum 0,1,2 gcloudnet,gcloudsrv1,gcloudsrv2


 mdsmap e51000: 2/2/2 up {0=gcloudsrv1=up:active,1=gcloudnet=up:active}

 osdmap e2821: 18 osds: 18 up, 18 in

  pgmap v10457877: 3648 pgs, 23 pools, 10501 GB data, 38688 kobjects

14097 GB used, 117 TB / 130 TB avail

   6 active+clean+scrubbing+deep

3513 active+clean

 128 active+clean+inconsistent

   1 active+clean+scrubbing


P.S. I am increasing the pg and pgp numbers for cephfs_data pool.


Thanks,

Erming



--


Erming Pei, Ph.D, Senior System Analyst
HPC Grid/Cloud Specialist, ComputeCanada/WestGrid

Research Computing Group, IST
University of Alberta, Canada T6G 2H1
Email:erm...@ualberta.ca  erming@cern.ch  

Tel. :+1 7804929914 Fax:+1 7804921729



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph snapshost

2015-12-08 Thread Dan Nica
Hi guys,

So, from the documentation, I must stop I/O before taking rbd snapshots. How do I 
do that, and what does that mean? Do I have to unmount
the rbd image?

--
Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] http://gitbuilder.ceph.com/

2015-12-08 Thread Ken Dreyer
Yes, we've had to move all of our hardware out of the datacenter in
Irvine, California to a new home in Raleigh, North Carolina. The
backend server for gitbuilder.ceph.com had a *lot* of data and we were
not able to sync all of it to an interim server in Raleigh before we
had to unplug the old one.

Since you brought up fastcgi, it's a good idea to transition your
cluster from Apache+mod_fastcgi and start using RGW's Civetweb server
instead. Civetweb is much simpler, and future RGW optimizations are
all going into the Civetweb stack.
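
For what it's worth, the switch is usually just a ceph.conf change on the
gateway host. A minimal sketch (the client section name and the port below are
only placeholders for your setup):

# ceph.conf on the radosgw host
[client.rgw.gateway]
rgw frontends = civetweb port=7480

# restart the gateway afterwards so it serves HTTP itself instead of going
# through Apache + mod_fastcgi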

- Ken

On Tue, Dec 8, 2015 at 2:54 AM, Xav Paice  wrote:
> Hi,
>
> Just wondering if there's a known issue with http://gitbuilder.ceph.com/ -
> if I go to several URLs, e.g.
> http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-trusty-x86_64-basic, I
> get a 403.  That's still the right place to get debs, right?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd become unusable, blocked by xfsaild (?) and load > 5000

2015-12-08 Thread Scottix
I can confirm it seems to be kernels newer than 3.16; we had this problem
where servers would lock up and we had to perform restarts on a weekly basis.
Since we downgraded to 3.16 we have not had to do any restarts.

I did find this thread in the XFS archives and I am not sure if it has been
fixed or not:
http://oss.sgi.com/archives/xfs/2015-07/msg00034.html


On Tue, Dec 8, 2015 at 2:06 AM Tom Christensen  wrote:

> We run deep scrubs via cron with a script so we know when deep scrubs are
> happening, and we've seen nodes fail both during deep scrubbing and while
> no deep scrubs are occurring, so I'm pretty sure it's not related.
>
>
> On Tue, Dec 8, 2015 at 2:42 AM, Benedikt Fraunhofer  > wrote:
>
>> Hi Tom,
>>
>> 2015-12-08 10:34 GMT+01:00 Tom Christensen :
>>
>> > We didn't go forward to 4.2 as its a large production cluster, and we
>> just
>> > needed the problem fixed.  We'll probably test out 4.2 in the next
>> couple
>>
>> unfortunately we don't have the luxury of a test cluster.
>> and to add to that, we couldn't simulate the load, although it does not
>> seem to be load related.
>> Did you try running with nodeep-scrub as a short-term workaround?
>>
>> I'll give ~30% of the nodes 4.2 and see how it goes.
>>
>> > In our experience it takes about 2 weeks to start happening
>>
>> we're well below that. Somewhat between 1 and 4 days.
>> And yes, once one goes south, it affects the rest of the cluster.
>>
>> Thx!
>>
>>  Benedikt
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph snapshost

2015-12-08 Thread Jan Schermer
You don't really *have* to stop I/O.
In fact, I recommend you don't unless you have to.

The reason why this is recommended is to minimize the risk of data loss, because 
the snapshot will be in a very similar state to what you would get if you suddenly 
lost power to the server. Obviously you should quiesce I/O if you need the snapshot 
to contain the exact same state of the data (and we're typically talking about a 
rollback of several seconds). For example, if you append some data to a file and 
take a snapshot instantly, the data will likely not be in the snapshot (yet).

The reason why I recommend you don't do that is because it exposes problems with 
data consistency in the guest (applications/developers doing something 
stupid...), which is a good thing! If you suddenly lose power to your production 
database, you don't want to have to restore from backup. In an ACID-compliant 
database all the data should simply be there, no matter how harsh the shutdown 
was.

So unless you deliberately run your guests with disabled barriers/flushes*, or 
you need the absolute latest data, don't bother quiescing IO.
* in which case there's no guarantee even with fsfreeze
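
If you do want a quiesced snapshot, a minimal sketch of the fsfreeze(8)
approach mentioned below (pool, image and mount point names are placeholders):

# inside the guest: flush and block writes on the filesystem backed by the rbd image
fsfreeze --freeze /mnt/data

# from a client with access to the pool: take the snapshot while I/O is quiesced
rbd snap create rbd/myimage@mysnap

# inside the guest: resume I/O
fsfreeze --unfreeze /mnt/data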

Jan

> On 09 Dec 2015, at 03:59, Yan, Zheng  wrote:
> 
> On Wed, Dec 9, 2015 at 12:10 AM, Dan Nica  
> wrote:
>> Hi guys,
>> 
>> 
>> 
>> So from documentation I must stop the I/O before taking rbd snapshots, how
>> do I do that or what does that mean ? do I have to unmount
>> 
> 
> see fsfreeze(8) command
> 
> 
>> the rbd image ?
>> 
>> 
>> 
>> --
>> 
>> Dan
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph 9.2 fails to install in COS 7.1.1503: Report and Fix

2015-12-08 Thread Goncalo Borges

Hi Cephers

This is just to report an issue (and a workaround) regarding 
dependencies in Centos 7.1.1503


Last week, I installed a couple of nodes and there were no issues with 
dependencies. This week, the installation of ceph rpm fails because it 
depends on gperftools-libs which, on its own, depends on libunwind.


Searching a bit, I found that last week's installs downloaded libunwind 
from EPEL (libunwind-1.1-10.el7.x86_64). Today it is no longer 
there.


Googling about it, it seems libunwind will be available in CentOS 
7.2.1511, but for the time being it is available in the CentOS CR 
repo. For CentOS 7.1.1503, that repo provides libunwind-1.1-5.el7.x86_64:


http://mirror.centos.org/centos/7.1.1503/cr
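
A possible way to pull it in on 7.1.1503, assuming yum-utils is installed and
the stock CentOS-CR repo definition is present (it ships disabled by default):

# enable the continuous-release repo, install the missing dependency, then ceph
yum-config-manager --enable cr
yum install libunwind
yum install ceph

# optionally disable CR again so no other CR packages get pulled in later
yum-config-manager --disable cr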

Cheers
Goncalo

--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot create Initial Monitor

2015-12-08 Thread Aakanksha Pudipeddi-SSI
I am still unable to get past this issue. Could anyone help me out here?

Thanks,
Aakanksha

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Aakanksha Pudipeddi-SSI
Sent: Thursday, December 03, 2015 8:08 PM
To: ceph-users
Subject: [ceph-users] Cannot create Initial Monitor

Hello Cephers,

I am unable to create the initial monitor during ceph cluster deployment. I do 
not know what changed, since the same recipe used to work until very recently. 
These are the steps I used:
ceph-deploy new  -- works
dpkg -i -R  -- works
ceph-deploy mon create-initial -- fails

Log:
[ceph_deploy.cli][INFO  ] Invoked (1.5.28): /usr/bin/ceph-deploy mon 
create-initial
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  subcommand: create-initial
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.cli][INFO  ]  keyrings  : None
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts myhost
[ceph_deploy.mon][DEBUG ] detecting platform for host myhost ...
[Myhost][DEBUG ] connection detected need for sudo
[Myhost][DEBUG ] connected to host: myhost
[Myhost][DEBUG ] detect platform information from remote host
[Myhost][DEBUG ] detect machine type
[Myhost][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO  ] distro info: Ubuntu 14.04 trusty
[Myhost][DEBUG ] determining if provided host has same hostname in remote
[Myhost][DEBUG ] get remote short hostname
[Myhost][DEBUG ] deploying mon to myhost
[Myhost][DEBUG ] get remote short hostname
[Myhost][DEBUG ] remote hostname: myhost
[Myhost][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[Myhost][DEBUG ] create the mon path if it does not exist
[Myhost][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-myhost/done
[Myhost][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-myhost/done
[Myhost][INFO  ] creating keyring file: 
/var/lib/ceph/tmp/ceph-myhost.mon.keyring
[Myhost][DEBUG ] create the monitor keyring file
[Myhost][INFO  ] Running command: sudo ceph-mon --cluster ceph --mkfs -i myhost 
--keyring /var/lib/ceph/tmp/ceph-myhost.mon.keyring
[Myhost][DEBUG ] ceph-mon: renaming mon.noname-a xx.xx.xxx.xx:6789/0 to 
mon.myhost
[Myhost][DEBUG ] ceph-mon: set fsid to 5573b0c6-02fd-4c45-aa89-b88fd08b3b87
[Myhost][DEBUG ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-myhost for 
mon.myhost
[Myhost][INFO  ] unlinking keyring file 
/var/lib/ceph/tmp/ceph-myhost.mon.keyring
[Myhost][DEBUG ] create a done file to avoid re-doing the mon deployment
[Myhost][DEBUG ] create the init path if it does not exist
[Myhost][DEBUG ] locating the `service` executable...
[Myhost][INFO  ] Running command: sudo initctl emit ceph-mon cluster=ceph 
id=myhost
[Myhost][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon 
/var/run/ceph/ceph-mon.myhost.asok mon_status
[Myhost][ERROR ] admin_socket: exception getting command descriptions: [Errno 
2] No such file or directory
[Myhost][WARNING] monitor: mon.myhost, might not be running yet

I checked the monitor log in /var/log/ceph and it does not have anything 
unusual, just the pid for the ceph-mon process. However, there is no 
/var/run/ceph/ceph-mon.myhost.asok. I do not know in which step this file is 
created; hence I am not able to debug this issue. Any pointers wrt this issue 
would be appreciated. I am using the BLKIN Ceph branch (wip-blkin) i.e. 9.0.1 
Ceph packages built from source and my ceph-deploy version is 1.5.28.

Thanks,
Aakanksha



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph snapshost

2015-12-08 Thread Yan, Zheng
On Wed, Dec 9, 2015 at 12:10 AM, Dan Nica  wrote:
> Hi guys,
>
>
>
> So from documentation I must stop the I/O before taking rbd snapshots, how
> do I do that or what does that mean ? do I have to unmount
>

see fsfreeze(8) command


> the rbd image ?
>
>
>
> --
>
> Dan
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot create Initial Monitor

2015-12-08 Thread Varada Kari
Could you try starting the monitor manually and see what the error is? Like 
ceph-mon -i  --cluster ceph &. Enable more logging (debug_mon).
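
Something along these lines should show the failure directly on the console
(the mon id and debug level are just examples):

# run the monitor in the foreground with verbose mon logging
sudo ceph-mon -i myhost --cluster ceph -d --debug-mon 20

# once/if the admin socket appears, the same query ceph-deploy runs:
sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.myhost.asok mon_status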

Varada

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Aakanksha Pudipeddi-SSI
Sent: Wednesday, December 09, 2015 7:47 AM
To: Aakanksha Pudipeddi-SSI ; ceph-users 

Subject: Re: [ceph-users] Cannot create Initial Monitor

I am still unable to get past this issue. Could anyone help me out here?

Thanks,
Aakanksha

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Aakanksha Pudipeddi-SSI
Sent: Thursday, December 03, 2015 8:08 PM
To: ceph-users
Subject: [ceph-users] Cannot create Initial Monitor

Hello Cephers,

I am unable to create the initial monitor during ceph cluster deployment. I do 
not know what changed since the same recipe used to work until very recently. 
These are the steps I used:
Ceph-deploy new  -- works
Dpkg -i -R  --works
Ceph-deploy mon create-initial - fails

Log:
[ceph_deploy.cli][INFO  ] Invoked (1.5.28): /usr/bin/ceph-deploy mon 
create-initial
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  subcommand: create-initial
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.cli][INFO  ]  keyrings  : None
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts myhost
[ceph_deploy.mon][DEBUG ] detecting platform for host myhost ...
[Myhost][DEBUG ] connection detected need for sudo
[Myhost][DEBUG ] connected to host: myhost
[Myhost][DEBUG ] detect platform information from remote host
[Myhost][DEBUG ] detect machine type
[Myhost][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO  ] distro info: Ubuntu 14.04 trusty
[Myhost][DEBUG ] determining if provided host has same hostname in remote
[Myhost][DEBUG ] get remote short hostname
[Myhost][DEBUG ] deploying mon to myhost
[Myhost][DEBUG ] get remote short hostname
[Myhost][DEBUG ] remote hostname: myhost
[Myhost][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[Myhost][DEBUG ] create the mon path if it does not exist
[Myhost][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-myhost/done
[Myhost][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-myhost/done
[Myhost][INFO  ] creating keyring file: 
/var/lib/ceph/tmp/ceph-myhost.mon.keyring
[Myhost][DEBUG ] create the monitor keyring file
[Myhost][INFO  ] Running command: sudo ceph-mon --cluster ceph --mkfs -i myhost 
--keyring /var/lib/ceph/tmp/ceph-myhost.mon.keyring
[Myhost][DEBUG ] ceph-mon: renaming mon.noname-a xx.xx.xxx.xx:6789/0 to 
mon.myhost
[Myhost][DEBUG ] ceph-mon: set fsid to 5573b0c6-02fd-4c45-aa89-b88fd08b3b87
[Myhost][DEBUG ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-myhost for 
mon.myhost
[Myhost][INFO  ] unlinking keyring file 
/var/lib/ceph/tmp/ceph-myhost.mon.keyring
[Myhost][DEBUG ] create a done file to avoid re-doing the mon deployment
[Myhost][DEBUG ] create the init path if it does not exist
[Myhost][DEBUG ] locating the `service` executable...
[Myhost][INFO  ] Running command: sudo initctl emit ceph-mon cluster=ceph 
id=myhost
[Myhost][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon 
/var/run/ceph/ceph-mon.myhost.asok mon_status
[Myhost][ERROR ] admin_socket: exception getting command descriptions: [Errno 
2] No such file or directory
[Myhost][WARNING] monitor: mon.myhost, might not be running yet

I checked the monitor log in /var/log/ceph and it does not have anything 
unusual, just the pid for the ceph-mon process. However, there is no 
/var/run/ceph/ceph-mon.myhost.asok. I do not know in which step this file is 
created; hence I am not able to debug this issue. Any pointers wrt this issue 
would be appreciated. I am using the BLKIN Ceph branch (wip-blkin) i.e. 9.0.1 
Ceph packages built from source and my ceph-deploy version is 1.5.28.

Thanks,
Aakanksha



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd merge-diff error

2015-12-08 Thread Josh Durgin

On 12/08/2015 10:44 PM, Alex Gorbachev wrote:

Hi Josh,

On Mon, Dec 7, 2015 at 6:50 PM, Josh Durgin > wrote:

On 12/07/2015 03:29 PM, Alex Gorbachev wrote:

When trying to merge two results of rbd export-diff, the
following error
occurs:

iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151500
spin1/scrun1@autosnap120720151502
/data/volume1/scrun1-120720151502.bck

iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151504
spin1/scrun1@autosnap120720151504
/data/volume1/scrun1-120720151504.bck

iss@lab2-b1:~$ rbd merge-diff /data/volume1/scrun1-120720151502.bck
/data/volume1/scrun1-120720151504.bck
/data/volume1/mrg-scrun1-0204.bck
  Merging image diff: 11% complete...failed.
rbd: merge-diff error

That's all the output and I have found this link
http://tracker.ceph.com/issues/12911 but not sure if the patch
should
have already been in hammer or how to get it?


That patch fixed a bug that was only present after hammer, due to
parallelizing export-diff. You're likely seeing a different (possibly
new) issue.

Unfortunately there's not much output we can enable for export-diff in
hammer. Could you try running the command via gdb to figure out where
and why it's failing? Make sure you have librbd-dbg installed, then
send the output from gdb doing:

gdb --args rbd merge-diff /data/volume1/scrun1-120720151502.bck \
/data/volume1/scrun1-120720151504.bck /data/volume1/mrg-scrun1-0204.bck
break rbd.cc:1931
break rbd.cc:1935
break rbd.cc:1967
break rbd.cc:1985
break rbd.cc:1999
break rbd.cc:2008
break rbd.cc:2021
break rbd.cc:2053
break rbd.cc:2098
run
# (it will run now, stopping when it hits the error)
info locals


Will do - how does one load librbd-dbg?  I have the following on the system:

librbd-dev - RADOS block device client library (development files)
librbd1-dbg - debugging symbols for librbd1

is librbd1-dbg sufficient?


Yes, I just forgot the 1 in the package name.
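
On Ubuntu/Debian that would be something like:

# pull in the debug symbols so gdb can resolve the librbd frames
sudo apt-get install librbd1-dbg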


Also a question - the merge-diff really stitches the two diff files
together, it doesn't really merge them, correct? For example, in the following
workflow:

export-diff from full image - 10GB
export-diff from snap1 - 2 GB
export-diff from snap2 - 1 GB

My resulting merge export file would be 13GB, correct?


It does merge overlapping sections, i.e. part of snap1 that was
overwritten in snap2, so the merged diff may be smaller than the
original two.
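
So for the workflow above, the end-to-end shape would be roughly the following
(file names and the backup pool are placeholders; as noted, the result can come
out smaller than 13GB where the diffs overlap):

# stitch the incremental diffs together, two at a time
rbd merge-diff full.diff from-snap1.diff merged-1.diff
rbd merge-diff merged-1.diff from-snap2.diff merged-2.diff

# the merged diff can later be applied to a target copy of the image in one step
rbd import-diff merged-2.diff backup-pool/scrun1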

Josh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd merge-diff error

2015-12-08 Thread Alex Gorbachev
Hi Josh,

On Mon, Dec 7, 2015 at 6:50 PM, Josh Durgin  wrote:

> On 12/07/2015 03:29 PM, Alex Gorbachev wrote:
>
>> When trying to merge two results of rbd export-diff, the following error
>> occurs:
>>
>> iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151500
>> spin1/scrun1@autosnap120720151502 /data/volume1/scrun1-120720151502.bck
>>
>> iss@lab2-b1:~$ rbd export-diff --from-snap autosnap120720151504
>> spin1/scrun1@autosnap120720151504 /data/volume1/scrun1-120720151504.bck
>>
>> iss@lab2-b1:~$ rbd merge-diff /data/volume1/scrun1-120720151502.bck
>> /data/volume1/scrun1-120720151504.bck /data/volume1/mrg-scrun1-0204.bck
>>  Merging image diff: 11% complete...failed.
>> rbd: merge-diff error
>>
>> That's all the output and I have found this link
>> http://tracker.ceph.com/issues/12911 but not sure if the patch should
>> have already been in hammer or how to get it?
>>
>
> That patch fixed a bug that was only present after hammer, due to
> parallelizing export-diff. You're likely seeing a different (possibly
> new) issue.
>
> Unfortunately there's not much output we can enable for export-diff in
> hammer. Could you try running the command via gdb to figure out where
> and why it's failing? Make sure you have librbd-dbg installed, then
> send the output from gdb doing:
>
> gdb --args rbd merge-diff /data/volume1/scrun1-120720151502.bck \
> /data/volume1/scrun1-120720151504.bck /data/volume1/mrg-scrun1-0204.bck
> break rbd.cc:1931
> break rbd.cc:1935
> break rbd.cc:1967
> break rbd.cc:1985
> break rbd.cc:1999
> break rbd.cc:2008
> break rbd.cc:2021
> break rbd.cc:2053
> break rbd.cc:2098
> run
> # (it will run now, stopping when it hits the error)
> info locals


Will do - how does one load librbd-dbg?  I have the following on the system:

librbd-dev - RADOS block device client library (development files)
librbd1-dbg - debugging symbols for librbd1

is librbd1-dbg sufficient?

Also a question - the merge-diff really stitches the two diff files
together, it doesn't really merge them, correct? For example, in the following
workflow:

export-diff from full image - 10GB
export-diff from snap1 - 2 GB
export-diff from snap2 - 1 GB

My resulting merge export file would be 13GB, correct?

Thank you,
Alex

>
>
> Josh
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph extras package support for centos kvm-qemu

2015-12-08 Thread Ken Dreyer
When we re-arranged the download structure for packages and moved
everything from ceph.com to download.ceph.com, we did not carry
ceph-extras over.

The reason is that the packages there were unmaintained. The EL6 QEMU
binaries were vulnerable to VENOM (CVE-2015-3456) and maybe other
CVEs, and no users should rely on them any more.

If you need QEMU with RBD support on CentOS, I recommend that you
upgrade from CentOS 6 to CentOS 7.1+. Red Hat's QEMU package in RHEL
7.1 is built with librbd support.
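
After the upgrade, the same check as in the quoted message below should list
rbd among the supported formats, e.g.:

# verify the installed qemu-kvm was built with librbd support
/usr/libexec/qemu-kvm --drive format=? | grep -w rbd
qemu-img --help | grep -w rbd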

On Thu, Nov 19, 2015 at 1:59 AM, Xue, Chendi  wrote:
> Hi, All
>
>
>
> We noticed the ceph.com/packages URL is no longer available; we used to download
> the rbd-enabled CentOS qemu-kvm from http://ceph.com/packages/ceph-extras/rpm
> as instructed below.
>
>
>
> Is there another way to fix this? Or is there another qemu-kvm version with
> rbd support?
>
>
>
> [root@client03]# /usr/libexec/qemu-kvm --drive format=?
>
> Supported formats: raw cow qcow vdi vmdk cloop dmg bochs vpc vvfat qcow2 qed
> vhdx parallels nbd blkdebug host_cdrom host_floppy host_device file gluster
> gluster gluster gluster
>
>
>
> Sadly, no rbd is listed in the supported formats
>
>
>
> Original URL we followed:
>
>
>
> [ceph-qemu]
>
> name=Ceph Packages for QEMU
>
> baseurl=http://ceph.com/packages/ceph-extras/rpm/{distro}/$basearch
>
> enabled=1
>
> priority=2
>
> gpgcheck=1
>
> type=rpm-md
>
> gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>
>
>
> [ceph-qemu-noarch]
>
> name=Ceph QEMU noarch
>
> baseurl=http://ceph.com/packages/ceph-extras/rpm/{distro}/noarch
>
> enabled=1
>
> priority=2
>
> gpgcheck=1
>
> type=rpm-md
>
> gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>
>
>
> [ceph-qemu-source]
>
> name=Ceph QEMU Sources
>
> baseurl=http://ceph.com/packages/ceph-extras/rpm/{distro}/SRPMS
>
> enabled=1
>
> priority=2
>
> gpgcheck=1
>
> type=rpm-md
>
> gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>
>
>
>
>
> Best Regards,
>
> Chendi
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD error

2015-12-08 Thread Brad Hubbard
+ceph-devel

- Original Message - 

> From: "Dan Nica" 
> To: ceph-us...@ceph.com
> Sent: Tuesday, 8 December, 2015 7:54:20 PM
> Subject: [ceph-users] OSD error

> Hi guys,

> Recently I installed ceph cluster version 9.2.0, and on my osd logs I see
> these errors:

> 2015-12-08 04:49:12.931683 7f42ec266700 -1 lsb_release_parse - pclose failed:
> (13) Permission denied
> 2015-12-08 04:49:12.955264 7f42ec266700 -1 lsb_release_parse - pclose failed:
> (13) Permission denied

> Do I have to worry about it ? what is generating these errors ?

Dan, what does "lsb_release -idrc" return on this system?

I wonder if we are maybe getting hit with EINTR here and getting SIGPIPE?

static void lsb_release_parse(map<string, string> *m, CephContext *cct)
{
  FILE *fp = popen("lsb_release -idrc", "r");
  if (!fp) {
int ret = -errno;
lderr(cct) << "lsb_release_parse - failed to call lsb_release binary with 
error: " << cpp_strerror(ret) << dendl;
return;
  }

  char buf[512];
  while (fgets(buf, sizeof(buf) - 1, fp) != NULL) {
if (lsb_release_set(buf, "Distributor ID:", m, "distro"))
  continue;
if (lsb_release_set(buf, "Description:", m, "distro_description"))
  continue;
if (lsb_release_set(buf, "Release:", m, "distro_version"))
  continue;
if (lsb_release_set(buf, "Codename:", m, "distro_codename"))
  continue;

lderr(cct) << "unhandled output: " << buf << dendl;
  }

  if (pclose(fp)) {
int ret = -errno;
lderr(cct) << "lsb_release_parse - pclose failed: " << cpp_strerror(ret) << 
dendl;   <--HERE
  }
}
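
Whatever the exact failure mode turns out to be, one quick check on the
affected host is to run the same command as the user the OSD runs as (ceph on
infernalis) and look at the result, for example:

# does the ceph user get the same "Permission denied"?
sudo -u ceph lsb_release -idrc; echo "exit status: $?"
ls -l $(which lsb_release)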

Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph new installation of ceph 0.9.2 issue and crashing osds

2015-12-08 Thread Brad Hubbard
Looks like it's failing to create a thread.

Try setting kernel.pid_max to 4194303 in /etc/sysctl.conf
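
For example:

# raise the thread/pid limit now and make it persistent across reboots
sysctl -w kernel.pid_max=4194303
echo "kernel.pid_max = 4194303" >> /etc/sysctl.conf

# number of threads currently in use, for comparison
ps -eLf | wc -l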

Cheers,
Brad

- Original Message -
> From: "Kenneth Waegeman" 
> To: ceph-users@lists.ceph.com
> Sent: Tuesday, 8 December, 2015 10:45:11 PM
> Subject: [ceph-users] ceph new installation of ceph 0.9.2 issue and crashing  
> osds
> 
> Hi,
> 
> I installed ceph 9.2.0 on a new cluster of 3 nodes, with 50 OSDs on each
> node (300GB disks, 96GB RAM).
> 
> While installing, I hit an issue where I could not even log in as the ceph
> user, so I increased some limits in security/limits.conf:
> 
> ceph    -    nproc     1048576
> ceph    -    nofile    1048576
> 
> I could then install the other OSDs.
> 
> After the cluster was installed, I added some extra pools. When creating
> the pgs of these pools, the OSDs of the cluster started to fail with
> stack traces. If I try to restart them, they keep on failing. I don't
> know if this is an actual bug in Infernalis, or a limit that is still
> not high enough. I've increased the nproc and nofile entries even
> further, but no luck. Does someone have a clue? Here are the stack traces I see:
> 
> Mostly this one:
> 
> -12> 2015-12-08 10:17:18.995243 7fa9063c5700  5 osd.12 pg_epoch: 904
> pg[3.3b(unlocked)] enter Initial
> -11> 2015-12-08 10:17:18.995279 7fa9063c5700  5 write_log with:
> dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615,
> dirty_divergent_priors: false, divergent_priors: 0, writeout_from:
> 4294967295'184467
> 44073709551615, trimmed:
> -10> 2015-12-08 10:17:18.995292 7fa9063c5700  5 osd.12 pg_epoch: 904
> pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904)
> [12,80,111] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive] exit Initial 0.48
> 0 0.00
>  -9> 2015-12-08 10:17:18.995301 7fa9063c5700  5 osd.12 pg_epoch: 904
> pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904)
> [12,80,111] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive] enter Reset
>  -8> 2015-12-08 10:17:18.995310 7fa9063c5700  5 osd.12 pg_epoch: 904
> pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904)
> [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] exit Reset 0.08
> 1 0.17
>  -7> 2015-12-08 10:17:18.995326 7fa9063c5700  5 osd.12 pg_epoch: 904
> pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904)
> [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] enter Started
>  -6> 2015-12-08 10:17:18.995332 7fa9063c5700  5 osd.12 pg_epoch: 904
> pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904)
> [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] enter Start
>  -5> 2015-12-08 10:17:18.995338 7fa9063c5700  1 osd.12 pg_epoch: 904
> pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904)
> [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] state: transi
> tioning to Primary
>  -4> 2015-12-08 10:17:18.995345 7fa9063c5700  5 osd.12 pg_epoch: 904
> pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904)
> [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] exit Start 0.12
> 0 0.00
>  -3> 2015-12-08 10:17:18.995352 7fa9063c5700  5 osd.12 pg_epoch: 904
> pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904)
> [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 inactive] enter Started/Primar
> y
>  -2> 2015-12-08 10:17:18.995358 7fa9063c5700  5 osd.12 pg_epoch: 904
> pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904)
> [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 creating] enter Started/Primar
> y/Peering
>  -1> 2015-12-08 10:17:18.995365 7fa9063c5700  5 osd.12 pg_epoch: 904
> pg[3.3b( empty local-les=0 n=0 ec=904 les/c/f 0/904/0 904/904/904)
> [12,80,111] r=0 lpr=904 crt=0'0 mlcod 0'0 creating+peering] enter Starte
> d/Primary/Peering/GetInfo
>   0> 2015-12-08 10:17:18.998472 7fa9063c5700 -1 common/Thread.cc: In
> function 'void Thread::create(size_t)' thread 7fa9063c5700 time
> 2015-12-08 10:17:18.995438
> common/Thread.cc: 154: FAILED assert(ret == 0)
> 
>   ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x85) [0x7fa91924ebe5]
>   2: (Thread::create(unsigned long)+0x8a) [0x7fa91923325a]
>   3: (SimpleMessenger::connect_rank(entity_addr_t const&, int,
> PipeConnection*, Message*)+0x185) [0x7fa919229105]
>   4: (SimpleMessenger::get_connection(entity_inst_t const&)+0x3ba)
> [0x7fa9192298ea]
>   5: (OSDService::get_con_osd_cluster(int, unsigned int)+0x1ab)
> [0x7fa918c7318b]
>   6: (OSD::do_queries(std::map std::less, std::allocator >,
> std::less, std::allocator > > > > >&, std::shared_ptr)+0x1f1)
> [0x7fa918c9b061]
>   7: (OSD::dispatch_context(PG::RecoveryCtx&, PG*,
> std::shared_ptr, ThreadPool::TPHandle*)+0x142)
> [0x7fa918cb5832]
>   8: