Yes, I was dealing with an issue where OSDs were not peering, and I was trying
to see if force-create-pg could help recover the peering.
Data loss is an accepted possibility.
I hope this is what you are looking for?
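Roughly the sequence in question (a sketch only; the PG id 1.0 below is a placeholder, not the actual PG, and exact flags may differ on luminous 12.2.2):

```shell
# Inspect the stuck PG first (placeholder id; the real PG id differed)
ceph pg 1.0 query

# The command that was run. Note: force-create-pg recreates the PG as a
# new, *empty* PG, which accepts the loss of any data the old PG held.
ceph osd force-create-pg 1.0
```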
-3> 2018-01-31 22:47:22.942394 7fc641d0b700 5 mon.dl1-kaf101@0(electing)
e6 _ms_dispatch setting monitor caps on this connection
-2> 2018-01-31 22:47:22.942405 7fc641d0b700 5
mon.dl1-kaf101@0(electing).paxos(paxos recovering c 28110997..28111530)
is_readable = 0 - now=2018-01-31 22:47:22.942405 lease_expire=0.000000 has v0
lc 28111530
-1> 2018-01-31 22:47:22.942422 7fc641d0b700 5
mon.dl1-kaf101@0(electing).paxos(paxos recovering c 28110997..28111530)
is_readable = 0 - now=2018-01-31 22:47:22.942422 lease_expire=0.000000 has v0
lc 28111530
0> 2018-01-31 22:47:22.955415 7fc64350e700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/OSDMapMapping.h:
In function 'void OSDMapMapping::get(pg_t, std::vector<int>*, int*,
std::vector<int>*, int*) const' thread 7fc64350e700 time 2018-01-31
22:47:22.952877
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/OSDMapMapping.h:
288: FAILED assert(pgid.ps() < p->second.pg_num)
ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous
(stable)
--
Efficiency is Intelligent Laziness
On 2/2/18, 9:45 AM, "Sage Weil" <[email protected]> wrote:
On Fri, 2 Feb 2018, Frank Li wrote:
> Hi, I ran the ceph osd force-create-pg command in luminous 12.2.2 to
recover a failed pg, and it
> instantly caused all of the monitors to crash. Is there any way to revert
back to an earlier state of the cluster?
> Right now, the monitors refuse to come up; the error message is as
follows:
> I’ve filed a ceph ticket for the crash, but just wonder if there is a way
to get the cluster back up?
>
>
> https://tracker.ceph.com/issues/22847
Can you include the bit of the log a few lines up that includes the
assertion and file/line number that failed?
Also, "during the course of trouble-shooting an osd issue" makes me
nervous: force-create-pg creates a new, *empty* PG when all copies of the
old one have been lost. It is essentially telling the system to give up
on that PG and accept that its data is lost. Is that what you meant to do?
Thanks!
sage
>
> --- begin dump of recent events ---
> 0> 2018-01-31 22:47:22.959665 7fc64350e700 -1 *** Caught signal
(Aborted) **
> in thread 7fc64350e700 thread_name:cpu_tp
>
> ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous
(stable)
> 1: (()+0x8eae11) [0x55f1113fae11]
> 2: (()+0xf5e0) [0x7fc64aafa5e0]
> 3: (gsignal()+0x37) [0x7fc647fca1f7]
> 4: (abort()+0x148) [0x7fc647fcb8e8]
> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x284) [0x55f1110fa4a4]
> 6: (()+0x2ccc4e) [0x55f110ddcc4e]
> 7: (OSDMonitor::update_creating_pgs()+0x98b) [0x55f11102232b]
> 8: (C_UpdateCreatingPGs::finish(int)+0x79) [0x55f1110777b9]
> 9: (Context::complete(int)+0x9) [0x55f110ed30c9]
> 10: (ParallelPGMapper::WQ::_process(ParallelPGMapper::Item*,
ThreadPool::TPHandle&)+0x7f) [0x55f111204e1f]
> 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa8e) [0x55f111100f1e]
> 12: (ThreadPool::WorkThread::entry()+0x10) [0x55f111101e00]
> 13: (()+0x7e25) [0x7fc64aaf2e25]
> 14: (clone()+0x6d) [0x7fc64808d34d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
>
> --
> Efficiency is Intelligent Laziness
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com