Hi,
We have a standalone Ceph cluster v13.2.6 and wanted to replicate it to another
DC. After going through "Migrating a Single Site
System to Multi-Site" and "Configure a Secondary Zone" from
http://docs.ceph.com/docs/master/radosgw/multisite/, we set
all buckets to "disable replication" and started replication. To our surprise,
a few minutes after the start, new pools named
default.rgw.buckets.{index,data} appeared and started receiving data.
The data was split across the index pools, as shown below:
dc2_zone.rgw.control          35      0 B    0    118 TiB       8
dc2_zone.rgw.meta             36  714 KiB    0    118 TiB    2895
dc2_zone.rgw.log              37   14 KiB    0    118 TiB     734
dc2_zone.rgw.buckets.index    38      0 B    0    565 GiB    7203
default.rgw.buckets.index     39      0 B    0    565 GiB    4204
dc2_zone.rgw.buckets.data     40  933 MiB    0    118 TiB    2605
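
For context, the per-bucket "disable replication" step was done roughly like
this (a sketch from memory; "mybucket" is a placeholder name, run on the
master zone for each bucket):

  # disable sync for a single bucket before starting replication
  radosgw-admin bucket sync disable --bucket=mybucket
  # check the bucket's sync state afterwards
  radosgw-admin bucket sync status --bucket=mybucket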
Indexes in the secondary zone were inconsistent.
In the logs of the radosgw set as the endpoint for the secondary zone, we
found these lines:
-10001> 2019-06-14 11:41:45.701 7f46f0959700 -1 *** Caught signal (Segmentation
fault) **
in thread 7f46f0959700 thread_name:data-sync
ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
1: (()+0xf5d0) [0x7f4739e1c5d0]
2: (RGWCoroutine::set_sleeping(bool)+0xc) [0x5561c28ffe0c]
3: (RGWOmapAppend::flush_pending()+0x2d) [0x5561c2904e1d]
4: (RGWOmapAppend::finish()+0x10) [0x5561c2904f00]
5: (RGWDataSyncShardCR::stop_spawned_services()+0x30) [0x5561c2b44320]
6: (RGWDataSyncShardCR::incremental_sync()+0x4c6) [0x5561c2b5d736]
7: (RGWDataSyncShardCR::operate()+0x75) [0x5561c2b5f0e5]
8: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x46) [0x5561c28fd566]
9: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*,
std::allocator<RGWCoroutinesStack*> >&)+0x293) [0x5561c2900233]
10: (RGWCoroutinesManager::run(RGWCoroutine*)+0x78) [0x5561c2901108]
11: (RGWRemoteDataLog::run_sync(int)+0x1e7) [0x5561c2b36d37]
12: (RGWDataSyncProcessorThread::process()+0x46) [0x5561c29bacb6]
13: (RGWRadosThread::Worker::entry()+0x22b) [0x5561c295c4cb]
14: (()+0x7dd5) [0x7f4739e14dd5]
15: (clone()+0x6d) [0x7f472e306ead]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
For now, we have worked around this by setting the pool names in the secondary
zone to default.*, and everything looks fine, so we are gradually
enabling replication for the remaining buckets and observing the situation.
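
For completeness, the workaround was applied roughly like this (a sketch; the
zone name matches our setup, and the exact JSON fields should be checked
against your own "zone get" output):

  # dump the secondary zone configuration
  radosgw-admin zone get --rgw-zone=dc2_zone > zone.json
  # edit zone.json so the placement pools point at the default.* pools, e.g.:
  #   "index_pool": "default.rgw.buckets.index",
  #   "data_pool": "default.rgw.buckets.data",
  radosgw-admin zone set --rgw-zone=dc2_zone --infile=zone.json
  # commit the new period so the change takes effect
  radosgw-admin period update --commit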
Has anyone seen similar behaviour?
Best Regards,
Tomasz Płaza
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com