[Bug 1936136] [NEW] ceph on bcache performance regression

2021-07-13 Thread dongdong tao
Public bug reported: Ceph on bcache can suffer serious performance degradation (a 10x drop) when the two conditions below are met: 1. bluefs_buffered_io is turned on 2. Any OSD bcache's cache_available_percent is less than 60 As many of us may already know, bcache will force all writes …
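As a quick way to check the two conditions above (a sketch; osd.0 and the cache-set UUID are placeholders):

ceph config get osd.0 bluefs_buffered_io                  # condition 1: is buffered io on?
cat /sys/fs/bcache/<cset-uuid>/cache_available_percent    # condition 2: has it dropped below 60?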

[Bug 1914911] Re: [SRU] bluefs doesn't compact log file

2021-04-28 Thread dongdong tao
Verified the bionic-proposed ceph package; I can confirm the bluefs compaction is performed even with a very low workload. ** Tags removed: verification-needed verification-needed-bionic ** Tags added: verification-done verification-done-bionic

[Bug 1900438] Re: Bcache bypasses writeback on caching device with fragmentation

2021-04-27 Thread dongdong tao
I used the same steps to verify it and can confirm it succeeded on bionic-proposed too: fio --name=test1 --filename=/dev/bcache0 --direct=1 --rw=randrw --bs=32k,4k --ioengine=libaio --iodepth=1 Then I monitored the cache_available_percent and the writeback rate. I no longer see the …
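For reference, a minimal way to watch cache_available_percent while the fio job runs could be (a sketch assuming a single cache set under /sys/fs/bcache):

watch -n1 'cat /sys/fs/bcache/*/cache_available_percent'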

[Bug 1900438] Re: Bcache bypasses writeback on caching device with fragmentation

2021-04-27 Thread dongdong tao
** Tags added: verification-done-bionic

[Bug 1900438] Re: Bcache bypasses writeback on caching device with fragmentation

2021-04-21 Thread dongdong tao
I used the same steps to verify it and can confirm it succeeded: fio --name=test1 --filename=/dev/bcache0 --direct=1 --rw=randrw --bs=32k,4k --ioengine=libaio --iodepth=1 Then I monitored the cache_available_percent and the writeback rate. I no longer see the cache_available_percent drop to …

[Bug 1900438] Re: Bcache bypasses writeback on caching device with fragmentation

2021-04-20 Thread dongdong tao
I've done the verification of the focal-proposed kernel. I used the fio command below to cause bcache fragmentation: fio --name=test1 --filename=/dev/bcache0 --direct=1 --rw=randrw --bs=32k,4k --ioengine=libaio --iodepth=1 Then I monitored the cache_available_percent and the writeback rate. I no longer …

[Bug 1900438] Re: Bcache bypasses writeback on caching device with fragmentation

2021-03-30 Thread dongdong tao
** Changed in: linux (Ubuntu Bionic) Importance: Medium => High ** Changed in: linux (Ubuntu Focal) Importance: Medium => High ** Changed in: linux (Ubuntu Groovy) Importance: Medium => High

[Bug 1900438] Re: Bcache bypasses writeback on caching device with fragmentation

2021-03-26 Thread dongdong tao
** Also affects: linux (Ubuntu Hirsute) Importance: Undecided Assignee: dongdong tao (taodd) Status: Confirmed

[Bug 1919289] [NEW] OSDMapMapping does not handle active.size() > pool size

2021-03-16 Thread dongdong tao
Public bug reported: If the pool size is reduced, we can end up with pg_temp mappings that are too big. This can trigger bad behavior elsewhere (e.g., OSDMapMapping, which assumes that acting and up are always <= pool size). ** Affects: ceph (Ubuntu) Importance: Undecided Status: …
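A hedged sketch of the pool-size reduction that triggers this (pool name and sizes are examples only):

# pg_temp mappings created while size=3 can exceed the new, smaller pool size
ceph osd pool set mypool size 2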

[Bug 1900438] Re: Bcache bypasses writeback on caching device with fragmentation

2021-03-14 Thread dongdong tao
Focal) Importance: Undecided Status: New ** Changed in: linux (Ubuntu) Assignee: (unassigned) => dongdong tao (taodd) ** Changed in: linux (Ubuntu Bionic) Assignee: (unassigned) => dongdong tao (taodd) ** Changed in: linux (Ubuntu Focal) Assignee: (unassigned) => dongdong tao (taodd)

[Bug 1917288] [NEW] Failed to package ceph-kvstore-tool, ceph-monstore-tool, ceph-osdomap-tool in bionic-train UCA release

2021-02-28 Thread dongdong tao
Public bug reported: ceph-kvstore-tool, ceph-monstore-tool, and ceph-osdomap-tool were shipped within the ceph-test package, but the ceph-test package was dropped by [0] in the bionic-train UCA release. I believe the reason is that most of the binaries (except those 3 tools) in the ceph-test package are meant …

[Bug 1914911] [NEW] bluefs doesn't compact log file

2021-02-07 Thread dongdong tao
Public bug reported: For a certain type of workload, bluefs might never compact the log file, which causes the bluefs log file to slowly grow to a huge size (sometimes bigger than 1TB on a 1.5TB device). This bug can eventually cause an OSD crash and a failure to restart, as it couldn't get …
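One way to watch the bluefs log growth on a live OSD is via the admin socket perf counters (a sketch; osd.0 is a placeholder):

ceph daemon osd.0 perf dump | grep log_bytes    # current bluefs log size in bytes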

[Bug 1900438] Re: Bcache bypasses writeback on caching device with fragmentation

2021-01-14 Thread dongdong tao
Here is the latest updated patch and the progress being made upstream: https://marc.info/?l=linux-bcache&m=160981605206306&w=1

[Bug 1906496] [NEW] mgr can be very slow in a large ceph cluster

2020-12-02 Thread dongdong tao
Public bug reported: upstream implemented a new feature [1] that will check/report those long network ping times between osds, but it introduced an issue that ceph- mgr might be very slow because it needs to dump all the new osd network ping stats [2] for some tasks, this can be bad especially

[Bug 1900438] Re: Bcache bypasses writeback on caching device with fragmentation

2020-11-03 Thread dongdong tao
I've just submitted a patch [1] upstream for review to help with this problem. The key is to speed up the writeback rate when fragmentation is high. Here is the comment from the patch: the current way of calculating the writeback rate only considers the dirty sectors; this usually works fine …
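For context, the rate the patch adjusts can be observed from sysfs while reproducing the problem (a sketch assuming the device is bcache0):

cat /sys/block/bcache0/bcache/writeback_rate_debug    # shows dirty data, target and the computed rate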

[Bug 1868364] Re: [SRU] rgw: unable to abort multipart upload after the bucket got resharded

2020-08-26 Thread dongdong tao
** Tags removed: verification-needed ** Tags added: verification-done

[Bug 1868364] Re: [SRU] rgw: unable to abort multipart upload after the bucket got resharded

2020-08-21 Thread dongdong tao
Please verify whether the "Autopkgtest regression report (ceph/12.2.13-0ubuntu0.18.04.3)" is an issue or not. ** Tags removed: verification-done ** Tags added: verification-needed

[Bug 1868364] Re: [SRU] rgw: unable to abort multipart upload after the bucket got resharded

2020-08-21 Thread dongdong tao
** Tags removed: verification-needed verification-needed-bionic ** Tags added: verification-bionic-done verification-done ** Tags removed: verification-queens-needed ** Tags added: verification-queens-done

[Bug 1868364] Re: [SRU] rgw: unable to abort multipart upload after the bucket got resharded

2020-08-21 Thread dongdong tao
Hi all, I've verified that the ceph package 12.2.13-0ubuntu0.18.04.3 (bionic-proposed) fixed the problem. The steps I've done: 1. Deploy a ceph cluster with version 12.2.13-0ubuntu0.18.04.2 2. s3cmd mb s3://test 3. s3cmd put testfile s3://test // 400MB testfile 4. Ctrl + C to abort the …
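For reference, aborting the stale multipart upload after resharding generally looks like this with s3cmd (a sketch, not necessarily the exact remaining steps; names follow the example above and the shard count is arbitrary):

s3cmd multipart s3://test                                  # list the in-progress upload and note its id
radosgw-admin bucket reshard --bucket=test --num-shards=3  # reshard the bucket
s3cmd abortmp s3://test/testfile <upload-id>               # should now succeed with the fixed package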

[Bug 1868364] Re: [SRU] rgw: unable to abort multipart upload after the bucket got resharded

2020-08-19 Thread dongdong tao
Hi Robie and Corey, is this autopkgtest regression expected? It is not caused by my change to rgw; do you need to re-upload the package or re-run the regression tests? Thanks, Dongdong

[Bug 1868364] Re: rgw: unable to abort multipart upload after the bucket got resharded

2020-08-11 Thread dongdong tao
This is only intended for Bionic ceph 12.2.13. This debdiff file was generated via debdiff. Let me upload a real patch here, so it is clearer. ** Patch added: "bug1868364.patch" https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1868364/+attachment/5400889/+files/bug1868364.patch

[Bug 1868364] Re: rgw: unable to abort multipart upload after the bucket got resharded

2020-08-11 Thread dongdong tao
Hi Corey, I updated the regression potential section; this patch is for bionic Luminous. Thanks, Dongdong ** Description changed: [Impact] This bug leaves the bucket unable to abort the multipart upload, leaving stale multipart entries behind for those buckets which had …

[Bug 1868364] Re: rgw: unable to abort multipart upload after the bucket got resharded

2020-07-06 Thread dongdong tao
Uploaded the debdiff. ** Description changed: + [Impact] + This bug leaves the bucket unable to abort the multipart upload, leaving stale multipart entries behind for those buckets which had partial multipart uploads before the resharding. + + [Test Case] + Deploy the latest …

[Bug 1804261] Re: Ceph OSD units require a reboot if they boot before vault (and if not unsealed within 150s)

2020-06-15 Thread dongdong tao
Just to clarify a bit to avoid confusion: in the above comment, at step 5, I meant wait for about 1.5 hours and then unseal the vault.

[Bug 1804261] Re: Ceph OSD units require a reboot if they boot before vault (and if not unsealed within 150s)

2020-06-14 Thread dongdong tao
I have verified the fix in bionic-proposed and confirm it fixes this issue. The test steps I performed: 1. Deployed a ceph cluster with vault 2. Upgraded some of the OSDs to 12.2.13 3. Added "Environment=CEPH_VOLUME_SYSTEMD_TRIES=2000" in /lib/systemd/system/ceph-volume@.service for all OSDs …

[Bug 1804261] Re: Ceph OSD units require a reboot if they boot before vault (and if not unsealed within 150s)

2020-06-04 Thread dongdong tao
Hi all, I can confirm this release fixed the bug. I used the steps below to test: 1. Deployed a ceph cluster with vault 2. Upgraded all the ceph packages to 12.2.13 from bionic-proposed 3. Added "Environment=CEPH_VOLUME_SYSTEMD_TRIES=2000" in /lib/systemd/system/ceph-volume@.service for some OSD nodes 4. …
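Rather than editing the unit under /lib directly, the same override can be applied as a systemd drop-in (a sketch; the tries value follows the test above):

mkdir -p /etc/systemd/system/ceph-volume@.service.d
printf '[Service]\nEnvironment=CEPH_VOLUME_SYSTEMD_TRIES=2000\n' > /etc/systemd/system/ceph-volume@.service.d/override.conf
systemctl daemon-reload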

[Bug 1868364] [NEW] backport the multipart fix to luminous

2020-03-21 Thread dongdong tao
/pull/32617 upstream bug report: https://tracker.ceph.com/issues/43583 ** Affects: ceph (Ubuntu) Importance: Undecided Assignee: dongdong tao (taodd) Status: New ** Changed in: ceph (Ubuntu) Assignee: (unassigned) => dongdong tao (taodd)

[Bug 1863704] Re: wrongly used a string type as int value for CEPH_VOLUME_SYSTEMD_TRIES and CEPH_VOLUME_SYSTEMD_INTERVAL

2020-02-28 Thread dongdong tao
** Also affects: ceph (Ubuntu Focal) Importance: High Assignee: dongdong tao (taodd) Status: New ** Changed in: ceph (Ubuntu Focal) Importance: High => Medium ** Changed in: ceph (Ubuntu Focal) Status: New => Fix Released

[Bug 1863704] Re: wrongly used a string type as int value for CEPH_VOLUME_SYSTEMD_TRIES and CEPH_VOLUME_SYSTEMD_INTERVAL

2020-02-19 Thread dongdong tao
Proposed a cosmic debdiff. ** Patch added: "cosmic.debdiff" https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1863704/+attachment/5329514/+files/cosmic.debdiff

[Bug 1863704] Re: wrongly used a string type as int value for CEPH_VOLUME_SYSTEMD_TRIES and CEPH_VOLUME_SYSTEMD_INTERVAL

2020-02-19 Thread dongdong tao
Proposed a disco debdiff. ** Patch added: "disco.debdiff" https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1863704/+attachment/5329513/+files/disco.debdiff

[Bug 1863704] Re: wrongly used a string type as int value for CEPH_VOLUME_SYSTEMD_TRIES and CEPH_VOLUME_SYSTEMD_INTERVAL

2020-02-19 Thread dongdong tao
Proposed an eoan debdiff. ** Patch added: "eoan.debdiff" https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1863704/+attachment/5329512/+files/eoan.debdiff

[Bug 1863704] Re: wrongly used a string type as int value for CEPH_VOLUME_SYSTEMD_TRIES and CEPH_VOLUME_SYSTEMD_INTERVAL

2020-02-19 Thread dongdong tao
Proposed a bionic debdiff. ** Patch added: "bionic.debdiff" https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1863704/+attachment/5329515/+files/bionic.debdiff ** Also affects: ceph (Ubuntu Bionic) Importance: Undecided Status: New ** Also affects: ceph (Ubuntu Eoan) …

[Bug 1863704] Re: wrongly used a string type as int value for CEPH_VOLUME_SYSTEMD_TRIES and CEPH_VOLUME_SYSTEMD_INTERVAL

2020-02-19 Thread dongdong tao
@Edward, focal does contain the fix; it needs targeting to eoan, disco, and bionic.

[Bug 1863704] Re: wrongly used a string type as int value for CEPH_VOLUME_SYSTEMD_TRIES and CEPH_VOLUME_SYSTEMD_INTERVAL

2020-02-18 Thread dongdong tao
** No longer affects: ceph (Ubuntu Bionic)

[Bug 1863704] Re: wrongly used a string type as int value for CEPH_VOLUME_SYSTEMD_TRIES and CEPH_VOLUME_SYSTEMD_INTERVAL

2020-02-18 Thread dongdong tao
** Also affects: ceph (Ubuntu Bionic) Importance: Undecided Status: New

[Bug 1863704] Re: wrongly used a string type as int value for CEPH_VOLUME_SYSTEMD_TRIES and CEPH_VOLUME_SYSTEMD_INTERVAL

2020-02-18 Thread dongdong tao
This is a deb patch that addresses this issue. ** Patch added: "0001-ceph-volume-fix-the-type-mismatch-covert-the-tries-a.patch" https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1863704/+attachment/5329167/+files/0001-ceph-volume-fix-the-type-mismatch-covert-the-tries-a.patch ** Tags added: …

[Bug 1863704] [NEW] wrongly used a string type as int value for CEPH_VOLUME_SYSTEMD_TRIES and CEPH_VOLUME_SYSTEMD_INTERVAL

2020-02-17 Thread dongdong tao
as the previous value was apparently wrong and might be wrongly enlarged. [Other Info] Upstream bug report: https://tracker.ceph.com/issues/43186 Upstream pull request: https://github.com/ceph/ceph/pull/32106 ** Affects: ceph (Ubuntu) Importance: High Assignee: dongdong tao (taodd) Status …

[Bug 1861793] Re: [SRU] ceph 12.2.13

2020-02-11 Thread dongdong tao
Hi James, I'm wondering if you can help include a simple fix in this SRU for bug https://bugs.launchpad.net/charm-ceph-osd/+bug/1804261 The fix is here: https://github.com/ceph/ceph/pull/32106; it's a critical ceph-volume bug that was found by you. It is now upstream but not merged into …

[Bug 1798081] Re: ceph got slow request because of primary osd drop messages

2018-12-04 Thread dongdong tao
Hi Brian, I have verified trusty-proposed; it works well! I used the second method (the gdb one) to verify it. Below is the message I got: 231007:2018-12-04 13:54:53.828152 7ffa7658b700 10 -- 10.5.0.13:6801/15728 >> 10.5.0.20:6802/1017937 pipe(0x7ffa96c8a280 sd=146 :6801 s=2 pgs=3 cs=1 l=0 …

[Bug 1798081] Re: ceph got slow request because of primary osd drop messages

2018-11-27 Thread dongdong tao
Xenial has Jewel ceph, and Jewel has that patch, so this issue does not affect Xenial or releases later than Xenial.

[Bug 1798081] Re: ceph got slow request because of primary osd drop messages

2018-11-20 Thread dongdong tao
** Description changed: [Impact] Ceph from Ubuntu trusty; the version is 0.80.*. The bug is that when a message seq number exceeds the maximum value of an unsigned 32-bit integer, which is 4294967295, the unsigned 64-bit seq number will be truncated to unsigned 32 bits. But the seq number is supposed …

[Bug 1798081] Re: ceph got slow request because of primary osd drop messages

2018-11-19 Thread dongdong tao
** Patch added: "lp1798081_trusty.debdiff" https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1798081/+attachment/5214357/+files/lp1798081_trusty.debdiff -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.

[Bug 1798081] Re: ceph got slow request because of primary osd drop messages

2018-11-12 Thread dongdong tao
** Description changed: [Impact] Ceph from Ubuntu trusty; the version is 0.80.*. The bug is that when a message seq number exceeds the maximum value of an unsigned 32-bit integer, which is 4294967295, the unsigned 64-bit seq number will be truncated to unsigned 32 bits. But the seq number is supposed …

[Bug 1798081] Re: ceph got slow request because of primary osd drop messages

2018-11-12 Thread dongdong tao
** Patch added: "lp1798081_trusty.debdiff" https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1798081/+attachment/5211686/+files/lp1798081_trusty.debdiff ** Patch removed: "lp1798081.debdiff" https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1798081/+attachment/5211680/+files/deb.diff

[Bug 1798081] Re: ceph got slow request because of primary osd drop messages

2018-11-12 Thread dongdong tao
This is the debdiff file. ** Patch added: "deb.diff" https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1798081/+attachment/5211680/+files/deb.diff

[Bug 1798081] Re: ceph got slow request because of primary osd drop messages

2018-11-12 Thread dongdong tao
** Description changed: - ceph version 0.80.5 - the reason that primary osd drop the secondary osd's subop reply message is because of - a bug that the message sequence number is truncated to unsigned 32 bit from unsigned 64 bit. + [Impact] + Ceph from ubuntu trusty, the version is 0.80.*. +

[Bug 1798081] [NEW] ceph got slow request because of primary osd drop messages

2018-10-16 Thread dongdong tao
Public bug reported: ceph version 0.80.5. The reason the primary OSD drops the secondary OSD's subop reply messages is a bug: the message sequence number is truncated from unsigned 64 bit to unsigned 32 bit. The bug is already reported upstream: …
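To illustrate the truncation: once the 64-bit sequence number passes 4294967295, keeping only the low 32 bits wraps it back to zero, so replies can look stale and get dropped. A quick shell illustration:

echo $(( 4294967296 & 0xFFFFFFFF ))    # 2^32 truncated to 32 bits prints 0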