On 2025/4/22 18:29, Mikulas Patocka wrote:
Hi
On Thu, 17 Apr 2025, Dongsheng Yang wrote:
+ccing md-devel
On 2025/4/16 23:10, Jens Axboe wrote:
On 4/16/25 12:08 AM, Dongsheng Yang wrote:
On 2025/4/16 9:04, Jens Axboe wrote:
On 4/15/25 12:00 PM, Dan Williams wrote:
Thanks for making the comparison chart. The immediate question this
raises is why not add "multi-tree per backend", "log structured
writeback", "readcache", and "CRC" support to dm-writecache?
device-mapper is everywhere, has a long track record, and enhancing it
immediately engages a community of folks in this space.
Strongly agree.
Hi Dan and Jens,
Thanks for your reply; that's a good question.
1. Why not optimize within dm-writecache?
From my perspective, the design goal of dm-writecache is to be a
minimal write cache. It achieves caching by dividing the cache device
into n blocks, each managed by a wc_entry, using a very simple
management mechanism. On top of this design, it's quite difficult to
implement features like multi-tree structures, CRC, or log-structured
writeback. Moreover, adding such optimizations, especially a read cache, would deviate from the original semantics of dm-writecache. So,
we didn't consider optimizing dm-writecache to meet our goals.
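To give a sense of how minimal that mechanism is, the per-block bookkeeping is roughly the following (a simplified sketch from memory; the exact fields in drivers/md/dm-writecache.c differ slightly):

#include <linux/rbtree.h>
#include <linux/list.h>
#include <linux/types.h>

struct wc_entry {
	struct rb_node rb_node;		/* single tree, keyed by original sector */
	struct list_head lru;		/* LRU / freelist linkage */
	unsigned short wc_list_contiguous;
	bool write_in_progress;
	unsigned long index;		/* block number in the cache device */
	uint64_t original_sector;	/* where the block lives on the origin */
	uint64_t seq_count;		/* write ordering, used during writeback */
};

Everything hangs off a single red-black tree plus a few LRU/freelist lists, which is why things like multi-tree indexing, CRC, or log-structured writeback would be bolted on rather than being a natural extension.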
2. Why not optimize within bcache or dm-cache?
As mentioned above, dm-writecache is essentially a minimal write
cache. So, why not build on bcache or dm-cache, which are more
complete caching systems? The truth is, it's also quite difficult.
These systems were designed with traditional SSDs/NVMe in mind, and
many of their design assumptions no longer hold true in the context of
PMEM. Every design targets a specific scenario, which is why, even
with dm-cache available, dm-writecache emerged to support DAX-capable
PMEM devices.
3. Then why not implement a full PMEM cache within the dm framework?
In high-performance IO scenarios, especially with PMEM hardware, adding
an extra DM layer in the IO stack is often unnecessary. For example,
DM performs a bio clone before calling __map_bio(clone) to invoke the
target operation, which introduces overhead.
Device mapper performs (in the common fast case) one allocation per
incoming bio - the allocation contains the outgoing bio and a structure
that may be used for any purpose by the target driver. For interlocking,
it uses RCU, so there is no synchronizing instruction. Overall, the DM overhead is not big.
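As a rough illustration (a minimal sketch with made-up names like example_io, not taken from any existing target): a target asks dm core for that per-bio structure by setting per_io_data_size in its constructor and retrieves it in ->map() via dm_per_bio_data(), so the target itself performs no per-IO allocation at all:

#include <linux/device-mapper.h>
#include <linux/jiffies.h>

struct example_io {			/* hypothetical per-IO context */
	sector_t orig_sector;
	unsigned long start_jiffies;
};

static int example_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
	/* dm core co-allocates this with the cloned bio: one
	 * allocation per incoming bio, as described above */
	ti->per_io_data_size = sizeof(struct example_io);
	return 0;
}

static int example_map(struct dm_target *ti, struct bio *bio)
{
	struct example_io *io = dm_per_bio_data(bio, sizeof(struct example_io));

	io->orig_sector = bio->bi_iter.bi_sector;
	io->start_jiffies = jiffies;
	/* ... remap the bio to the cache or origin device here ... */
	return DM_MAPIO_REMAPPED;
}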
Thank you again for the suggestion. I absolutely agree that leveraging
existing frameworks would be helpful in terms of code review and
merging. I, more than anyone, hope more people can help review the
code or join in this work. However, I believe that in the long run,
building a standalone pcache module is a better choice.
I think we'd need much stronger reasons for NOT adopting some kind of dm
approach for this, this is really the place to do it. If dm-writecache
etc aren't a good fit, add a dm-whatevercache for it? If dm is
cloning bios when it doesn't need to, then that seems like something that would be worthwhile fixing in the first place, or at least eliminating for cases that don't need it. That'd benefit everyone,
and we would not be stuck with a new stack to manage.
Would certainly be worth exploring with the dm folks.
Well, introducing dm-pcache (assuming we use this name) could, on one hand,
attract more users and developers from the device-mapper community to pay
attention to this project, and on the other hand, serve as a way to validate
or improve the dm framework’s performance in high-performance I/O scenarios.
If necessary, we can enhance the dm framework instead of bypassing it
entirely. This indeed sounds like something that would “benefit everyone.”
Hmm, I will seriously consider this approach.
Hi Alasdair, Mike, Mikulas, Do you have any suggestions?
Thanx
If you create a new self-contained target that doesn't need changes in the
generic dm or block code, it's OK and I would accept that.
I will try to port pcache into dm to be a new self-contained target.
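As a starting point, I imagine something like the skeleton below (just a sketch; the dm-pcache / pcache names are placeholders and the map function is a simple pass-through until the real caching logic is ported):

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/device-mapper.h>

struct pcache_ctx {
	struct dm_dev *dev;		/* backing (origin) device for now */
};

static int pcache_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
	struct pcache_ctx *pc;
	int r;

	if (argc != 1) {
		ti->error = "Invalid argument count";
		return -EINVAL;
	}

	pc = kzalloc(sizeof(*pc), GFP_KERNEL);
	if (!pc)
		return -ENOMEM;

	r = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &pc->dev);
	if (r) {
		kfree(pc);
		return r;
	}

	ti->private = pc;
	return 0;
}

static void pcache_dtr(struct dm_target *ti)
{
	struct pcache_ctx *pc = ti->private;

	dm_put_device(ti, pc->dev);
	kfree(pc);
}

static int pcache_map(struct dm_target *ti, struct bio *bio)
{
	struct pcache_ctx *pc = ti->private;

	/* pass-through placeholder: the pcache lookup/writeback
	 * logic would hook in here */
	bio_set_dev(bio, pc->dev->bdev);
	return DM_MAPIO_REMAPPED;
}

static struct target_type pcache_target = {
	.name    = "pcache",
	.version = {0, 1, 0},
	.module  = THIS_MODULE,
	.ctr     = pcache_ctr,
	.dtr     = pcache_dtr,
	.map     = pcache_map,
};

static int __init dm_pcache_init(void)
{
	/* self-contained: nothing in dm core or the block layer changes */
	return dm_register_target(&pcache_target);
}

static void __exit dm_pcache_exit(void)
{
	dm_unregister_target(&pcache_target);
}

module_init(dm_pcache_init);
module_exit(dm_pcache_exit);
MODULE_DESCRIPTION("pcache dm target (sketch)");
MODULE_LICENSE("GPL");

Then something like

	echo "0 $(blockdev --getsz /dev/pmem0) pcache /dev/pmem0" | dmsetup create pcache0

should be enough to instantiate it for testing.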
Thanx
Dongsheng
Improving dm-writecache is also possible.
Mikulas