On (04/26/17 09:52), js1...@gmail.com wrote:
[..]
> +struct zram_hash {
> +	spinlock_t lock;
> +	struct rb_root rb_root;
> };
Just a note: we can easily have N CPUs spinning on ->lock for the
__zram_dedup_get() lookup, which can involve a potentially slow
zcomp_decompress() [zlib, for example, with 64k pages] and memcmp().
The larger PAGE_SHIFT is, the more serialized the IOs become. In
theory, at least.

  CPU0                       CPU1                   ...  CPUN

  __zram_bvec_write()        __zram_bvec_write()         __zram_bvec_write()
   zram_dedup_find()          zram_dedup_find()           zram_dedup_find()
    spin_lock(&hash->lock);    spin_lock(&hash->lock);     spin_lock(&hash->lock);
    __zram_dedup_get()
     zcomp_decompress()
     ...

So maybe there is a way to use a read-write lock instead of a spinlock
for the hash and reduce write/read IO serialization.

	-ss