Hi Sergey,

I forgot this patch until now. Sorry about that.

On Mon, Feb 01, 2016 at 10:02:48AM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
> 
> On (01/29/16 16:28), Minchan Kim wrote:
> > Hello Sergey,
> > 
> > Sorry to late response. Thesedays, I'm really busy with personal
> > stuff.
> 
> sure, no worries :)
> 
> > On Tue, Jan 26, 2016 at 09:03:59PM +0900, Sergey Senozhatsky wrote:
> > > I've been asked several very simple questions:
> > > a) How can I ensure that zram uses (or used) several compression
> > >    streams?
> > 
> > Why does he want to ensure several compression streams?
> > As you know well, zram handle it dynamically.
> > 
> > If zram cannot allocate more streams, it means the system is
> > heavily fragmented or memory pressure at that time so there
> > is no worth to add more stream, I think.
> > 
> > Could you elaborate it more why he want to know it and what
> > he expect from that?
> 
> good questions. I believe mostly it's about fine-tuning on a
> per-device basis, which is getting especially tricky when zram
> devices are used as a sort of in-memory tmp storage for various
> applications (black boxen).
> 
> > > b) What is the current number of comp streams (how much memory
> > >    does zram *actually* use for compression streams, if there are
> > >    more than one stream)?
> > 
> > Hmm, in the kernel, there are lots of example subsystem
> > we cannot know exact memory usage. Why does the user want
> > to know exact memory usage of zram? What is his concern?
> 
> certainly true. probably some of those sub-systems/drivers have some
> sort of LRU, or shrinker callbacks, to release unneeded memory back.
> zram only allocates streams, and it basically hard to tell how many:
> up to max_comp_streams, which can be larger than the number of cpus
> on the system; because we keep preemption enabled (I didn't realize
> that until I played with the patch) around
> zcomp_strm_find()/zcomp_strm_release():
> 
>       zram_bvec_write()
>       {
>               ...
>               zstrm = zcomp_strm_find(zram->comp);
> >> can preempt
>               user_mem = kmap_atomic(page);
> >> now atomic
>               zcomp_compress()
>               ...
>               kunmap_atomic()
> >> can preempt
>               zcomp_strm_release()
>               ...
>       }
> 
> so how many streams I can have on my old 4-cpus x86_64 box?
> 
> 10?
> yes.
> 
> # cat /sys/block/zram0/mm_stat
> 630484992  9288707 13103104        0 13103104    16240        0       10
> 
> 16?
> yes.
> 
> # cat /sys/block/zram0/mm_stat
> 1893117952 25296718 31354880        0 31354880    15342        0       16
> 
> 21?
> yes.
> 
> # cat /sys/block/zram0/mm_stat
> 1893167104 28499936 46616576        0 46616576    15330        0       21
> 
> do I need 21? may be no. do I nede 18? if 18 streams are needed only 10%
> of the time (I can figure it out by doing repetitive cat zramX/mm_stat),
> then I can set max_comp_streams to make 90% of applications happy, e.g.
> max_comp_streams to 10, and save some memory.
> 

Okay. Let's go back to the zcomp design decision. As you remember, the
reason we separated the single and multi stream code was performance:
mutex_lock in the single stream model was much faster than the
sleep/wakeup scheme in the multi stream model.
If we could have overcome that problem back then, we would have made
the multi stream code the default.

How about using *per-cpu* streams?

I remember you wanted to create the maximum number of comp streams
statically. I didn't want that at the time, but I have changed my mind.

Let's allocate the comp streams statically but remove the
max_comp_streams knob. Instead, by default, zram allocates as many
streams as there are online CPUs.

So I think we can solve the locking scheme issue of the single stream
model, guarantee the level of parallelism, and improve performance by
avoiding locking altogether.
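
To make the idea a bit more concrete, here is a rough userspace sketch,
not kernel code and not a patch. All names in it (struct strm,
struct strm_pool, pool_create(), strm_get(), strm_put(), STRM_BUF_SIZE)
are made up for illustration. It only shows the shape: one stream is
allocated up front per online CPU, and a caller picks the slot that
belongs to its CPU, so no lock is needed as long as the caller cannot
migrate or be preempted while it holds the slot. In the kernel I would
expect the per-cpu helpers (e.g. get_cpu_ptr()/put_cpu_ptr(), which
disable preemption) to provide that guarantee; here the CPU number is
simply passed in.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

#define STRM_BUF_SIZE 8192	/* assumed work buffer size, illustration only */

/* Hypothetical stand-in for a compression stream: just a work buffer. */
struct strm {
	unsigned char *buffer;
	int busy;		/* debug aid: catches double use of a slot */
};

struct strm_pool {
	long nr;		/* number of slots, one per online CPU */
	struct strm *slots;
};

void pool_destroy(struct strm_pool *pool)
{
	if (!pool)
		return;
	if (pool->slots)
		for (long i = 0; i < pool->nr; i++)
			free(pool->slots[i].buffer);
	free(pool->slots);
	free(pool);
}

/* Allocate every stream up front; there is no max_comp_streams knob. */
struct strm_pool *pool_create(void)
{
	long nr = sysconf(_SC_NPROCESSORS_ONLN);
	struct strm_pool *pool;

	if (nr < 1)
		nr = 1;
	pool = calloc(1, sizeof(*pool));
	if (!pool)
		return NULL;
	pool->nr = nr;
	pool->slots = calloc((size_t)nr, sizeof(*pool->slots));
	if (!pool->slots) {
		pool_destroy(pool);
		return NULL;
	}
	for (long i = 0; i < nr; i++) {
		pool->slots[i].buffer = malloc(STRM_BUF_SIZE);
		if (!pool->slots[i].buffer) {
			pool_destroy(pool);
			return NULL;
		}
	}
	return pool;
}

/*
 * Pick the slot of the given CPU. No lock is taken: the caller is
 * assumed to stay on that CPU until strm_put().
 */
struct strm *strm_get(struct strm_pool *pool, long cpu)
{
	struct strm *s;

	if (cpu < 0 || cpu >= pool->nr)
		return NULL;
	s = &pool->slots[cpu];
	assert(!s->busy);
	s->busy = 1;
	return s;
}

void strm_put(struct strm *s)
{
	assert(s->busy);
	s->busy = 0;
}
```

The write path would then be: strm_get() for the current CPU, compress
into the slot's buffer, strm_put(). CPU hotplug is ignored in this
sketch; a real implementation would have to handle it.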

The downside of this approach is that memory is reserved unnecessarily
even if zram is in use for only 1% of the system's running time. But we
should accept that for the other benefits (i.e., simpler code, removing
the max_comp_streams knob, no need for this stat of yours, a guaranteed
level of parallelism, a guaranteed memory footprint).

What do you think about it?
