Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-03 Thread Sergey Senozhatsky
On (05/03/16 15:19), Minchan Kim wrote:
> > > > cat /sys/xxx/max_comp_streams returns num_online_cpus.
> > > 
> > > One more thing,
> > > 
> > > User:
> > > echo 4 > /sys/xxx/max_comp_streams
> > > cat /sys/xxx/max_comp_streams
> > > 8
> > 
> > sure, it can also be
> > 
> > cat /sys/xxx/max_comp_streams
> > 5
> > cat /sys/xxx/max_comp_streams
> > 6
> > cat /sys/xxx/max_comp_streams
> > 7
> > cat /sys/xxx/max_comp_streams
> > 3
> > 
> > depending on the availability of CPUs. but why would user space
> > constantly check max_comp_streams?
> > 
> > > which is rather weird?
> > > 
> > > We should keep the user's value and return it to the user although it's
> > > technically lying. IMO, it would be the best way to prevent confusing the
> > > user until we remove max_comp_streams finally.
> > 
> > well, I preferred to show the actual state of the device. besides,
> > does anyone really do
> > 
> > write buffer to file
> > if (success)
> > read from file and compare with the buffer
> > 
> > ?
> > 
> 
> Okay, I want to go with your approach!

thanks. I mean, that was my thinking when I decided to change the
max_comp_streams output. but no pressure, it does change the numbers
that user space will see. I don't have any strong opinion; we can keep it
as a zcomp cleanup only -- w/o touching the _show()/_store() parts.

> Could you update zram.txt to reflect it?

will do later today. I think I'd prefer to keep it as an independent
patch, since it does change the user-visible behaviour after all (no
idea if that's true tho; I can't easily think why anyone would keep track
of the values returned by cat /sys/xxx/max_comp_streams), so we can
revert it w/o reverting the per-cpu streams IF anyone/anything gets
upset with the change.

-ss
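
For reference, the two show() behaviours being debated, sketched from the
hunks quoted later in this thread (variant A mirrors the pre-patch code that
caches the user's value; variant B is what the per-cpu patch does; names as
in zram_drv.c):

/* variant A: remember and echo back whatever user space stored */
static ssize_t max_comp_streams_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	int val;
	struct zram *zram = dev_to_zram(dev);

	down_read(&zram->init_lock);
	val = zram->max_comp_streams;	/* the value last written via _store() */
	up_read(&zram->init_lock);

	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
}

/* variant B: report the actual state; the value follows CPU hotplug */
static ssize_t max_comp_streams_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	return scnprintf(buf, PAGE_SIZE, "%d\n", num_online_cpus());
}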


Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-03 Thread Sergey Senozhatsky
On (05/03/16 14:03), Minchan Kim wrote:
> > will io_stat node work for you?
> 
> Firstly, I thought io_stat would be better. However, on second thought,
> I want to withdraw that.
> 
> I think io_stat should go away.
> 
> failed_read
> failed_write
> invalid_io
> 
> I think the above things are really unneeded. If something fails, the upper
> layer on top of zram, for example, FSes or swap, should emit the warning.
> So, I don't think we need to maintain it in the zram layer.
> 
> notify_free
> 
> It's a kind of discard command from the block device's point of view, so I
> think the general block layer should take care of it like read and write.
> If block does that, the only thing remaining in notify_free is
> zram_slot_free_notify, so I think we can move it from io_stat to mm_stat
> because it's related to memory, not block I/O.
> 
> Hoping for the above, I suggest we don't add anything to io_stat any more
> from now on, and let's remove it sometime.
> Instead, let's add a new dup stat.
> 
> What do you think about it?

I agree, and it's true that we export a number of senseless and useless
stats (well, senseless to most of the users). what's your plan regarding
the num_recompress file? is it guaranteed to go away once we convince
ourselves that per-cpu streams are fine (1 or 2 years later), or will it
stay? if it's 100% 'temporary' then _probably_ we can instead add a
per-device /sys/.../zramID/debug_file (or similar name), document it as
  "IT WILL NEVER BE DOCUMENTED, IT WILL NEVER BE STABLE, DON'T EVER USE IT"
and export anything we want there, anytime we want. or is it too childish?

-ss
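
A minimal sketch of the 'debug_file' idea floated above, assuming the
dev_to_zram() helper and the stats fields quoted elsewhere in this thread
(the attr name and the output format are hypothetical):

static ssize_t debug_stat_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	struct zram *zram = dev_to_zram(dev);

	/* deliberately unstable format: never parse this in scripts */
	return scnprintf(buf, PAGE_SIZE, "version: 0\nnum_recompress: %llu\n",
			(u64)atomic64_read(&zram->stats.num_recompress));
}
static DEVICE_ATTR_RO(debug_stat);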


Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-03 Thread Minchan Kim
On Tue, May 03, 2016 at 02:57:15PM +0900, Sergey Senozhatsky wrote:
> On (05/03/16 14:40), Minchan Kim wrote:
> [..]
> > > At least, we need sanity check code, still?
> > > Otherwise, user can echo "garbage" > /sys/xxx/max_comp_streams and then
> > > cat /sys/xxx/max_comp_streams returns num_online_cpus.
> > 
> > One more thing,
> > 
> > User:
> > echo 4 > /sys/xxx/max_comp_streams
> > cat /sys/xxx/max_comp_streams
> > 8
> 
> sure, it can also be
> 
> cat /sys/xxx/max_comp_streams
> 5
> cat /sys/xxx/max_comp_streams
> 6
> cat /sys/xxx/max_comp_streams
> 7
> cat /sys/xxx/max_comp_streams
> 3
> 
> depending on the availability of CPUs. but why would user space
> constantly check max_comp_streams?
> 
> > which is rather weird?
> > 
> > We should keep the user's value and return it to the user although it's
> > technically lying. IMO, it would be the best way to prevent confusing the
> > user until we remove max_comp_streams finally.
> 
> well, I preferred to show the actual state of the device. besides,
> does anyone really do
> 
>   write buffer to file
>   if (success)
>   read from file and compare with the buffer
> 
> ?
> 

Okay, I want to go with your approach!
Could you update zram.txt to reflect it?

Thanks.


Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Sergey Senozhatsky
On (05/03/16 14:40), Minchan Kim wrote:
[..]
> > At least, we need sanity check code, still?
> > Otherwise, user can echo "garbage" > /sys/xxx/max_comp_streams and then
> > cat /sys/xxx/max_comp_streams returns num_online_cpus.
> 
> One more thing,
> 
> User:
> echo 4 > /sys/xxx/max_comp_streams
> cat /sys/xxx/max_comp_streams
> 8

sure, it can also be

cat /sys/xxx/max_comp_streams
5
cat /sys/xxx/max_comp_streams
6
cat /sys/xxx/max_comp_streams
7
cat /sys/xxx/max_comp_streams
3

depending on the availability of CPUs. but why would user space
constantly check max_comp_streams?

> which is rather weird?
> 
> We should keep the user's value and return it to the user although it's
> technically lying. IMO, it would be the best way to prevent confusing the
> user until we remove max_comp_streams finally.

well, I preferred to show the actual state of the device. besides,
does anyone really do

write buffer to file
if (success)
read from file and compare with the buffer

?

-ss
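
For illustration, the write-then-read-back pattern questioned above, as a
hypothetical standalone userspace sketch (the device path and value are
examples only):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[16];
	ssize_t n;
	int fd = open("/sys/block/zram0/max_comp_streams", O_RDWR);

	if (fd < 0)
		return 1;
	/* write a value... */
	if (write(fd, "4\n", 2) == 2) {
		/* ...then read it back and compare with what was written */
		lseek(fd, 0, SEEK_SET);
		n = read(fd, buf, sizeof(buf) - 1);
		if (n > 0) {
			buf[n] = '\0';
			/* with per-cpu streams this prints the number of
			   online CPUs, not necessarily 4 */
			printf("wrote 4, read back: %s", buf);
		}
	}
	close(fd);
	return 0;
}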


Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Sergey Senozhatsky
On (05/03/16 14:23), Minchan Kim wrote:
[..]
> > -   zram->max_comp_streams = num;
> > -   ret = len;
> > -out:
> > -   up_write(&zram->init_lock);
> > -   return ret;
> 
> At least, we need sanity check code, still?
> Otherwise, user can echo "garbage" > /sys/xxx/max_comp_streams and then
> cat /sys/xxx/max_comp_streams returns num_online_cpus.

hm, I couldn't find any reason to keep the check. we completely
ignore the value anyway; cat /sys/xxx/max_comp_streams will always
return num_online_cpus(), regardless of the correctness of the supplied
data; `garbage', `2', `1024', `32' make no difference.

-ss
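
With the check gone, the _store() handler presumably shrinks to a stub that
accepts and discards the input; a sketch under that assumption (the quoted
hunk is truncated in this archive):

static ssize_t max_comp_streams_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t len)
{
	/* the value is ignored: the stream count now tracks online CPUs */
	return len;
}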


Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Minchan Kim
On Tue, May 03, 2016 at 02:23:24PM +0900, Minchan Kim wrote:
> On Mon, May 02, 2016 at 05:06:00PM +0900, Sergey Senozhatsky wrote:
> > On (05/02/16 16:25), Sergey Senozhatsky wrote:
> > [..]
> > > > Trivial:
> > > > We could remove max_strm now and change description.
> > > 
> > > oh, yes.
> > 
> > how about something like this? remove max_comp_streams entirely, but
> > leave the attr. if we keep zram->max_comp_streams and return its value
> > (set by user space) from the show() handler, we are technically lying;
> > because the actual number of streams is now num_online_cpus().
> 
> Yes, we should have limited the value to num_online_cpus from the
> beginning.
> 
> > 
> > 
> > ===8<===8<===
> > 
> > From: Sergey Senozhatsky 
> > Subject: [PATCH] zram: remove max_comp_streams internals
> > 
> > Remove the internal part of max_comp_streams interface, since we
> > switched to per-cpu streams. We will keep RW max_comp_streams attr
> > around, because:
> > 
> > a) we may (silently) switch back to idle compression streams list
> >and don't want to disturb user space
> > b) max_comp_streams attr must wait for the next 'lay off cycle';
> >we give user space 2 years to adjust before we remove/downgrade
> >the attr, and there are already several attrs scheduled for
> >removal in 4.11, so it's too late for max_comp_streams.
> > 
> > Signed-off-by: Sergey Senozhatsky 
> > ---
> >  drivers/block/zram/zcomp.c|  7 +--
> >  drivers/block/zram/zcomp.h|  2 +-
> >  drivers/block/zram/zram_drv.c | 47 
> > +++
> >  drivers/block/zram/zram_drv.h |  1 -
> >  4 files changed, 14 insertions(+), 43 deletions(-)
> > 
> > diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> > index d4159e4..d4de9cb 100644
> > --- a/drivers/block/zram/zcomp.c
> > +++ b/drivers/block/zram/zcomp.c
> > @@ -95,11 +95,6 @@ bool zcomp_available_algorithm(const char *comp)
> > return find_backend(comp) != NULL;
> >  }
> >  
> > -bool zcomp_set_max_streams(struct zcomp *comp, int num_strm)
> > -{
> > -   return true;
> > -}
> > -
> >  struct zcomp_strm *zcomp_strm_find(struct zcomp *comp)
> >  {
> > return *get_cpu_ptr(comp->stream);
> > @@ -211,7 +206,7 @@ void zcomp_destroy(struct zcomp *comp)
> >   * case of allocation error, or any other error potentially
> >   * returned by functions zcomp_strm_{multi,single}_create.
> >   */
> > -struct zcomp *zcomp_create(const char *compress, int max_strm)
> > +struct zcomp *zcomp_create(const char *compress)
> >  {
> > struct zcomp *comp;
> > struct zcomp_backend *backend;
> > diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
> > index aba8c21..ffd88cb 100644
> > --- a/drivers/block/zram/zcomp.h
> > +++ b/drivers/block/zram/zcomp.h
> > @@ -45,7 +45,7 @@ struct zcomp {
> >  ssize_t zcomp_available_show(const char *comp, char *buf);
> >  bool zcomp_available_algorithm(const char *comp);
> >  
> > -struct zcomp *zcomp_create(const char *comp, int max_strm);
> > +struct zcomp *zcomp_create(const char *comp);
> >  void zcomp_destroy(struct zcomp *comp);
> >  
> >  struct zcomp_strm *zcomp_strm_find(struct zcomp *comp);
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index cad1751..817e511 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -304,46 +304,25 @@ static ssize_t mem_used_max_store(struct device *dev,
> > return len;
> >  }
> >  
> > +/*
> > + * We switched to per-cpu streams and this attr is not needed anymore.
> > + * However, we will keep it around for some time, because:
> > + * a) we may revert per-cpu streams in the future
> > + * b) it's visible to user space and we need to follow our 2 years
> > + *    retirement rule; but we already have a number of 'soon to be
> > + *    altered' attrs, so max_comp_streams needs to wait for the next
> > + *    layoff cycle.
> > + */
> 
> Thanks for nice comment.
> 
> >  static ssize_t max_comp_streams_show(struct device *dev,
> > struct device_attribute *attr, char *buf)
> >  {
> > -   int val;
> > -   struct zram *zram = dev_to_zram(dev);
> > -
> > -   down_read(&zram->init_lock);
> > -   val = zram->max_comp_streams;
> > -   up_read(&zram->init_lock);
> > -
> > -   return scnprintf(buf, PAGE_SIZE, "%d\n", val);
> > +   return scnprintf(buf, PAGE_SIZE, "%d\n", num_online_cpus());
> >  }
> >  
> >  static ssize_t max_comp_streams_store(struct device *dev,
> > struct device_attribute *attr, const char *buf, size_t len)
> >  {
> > -   int num;
> > -   struct zram *zram = dev_to_zram(dev);
> > -   int ret;
> > -
> > -   ret = kstrtoint(buf, 0, &num);
> > -   if (ret < 0)
> > -   return ret;
> > -   if (num < 1)
> > -   return -EINVAL;
> > -
> > -   down_write(&zram->init_lock);
> > -   if (init_done(zram)) {
> > -   if (!zcomp_set_max_streams(zram->comp, num)) {
> > -   pr_info("Cannot change max compression streams\n");
> > -   ret = -EINVAL;
> > -   goto out;
> > -   }
> > -   }
> > -
> > -   zram->max_comp_streams = num;
> > -   ret = len;
> > -out:
> > -   up_write(&zram->init_lock);
> > -   return ret;


Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Minchan Kim
On Mon, May 02, 2016 at 05:06:00PM +0900, Sergey Senozhatsky wrote:
> On (05/02/16 16:25), Sergey Senozhatsky wrote:
> [..]
> > > Trivial:
> > > We could remove max_strm now and change description.
> > 
> > oh, yes.
> 
> how about something like this? remove max_comp_streams entirely, but
> leave the attr. if we keep zram->max_comp_streams and return its value
> (set by user space) from the show() handler, we are technically lying;
> because the actual number of streams is now num_online_cpus().

> Yes, we should have limited the value to num_online_cpus from the
beginning.

> 
> 
> ===8<===8<===
> 
> From: Sergey Senozhatsky 
> Subject: [PATCH] zram: remove max_comp_streams internals
> 
> Remove the internal part of max_comp_streams interface, since we
> switched to per-cpu streams. We will keep RW max_comp_streams attr
> around, because:
> 
> a) we may (silently) switch back to idle compression streams list
>and don't want to disturb user space
> b) max_comp_streams attr must wait for the next 'lay off cycle';
>we give user space 2 years to adjust before we remove/downgrade
>the attr, and there are already several attrs scheduled for
>removal in 4.11, so it's too late for max_comp_streams.
> 
> Signed-off-by: Sergey Senozhatsky 
> ---
>  drivers/block/zram/zcomp.c|  7 +--
>  drivers/block/zram/zcomp.h|  2 +-
>  drivers/block/zram/zram_drv.c | 47 
> +++
>  drivers/block/zram/zram_drv.h |  1 -
>  4 files changed, 14 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index d4159e4..d4de9cb 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -95,11 +95,6 @@ bool zcomp_available_algorithm(const char *comp)
>   return find_backend(comp) != NULL;
>  }
>  
> -bool zcomp_set_max_streams(struct zcomp *comp, int num_strm)
> -{
> - return true;
> -}
> -
>  struct zcomp_strm *zcomp_strm_find(struct zcomp *comp)
>  {
>   return *get_cpu_ptr(comp->stream);
> @@ -211,7 +206,7 @@ void zcomp_destroy(struct zcomp *comp)
>   * case of allocation error, or any other error potentially
>   * returned by functions zcomp_strm_{multi,single}_create.
>   */
> -struct zcomp *zcomp_create(const char *compress, int max_strm)
> +struct zcomp *zcomp_create(const char *compress)
>  {
>   struct zcomp *comp;
>   struct zcomp_backend *backend;
> diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
> index aba8c21..ffd88cb 100644
> --- a/drivers/block/zram/zcomp.h
> +++ b/drivers/block/zram/zcomp.h
> @@ -45,7 +45,7 @@ struct zcomp {
>  ssize_t zcomp_available_show(const char *comp, char *buf);
>  bool zcomp_available_algorithm(const char *comp);
>  
> -struct zcomp *zcomp_create(const char *comp, int max_strm);
> +struct zcomp *zcomp_create(const char *comp);
>  void zcomp_destroy(struct zcomp *comp);
>  
>  struct zcomp_strm *zcomp_strm_find(struct zcomp *comp);
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index cad1751..817e511 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -304,46 +304,25 @@ static ssize_t mem_used_max_store(struct device *dev,
>   return len;
>  }
>  
> +/*
> + * We switched to per-cpu streams and this attr is not needed anymore.
> + * However, we will keep it around for some time, because:
> + * a) we may revert per-cpu streams in the future
> + * b) it's visible to user space and we need to follow our 2 years
> + *    retirement rule; but we already have a number of 'soon to be
> + *    altered' attrs, so max_comp_streams needs to wait for the next
> + *    layoff cycle.
> + */

Thanks for nice comment.

>  static ssize_t max_comp_streams_show(struct device *dev,
>   struct device_attribute *attr, char *buf)
>  {
> - int val;
> - struct zram *zram = dev_to_zram(dev);
> -
> - down_read(&zram->init_lock);
> - val = zram->max_comp_streams;
> - up_read(&zram->init_lock);
> -
> - return scnprintf(buf, PAGE_SIZE, "%d\n", val);
> + return scnprintf(buf, PAGE_SIZE, "%d\n", num_online_cpus());
>  }
>  
>  static ssize_t max_comp_streams_store(struct device *dev,
>   struct device_attribute *attr, const char *buf, size_t len)
>  {
> - int num;
> - struct zram *zram = dev_to_zram(dev);
> - int ret;
> -
> - ret = kstrtoint(buf, 0, &num);
> - if (ret < 0)
> - return ret;
> - if (num < 1)
> - return -EINVAL;
> -
> - down_write(&zram->init_lock);
> - if (init_done(zram)) {
> - if (!zcomp_set_max_streams(zram->comp, num)) {
> - pr_info("Cannot change max compression streams\n");
> - ret = -EINVAL;
> - goto out;
> - }
> - }
> -
> - zram->max_comp_streams = num;
> - ret = len;
> -out:
> - up_write(&zram->init_lock);
> - return ret;

At least, we need sanity check code, still?
Otherwise, user can echo "garbage" > /sys/xxx/max_comp_streams and then
cat /sys/xxx/max_comp_streams returns num_online_cpus.
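
For context, how streams are grabbed and released in the per-cpu scheme:
zcomp_strm_find() is quoted in the diff above; zcomp_strm_release() is
sketched here on the assumption that it simply mirrors it:

/* grab this CPU's stream; get_cpu_ptr() also disables preemption, so the
 * stream cannot be re-entered by another task on the same CPU */
struct zcomp_strm *zcomp_strm_find(struct zcomp *comp)
{
	return *get_cpu_ptr(comp->stream);
}

/* release the stream; put_cpu_ptr() re-enables preemption */
void zcomp_strm_release(struct zcomp *comp, struct zcomp_strm *zstrm)
{
	put_cpu_ptr(comp->stream);
}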


Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Minchan Kim
On Tue, May 03, 2016 at 01:29:02PM +0900, Sergey Senozhatsky wrote:
> On (05/03/16 11:30), Sergey Senozhatsky wrote:
> > > We are concerned about returning to the no-per-cpu option but actually,
> > > I don't want that. If duplicate compression is really a problem (but it's
> > > really unlikely), we should try to solve the problem itself in a different
> > > way rather than roll back to the old code, first of all.
> > > 
> > > I hope we can. So let's not worry too much about adding a new dup stat. :)
> > 
> > ok, no prob. do you want it as a separate sysfs node or a column in mm_stat?
> > I'd prefer mm_stat column, or somewhere in those cumulative files; not a
> > dedicated node: we decided to get rid of them some time ago.
> > 
> 
> will io_stat node work for you?

Firstly, I thought io_stat would be better. However, on second thought,
I want to withdraw that.

I think io_stat should go away.

failed_read
failed_write
invalid_io

I think the above things are really unneeded. If something fails, the upper
layer on top of zram, for example, FSes or swap, should emit the warning.
So, I don't think we need to maintain it in the zram layer.

notify_free

It's a kind of discard command from the block device's point of view, so I
think the general block layer should take care of it like read and write.
If block does that, the only thing remaining in notify_free is
zram_slot_free_notify, so I think we can move it from io_stat to mm_stat
because it's related to memory, not block I/O.

Hoping for the above, I suggest we don't add anything to io_stat any more
from now on, and let's remove it sometime.
Instead, let's add a new dup stat.

What do you think about it?


> I'll submit a formal patch later today. when you have time, can you
> take a look at http://marc.info/?l=linux-kernel&m=146217628030970 ?

Oops, sorry, I missed it. I will take a look.

> I think we can fold this one into 0002. it will make 0002 slightly
> bigger, but there's nothing complicated in there, just cleanup.
> 
> 
> 
> From: Sergey Senozhatsky 
> Subject: [PATCH] zram: export the number of re-compressions
> 
> Make the number of re-compressions visible via the io_stat node,
> so we will be able to track down any issues caused by per-cpu
> compression streams.
> 
> Signed-off-by: Sergey Senozhatsky 
> Suggested-by: Minchan Kim 
> ---
>  Documentation/blockdev/zram.txt | 3 +++
>  drivers/block/zram/zram_drv.c   | 7 +--
>  drivers/block/zram/zram_drv.h   | 1 +
>  3 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> index 5bda503..386d260 100644
> --- a/Documentation/blockdev/zram.txt
> +++ b/Documentation/blockdev/zram.txt
> @@ -183,6 +183,8 @@ mem_limit         RW    the maximum amount of memory ZRAM can use to store
>  pages_compacted   RO    the number of pages freed during compaction
>                          (available only via zram<id>/mm_stat node)
>  compact           WO    trigger memory compaction
> +num_recompress    RO    the number of times fast compression paths failed
> +                        and zram performed re-compression via a slow path
>  
>  WARNING
>  ===
> @@ -215,6 +217,7 @@ whitespace:
>   failed_writes
>   invalid_io
>   notify_free
> + num_recompress
>  
>  File /sys/block/zram<id>/mm_stat
>  
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 817e511..11b19c9 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -395,7 +395,8 @@ static ssize_t io_stat_show(struct device *dev,
> 	(u64)atomic64_read(&zram->stats.failed_reads),
> 	(u64)atomic64_read(&zram->stats.failed_writes),
> 	(u64)atomic64_read(&zram->stats.invalid_io),
> -	(u64)atomic64_read(&zram->stats.notify_free));
> +	(u64)atomic64_read(&zram->stats.notify_free),
> +	(u64)atomic64_read(&zram->stats.num_recompress));
> 	up_read(&zram->init_lock);
>  
>   return ret;
> @@ -721,8 +722,10 @@ compress_again:
>  
>   handle = zs_malloc(meta->mem_pool, clen,
>   GFP_NOIO | __GFP_HIGHMEM);
> - if (handle)
> + if (handle) {
> + atomic64_inc(&zram->stats.num_recompress);
>   goto compress_again;
> + }
>  
> 	pr_err("Error allocating memory for compressed page: %u, size=%zu\n",
> 			index, clen);
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index 06b1636..78d7e8f 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -85,6 +85,7 @@ struct zram_stats {
>   atomic64_t zero_pages;  /* no. of zero filled pages */
>   atomic64_t pages_stored;/* no. of pages currently stored */
>   atomic_long_t max_used_pages;   /* no. of maximum pages stored */
> + atomic64_t num_recompress; /* no. of failed compression fast paths */

Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Sergey Senozhatsky
On (05/03/16 11:30), Sergey Senozhatsky wrote:
> > We are concerned about returning to the no-per-cpu option but actually,
> > I don't want that. If duplicate compression is really a problem (but it's
> > really unlikely), we should try to solve the problem itself in a different
> > way rather than roll back to the old code, first of all.
> > 
> > I hope we can. So let's not worry too much about adding a new dup stat. :)
> 
> ok, no prob. do you want it as a separate sysfs node or a column in mm_stat?
> I'd prefer mm_stat column, or somewhere in those cumulative files; not a
> dedicated node: we decided to get rid of them some time ago.
> 

will io_stat node work for you?
I'll submit a formal patch later today. when you have time, can you
take a look at http://marc.info/?l=linux-kernel&m=146217628030970 ?
I think we can fold this one into 0002. it will make 0002 slightly
bigger, but there's nothing complicated in there, just cleanup.



From: Sergey Senozhatsky 
Subject: [PATCH] zram: export the number of re-compressions

Make the number of re-compressions visible via the io_stat node,
so we will be able to track down any issues caused by per-cpu
compression streams.

Signed-off-by: Sergey Senozhatsky 
Suggested-by: Minchan Kim 
---
 Documentation/blockdev/zram.txt | 3 +++
 drivers/block/zram/zram_drv.c   | 7 +--
 drivers/block/zram/zram_drv.h   | 1 +
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 5bda503..386d260 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -183,6 +183,8 @@ mem_limit         RW    the maximum amount of memory ZRAM can use to store
 pages_compacted   RO    the number of pages freed during compaction
                         (available only via zram<id>/mm_stat node)
 compact           WO    trigger memory compaction
+num_recompress    RO    the number of times fast compression paths failed
+                        and zram performed re-compression via a slow path
 
 WARNING
 ===
@@ -215,6 +217,7 @@ whitespace:
failed_writes
invalid_io
notify_free
+   num_recompress
 
 File /sys/block/zram<id>/mm_stat
 
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 817e511..11b19c9 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -395,7 +395,8 @@ static ssize_t io_stat_show(struct device *dev,
	(u64)atomic64_read(&zram->stats.failed_reads),
	(u64)atomic64_read(&zram->stats.failed_writes),
	(u64)atomic64_read(&zram->stats.invalid_io),
-	(u64)atomic64_read(&zram->stats.notify_free));
+	(u64)atomic64_read(&zram->stats.notify_free),
+	(u64)atomic64_read(&zram->stats.num_recompress));
	up_read(&zram->init_lock);
 
return ret;
@@ -721,8 +722,10 @@ compress_again:
 
handle = zs_malloc(meta->mem_pool, clen,
GFP_NOIO | __GFP_HIGHMEM);
-   if (handle)
+   if (handle) {
+	atomic64_inc(&zram->stats.num_recompress);
goto compress_again;
+   }
 
	pr_err("Error allocating memory for compressed page: %u, size=%zu\n",
			index, clen);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 06b1636..78d7e8f 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -85,6 +85,7 @@ struct zram_stats {
atomic64_t zero_pages;  /* no. of zero filled pages */
atomic64_t pages_stored;/* no. of pages currently stored */
atomic_long_t max_used_pages;   /* no. of maximum pages stored */
+   atomic64_t num_recompress; /* no. of failed compression fast paths */
 };
 
 struct zram_meta {
-- 
2.8.2
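
For context, a simplified model of the write path this counter instruments,
pieced together from the hunks above (the function name, the first
opportunistic allocation, and the trimmed error handling are assumptions
about code outside the quoted diff):

static int zram_compress_store(struct zram *zram, struct zram_meta *meta,
				const unsigned char *uncmem)
{
	struct zcomp_strm *zstrm;
	unsigned long handle = 0;
	size_t clen;
	int ret;

compress_again:
	zstrm = zcomp_strm_find(zram->comp);	/* per-cpu, preemption off */
	ret = zcomp_compress(zram->comp, zstrm, uncmem, &clen);
	if (ret) {
		zcomp_strm_release(zram->comp, zstrm);
		return ret;
	}

	/* a handle carried over from a previous round is reused */
	if (!handle)
		handle = zs_malloc(meta->mem_pool, clen,
				GFP_NOIO | __GFP_HIGHMEM);
	if (!handle) {
		/* we must not sleep while holding the per-cpu stream, so
		 * drop it and retry the allocation */
		zcomp_strm_release(zram->comp, zstrm);
		handle = zs_malloc(meta->mem_pool, clen,
				GFP_NOIO | __GFP_HIGHMEM);
		if (handle) {
			/* the stream buffer is gone by now, so count the
			 * event and compress the page once again */
			atomic64_inc(&zram->stats.num_recompress);
			goto compress_again;
		}
		return -ENOMEM;
	}

	/* ...copy zstrm->buffer into the handle, then release the stream... */
	zcomp_strm_release(zram->comp, zstrm);
	return 0;
}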



Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Sergey Senozhatsky
On (05/03/16 11:20), Minchan Kim wrote:
[..]
> We are concerned about returning to the no-per-cpu option but actually,
> I don't want that. If duplicate compression is really a problem (but it's
> really unlikely), we should try to solve the problem itself in a different
> way rather than roll back to the old code, first of all.
> 
> I hope we can. So let's not worry too much about adding a new dup stat. :)

ok, no prob. do you want it as a separate sysfs node or a column in mm_stat?
I'd prefer mm_stat column, or somewhere in those cumulative files; not a
dedicated node: we decided to get rid of them some time ago.

-ss


Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Minchan Kim
Hi Sergey!

On Tue, May 03, 2016 at 10:53:33AM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
> 
> On (05/03/16 10:40), Minchan Kim wrote:
> > > 
> > > ...hm...  inc ->failed_writes?
> [..]
> > Okay, let's add the knob to the existing sysfs (there is no difference
> > between sysfs and debugfs from userspace's point of view once they start
> > to use it), because there is no need to add new code to avoid such a mess.
> > 
> > Any thoughts?
> 
> so you don't want to account failed fast-path writes in failed_writes?

Yes, I think we cannot reuse the field.

> it sort of kind of fits, to some extent. re-compression is, basically,
> a new write operation -- allocate handle, map the page again, compress,
> etc., etc. so in a sense a failed fast-path write is _almost_ a failed
> write, except that we took extra care of handling it and retried the op
> inside of zram, not from the bio or fs layer.

Maybe I don't understand you, but my point is simple.
Documentation/ABI/testing/sysfs-block-zram says

What:   /sys/block/zram<id>/failed_writes
Date:   February 2014
Contact:Sergey Senozhatsky 
Description:
The failed_writes file is read-only and specifies the number of
failed writes happened on this device.

If users read it, they will think failed_writes means a write error on
the device. And that has been true until now. :)

If we piggyback on that to represent duplicate compression, users
will start to see the increased count and they will wonder "Hmm, fs or swap
on zram might be corrupted" and stop using zram. Otherwise,
they will report it to us. :) We would have to explain to them "Hey, it's
not a failed write but just duplicated I/O blah blah. Please read the
document; we already corrected the wording in the document to represent the
current status.", which would be painful for both users and us.

Rather than piggybacking on failed_writes, I actually want to remove
failed_[read|write], which is a pointless stat, I think.

We are concerned about returning to the no-per-cpu option but actually,
I don't want that. If duplicate compression is really a problem (but it's
really unlikely), we should try to solve the problem itself in a different
way rather than roll back to the old code, first of all.

I hope we can. So let's not worry too much about adding a new dup stat. :)



Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Sergey Senozhatsky
Hello Minchan,

On (05/03/16 10:40), Minchan Kim wrote:
> > 
> > ...hm...  inc ->failed_writes?
[..]
> Okay, let's add the knob to the existing sysfs (there is no difference
> between sysfs and debugfs from userspace's point of view once they start
> to use it), because there is no need to add new code to avoid such a mess.
> 
> Any thoughts?

so you don't want to account failed fast-path writes in failed_writes?
it sort of kind of fits, to some extent. re-compression is, basically,
a new write operation -- allocate handle, map the page again, compress,
etc., etc. so in a sense failed fast-path write is _almost_ a failed write,
except that we took extra care of handling it and retried the op inside
of zram, not from bio or fs layer.

-ss


Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Minchan Kim
On Mon, May 02, 2016 at 06:21:57PM +0900, Sergey Senozhatsky wrote:
> On (05/02/16 17:28), Minchan Kim wrote:
> [..]
> > > aha... I misunderstood you. I thought you talked about the test results
> > > in particular, not about zram stats in general. hm, given that we don't
> > > close the door for 'idle compression streams list' yet, and may revert
> > > per-cpu streams, maybe we can introduce this stat in 4.8? otherwise, in
> > > the worst case, we will have to trim a user visible stat file once again
> > > IF we, for some reason, will decide to return idle list back (hopefully
> > > we will not). does it sound OK to you to not touch the stat file now?
> > 
> > My concern is: how can we capture such a regression without introducing
> > the recompression stat? Do you have an idea? :)
> 
> ...hm...  inc ->failed_writes?
> 
> ... or, as a dirty and ugly and illegal (read "undocumented") hack, we
> could probably use ->failed_reads for that purpose, simply because I don't
> think anyone has ever seen ->failed_reads != 0.

:(

How about adding a new debugfs directory for zram, and putting unstable
stats or things we need to debug there?
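
a minimal sketch of that idea, assuming a hypothetical per-module "zram"
debugfs directory and a made-up num_recompress counter;
debugfs_create_dir()/debugfs_create_u64() are the stock kernel helpers:

#include <linux/debugfs.h>

static struct dentry *zram_debugfs_root;	/* hypothetical */
static u64 num_recompress;	/* hypothetical debug-only stat */

static int __init zram_debugfs_init(void)
{
	zram_debugfs_root = debugfs_create_dir("zram", NULL);
	if (IS_ERR_OR_NULL(zram_debugfs_root))
		return 0;	/* debugfs is optional: keep going without it */

	debugfs_create_u64("num_recompress", 0444,
			   zram_debugfs_root, &num_recompress);
	return 0;
}

that "optional" property is exactly the zs_create_pool lesson below: if
creation fails, the driver keeps working and the stat is simply absent.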

I got a lesson from the zs_create_pool discussion a few days ago:
"debugfs is just an optional feature, so it should be okay to not support
it sometimes, e.g., on -ENOMEM, -EEXIST and so on", although I'm not sure.

Logically, that might be right and easy to say, but I don't believe it
makes sense practically. Once *userspace* relies on it, we should keep
the rule, no matter what we said in the documentation.

I hope I'm not the only one thinking this; fortunately, at least one
more person seems to agree:

https://lwn.net/Articles/309298/

Okay, let's add the knob to the existing sysfs (there is no difference
between sysfs and debugfs from userspace's point of view once they start
to use it), because there is no need to add new code to avoid such a mess.

Any thoughts?


Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Sergey Senozhatsky
On (05/02/16 17:28), Minchan Kim wrote:
[..]
> > aha... I misunderstood you. I thought you were talking about the test
> > results in particular, not about zram stats in general. hm, given that
> > we haven't closed the door on the 'idle compression streams list' yet,
> > and may revert per-cpu streams, maybe we can introduce this stat in 4.8?
> > otherwise, in the worst case, we will have to trim a user-visible stat
> > file once again IF we, for some reason, decide to bring the idle list
> > back (hopefully we will not). does it sound OK to you to not touch the
> > stat file for now?
> 
> My concern is: how can we capture such a regression without introducing
> the recompression stat? Do you have an idea? :)

...hm...  inc ->failed_writes?

... or, as a dirty and ugly and illegal (read "undocumented") hack, we
could probably use ->failed_reads for that purpose, simply because I don't
think anyone has ever seen ->failed_reads != 0.

-ss


Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Minchan Kim
On Mon, May 02, 2016 at 04:25:08PM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
> 
> On (05/02/16 15:23), Minchan Kim wrote:
> [..]
> > Thanks for your great test.
> > Today I tried making heavy memory pressure to trigger double compression,
> > but it was not easy, so I guess it's really hard to hit in real practice.
> > I hope it doesn't cause any regression. ;-)
> 
> :) it is quite 'hard', indeed. huge 're-compression' numbers usually
> come together with OOM-kill and OOM-panic, per my 'testing experience'.
> 
> I'm thinking about making that test a 'general zram' test and posting
> it somewhere on github/etc., so it'll be easier to measure and
> compare things.

Indeed!

> 
> > >  bool zcomp_set_max_streams(struct zcomp *comp, int num_strm)
> > >  {
> > > - return comp->set_max_streams(comp, num_strm);
> > > + return true;
> > >  }
> > 
> > I like this part. We don't need to remove the set_max_streams interface
> > right now, because per-cpu might cause a regression if the double
> > compression ratio is high. If so, we should roll back; so let's wait for
> > two years, and if nothing goes wrong we can then deprecate the
> > multi-stream interfaces. I hope you are on the same page.
> 
> sure. I did this intentionally, because I really don't want to create a
> mess of attrs that are about to retire. we have N attrs that will disappear
> or see a downgrade (and we warn users in the kernel log) next summer (4.10
> or 4.11, IIRC). if we add max_comp_streams to the list now, then either we
> abandon our regular "2 years of retirement warning" rule and drop
> max_comp_streams after only 1 year (because it has 1 year of overlap with
> the attrs already scheduled for removal), or we complicate things for zram
> users: attrs will be dropped in different kernel releases and users are
> advised to keep that in mind. I think it's fine to have it as a NOOP attr
> for now and deprecate it during the next 'attr cleanup' 2-year cycle. yes,
> we are on the same page ;)
> 
> > > @@ -356,10 +226,7 @@ struct zcomp *zcomp_create(const char *compress, int 
> > > max_strm)
> > 
> > Trivial:
> > We could remove max_strm now and update the description.
> 
> oh, yes.
> 
> > > + zcomp_strm_release(zram->comp, zstrm);
> > > + zstrm = NULL;
> > > +
> > > + handle = zs_malloc(meta->mem_pool, clen,
> > > + GFP_NOIO | __GFP_HIGHMEM);
> > > + if (handle)
> > > + goto compress_again;
> > 
> > We can avoid page_zero_filled in the second trial, but I'm not sure it
> > is worth doing because, in my experiments, double compression was not
> > easy to trigger. When I did trigger it, the ratio was really small
> > compared to total writes, so it's not worth adding a new branch to
> > detect it in the normal path.
> >
> > Actually, I asked you to add a double compression stat. It seems you
> > forgot, but that's okay. I want to change the stat code and contents
> > anyway, so I will send a patch after I send the stat rework. :)
> 
> aha... I misunderstood you. I thought you were talking about the test
> results in particular, not about zram stats in general. hm, given that
> we haven't closed the door on the 'idle compression streams list' yet,
> and may revert per-cpu streams, maybe we can introduce this stat in 4.8?
> otherwise, in the worst case, we will have to trim a user-visible stat
> file once again IF we, for some reason, decide to bring the idle list
> back (hopefully we will not). does it sound OK to you to not touch the
> stat file for now?

My concern is: how can we capture such a regression without introducing
the recompression stat? Do you have an idea? :)
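
(for what it's worth, the stat could be as small as this sketch --
num_recompress is a made-up name here, not what was eventually merged:)

/* Sketch: count slow-path recompressions, export them read-only. */
static ssize_t num_recompress_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	struct zram *zram = dev_to_zram(dev);

	return scnprintf(buf, PAGE_SIZE, "%llu\n",
			(u64)atomic64_read(&zram->stats.num_recompress));
}
static DEVICE_ATTR_RO(num_recompress);

/* and in the write slow path, right before retrying the allocation: */
/*	atomic64_inc(&zram->stats.num_recompress);			*/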


Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Sergey Senozhatsky
On (05/02/16 16:25), Sergey Senozhatsky wrote:
[..]
> > Trivial:
> > We could remove max_strm now and change description.
> 
> oh, yes.

how about something like this? remove the max_comp_streams internals
entirely, but leave the attr. if we keep zram->max_comp_streams and return
its value (set by user space) from the show() handler, we are technically
lying, because the actual number of streams is now num_online_cpus().


===8<===8<===

From: Sergey Senozhatsky 
Subject: [PATCH] zram: remove max_comp_streams internals

Remove the internal part of the max_comp_streams interface, since we
switched to per-cpu streams. We will keep the RW max_comp_streams attr
around, because:

a) we may (silently) switch back to the idle compression streams list
   and don't want to disturb user space
b) the max_comp_streams attr must wait for the next 'lay off cycle';
   we give user space 2 years to adjust before we remove/downgrade
   the attr, and there are already several attrs scheduled for
   removal in 4.11, so it's too late for max_comp_streams.

Signed-off-by: Sergey Senozhatsky 
---
 drivers/block/zram/zcomp.c    |  7 +--
 drivers/block/zram/zcomp.h    |  2 +-
 drivers/block/zram/zram_drv.c | 47 +++
 drivers/block/zram/zram_drv.h |  1 -
 4 files changed, 14 insertions(+), 43 deletions(-)

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index d4159e4..d4de9cb 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -95,11 +95,6 @@ bool zcomp_available_algorithm(const char *comp)
return find_backend(comp) != NULL;
 }
 
-bool zcomp_set_max_streams(struct zcomp *comp, int num_strm)
-{
-   return true;
-}
-
 struct zcomp_strm *zcomp_strm_find(struct zcomp *comp)
 {
return *get_cpu_ptr(comp->stream);
@@ -211,7 +206,7 @@ void zcomp_destroy(struct zcomp *comp)
  * case of allocation error, or any other error potentially
  * returned by functions zcomp_strm_{multi,single}_create.
  */
-struct zcomp *zcomp_create(const char *compress, int max_strm)
+struct zcomp *zcomp_create(const char *compress)
 {
struct zcomp *comp;
struct zcomp_backend *backend;
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
index aba8c21..ffd88cb 100644
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -45,7 +45,7 @@ struct zcomp {
 ssize_t zcomp_available_show(const char *comp, char *buf);
 bool zcomp_available_algorithm(const char *comp);
 
-struct zcomp *zcomp_create(const char *comp, int max_strm);
+struct zcomp *zcomp_create(const char *comp);
 void zcomp_destroy(struct zcomp *comp);
 
 struct zcomp_strm *zcomp_strm_find(struct zcomp *comp);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index cad1751..817e511 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -304,46 +304,25 @@ static ssize_t mem_used_max_store(struct device *dev,
return len;
 }
 
+/*
+ * We switched to per-cpu streams and this attr is not needed anymore.
+ * However, we will keep it around for some time, because:
+ * a) we may revert per-cpu streams in the future
+ * b) it's visible to user space and we need to follow our 2 years
+ *    retirement rule; but we already have a number of 'soon to be
+ *    altered' attrs, so max_comp_streams needs to wait for the next
+ *    layoff cycle.
+ */
 static ssize_t max_comp_streams_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   int val;
-   struct zram *zram = dev_to_zram(dev);
-
-   down_read(&zram->init_lock);
-   val = zram->max_comp_streams;
-   up_read(&zram->init_lock);
-
-   return scnprintf(buf, PAGE_SIZE, "%d\n", val);
+   return scnprintf(buf, PAGE_SIZE, "%d\n", num_online_cpus());
 }
 
 static ssize_t max_comp_streams_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t len)
 {
-   int num;
-   struct zram *zram = dev_to_zram(dev);
-   int ret;
-
-   ret = kstrtoint(buf, 0, &num);
-   if (ret < 0)
-   return ret;
-   if (num < 1)
-   return -EINVAL;
-
-   down_write(&zram->init_lock);
-   if (init_done(zram)) {
-   if (!zcomp_set_max_streams(zram->comp, num)) {
-   pr_info("Cannot change max compression streams\n");
-   ret = -EINVAL;
-   goto out;
-   }
-   }
-
-   zram->max_comp_streams = num;
-   ret = len;
-out:
-   up_write(&zram->init_lock);
-   return ret;
+   return len;
 }
 
 static ssize_t comp_algorithm_show(struct device *dev,
@@ -1035,7 +1014,6 @@ static void zram_reset_device(struct zram *zram)
/* Reset stats */
	memset(&zram->stats, 0, sizeof(zram->stats));
zram->disksize = 0;
-   zram->max_comp_streams = 1;
 
set_capacity(zram->disk, 0);
	part_stat_set_all(&zram->disk->part0, 0);
@@ -1064,7 +1042,7 @@ static ssize_t disksize_store(struct device 

Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Sergey Senozhatsky
Hello Minchan,

On (05/02/16 15:23), Minchan Kim wrote:
[..]
> Thanks for your great test.
> Today I tried making heavy memory pressure to trigger double compression,
> but it was not easy, so I guess it's really hard to hit in real practice.
> I hope it doesn't cause any regression. ;-)

:) it is quite 'hard', indeed. huge 're-compression' numbers usually
come together with OOM-kill and OOM-panic, per my 'testing experience'.

I'm thinking about making that test a 'general zram' test and posting
it somewhere on github/etc., so it'll be easier to measure and
compare things.

> >  bool zcomp_set_max_streams(struct zcomp *comp, int num_strm)
> >  {
> > -   return comp->set_max_streams(comp, num_strm);
> > +   return true;
> >  }
> 
> I like this part. We don't need to remove the set_max_streams interface
> right now, because per-cpu might cause a regression if the double
> compression ratio is high. If so, we should roll back; so let's wait for
> two years, and if nothing goes wrong we can then deprecate the
> multi-stream interfaces. I hope you are on the same page.

sure. I did this intentionally, because I really don't want to create a
mess of attrs that are about to retire. we have N attrs that will disappear
or see a downgrade (and we warn users in the kernel log) next summer (4.10
or 4.11, IIRC). if we add max_comp_streams to the list now, then either we
abandon our regular "2 years of retirement warning" rule and drop
max_comp_streams after only 1 year (because it has 1 year of overlap with
the attrs already scheduled for removal), or we complicate things for zram
users: attrs will be dropped in different kernel releases and users are
advised to keep that in mind. I think it's fine to have it as a NOOP attr
for now and deprecate it during the next 'attr cleanup' 2-year cycle. yes,
we are on the same page ;)

> > @@ -356,10 +226,7 @@ struct zcomp *zcomp_create(const char *compress, int 
> > max_strm)
> 
> Trivial:
> We could remove max_strm now and update the description.

oh, yes.

> > +   zcomp_strm_release(zram->comp, zstrm);
> > +   zstrm = NULL;
> > +
> > +   handle = zs_malloc(meta->mem_pool, clen,
> > +   GFP_NOIO | __GFP_HIGHMEM);
> > +   if (handle)
> > +   goto compress_again;
> 
> We can avoid page_zero_filled in the second trial, but I'm not sure it
> is worth doing because, in my experiments, double compression was not
> easy to trigger. When I did trigger it, the ratio was really small
> compared to total writes, so it's not worth adding a new branch to
> detect it in the normal path.
>
> Actually, I asked you to add a double compression stat. It seems you
> forgot, but that's okay. I want to change the stat code and contents
> anyway, so I will send a patch after I send the stat rework. :)

aha... I misunderstood you. I thought you were talking about the test
results in particular, not about zram stats in general. hm, given that we
haven't closed the door on the 'idle compression streams list' yet, and may
revert per-cpu streams, maybe we can introduce this stat in 4.8? otherwise,
in the worst case, we will have to trim a user-visible stat file once again
IF we, for some reason, decide to bring the idle list back (hopefully we
will not). does it sound OK to you to not touch the stat file for now?

> Thanks for the great work, Sergey!

thanks!

> Acked-by: Minchan Kim 

thanks!

-ss
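
(an aside on the "avoid page_zero_filled in the second trial" note above:
it could be as small as the sketch below -- the helper split is purely
illustrative, and, as the review says, the extra branch was judged not
worth adding:)

/*
 * Sketch: do the zero-page check once, before the compress_again
 * label, so that recompression retries skip it. The zram helpers
 * used here exist in the 4.6-era driver; the function itself and
 * its placement are reconstructions.
 */
static bool zram_handle_zero_page(struct zram *zram, struct zram_meta *meta,
				  void *uncmem, u32 index)
{
	if (!page_zero_filled(uncmem))
		return false;

	atomic64_inc(&zram->stats.zero_pages);
	zram_set_flag(meta, index, ZRAM_ZERO);
	return true;
}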


Re: [PATCH 2/2] zram: user per-cpu compression streams

2016-05-02 Thread Minchan Kim
Hello Sergey,

Sorry for the late reply.

On Fri, Apr 29, 2016 at 01:17:10AM +0900, Sergey Senozhatsky wrote:
> Remove the idle streams list and keep compression streams in per-cpu
> data. This removes two contended spin_lock()/spin_unlock() calls
> from the write path and also prevents a write OP from being preempted
> while holding the compression stream, which can cause slowdowns.
> 
> For instance, let's assume that we have N CPUs and N-2
> max_comp_streams. TASK1 owns the last idle stream, and TASK2-TASK3 come
> in with write requests:
> 
>   TASK1            TASK2              TASK3
>  zram_bvec_write()
>   spin_lock
>   find stream
>   spin_unlock
> 
>   compress
> 
>   <<preempted>>    zram_bvec_write()
>                     spin_lock
>                     find stream
>                     spin_unlock
>                       no_stream
>                         schedule
>                                      zram_bvec_write()
>                                       spin_lock
>                                       find_stream
>                                       spin_unlock
>                                         no_stream
>                                           schedule
>   spin_lock
>   release stream
>   spin_unlock
>     wake up TASK2
> 
> not only will TASK2 and TASK3 not get the stream, TASK1 will be
> preempted in the middle of its operation, while we would prefer
> it to finish compression and release the stream.
> 
> Test environment: x86_64, 4 CPU box, 3G zram, lzo
> 
> The following fio tests were executed:
>   read, randread, write, randwrite, rw, randrw
> with the increasing number of jobs from 1 to 10.
> 
> 4 streams    8 streams   per-cpu
> ===
> jobs1
> READ:   2520.1MB/s   2566.5MB/s  2491.5MB/s
> READ:   2102.7MB/s   2104.2MB/s  2091.3MB/s
> WRITE:  1355.1MB/s   1320.2MB/s  1378.9MB/s
> WRITE:  1103.5MB/s   1097.2MB/s  1122.5MB/s
> READ:   434013KB/s   435153KB/s  439961KB/s
> WRITE:  433969KB/s   435109KB/s  439917KB/s
> READ:   403166KB/s   405139KB/s  403373KB/s
> WRITE:  403223KB/s   405197KB/s  403430KB/s
> jobs2
> READ:   7958.6MB/s   8105.6MB/s  8073.7MB/s
> READ:   6864.9MB/s   6989.8MB/s  7021.8MB/s
> WRITE:  2438.1MB/s   2346.9MB/s  3400.2MB/s
> WRITE:  1994.2MB/s   1990.3MB/s  2941.2MB/s
> READ:   981504KB/s   973906KB/s  1018.8MB/s
> WRITE:  981659KB/s   974060KB/s  1018.1MB/s
> READ:   937021KB/s   938976KB/s  987250KB/s
> WRITE:  934878KB/s   936830KB/s  984993KB/s
> jobs3
> READ:   13280MB/s    13553MB/s   13553MB/s
> READ:   11534MB/s    11785MB/s   11755MB/s
> WRITE:  3456.9MB/s   3469.9MB/s  4810.3MB/s
> WRITE:  3029.6MB/s   3031.6MB/s  4264.8MB/s
> READ:   1363.8MB/s   1362.6MB/s  1448.9MB/s
> WRITE:  1361.9MB/s   1360.7MB/s  1446.9MB/s
> READ:   1309.4MB/s   1310.6MB/s  1397.5MB/s
> WRITE:  1307.4MB/s   1308.5MB/s  1395.3MB/s
> jobs4
> READ:   20244MB/s    20177MB/s   20344MB/s
> READ:   17886MB/s    17913MB/s   17835MB/s
> WRITE:  4071.6MB/s   4046.1MB/s  6370.2MB/s
> WRITE:  3608.9MB/s   3576.3MB/s  5785.4MB/s
> READ:   1824.3MB/s   1821.6MB/s  1997.5MB/s
> WRITE:  1819.8MB/s   1817.4MB/s  1992.5MB/s
> READ:   1765.7MB/s   1768.3MB/s  1937.3MB/s
> WRITE:  1767.5MB/s   1769.1MB/s  1939.2MB/s
> jobs5
> READ:   18663MB/s    18986MB/s   18823MB/s
> READ:   16659MB/s    16605MB/s   16954MB/s
> WRITE:  3912.4MB/s   3888.7MB/s  6126.9MB/s
> WRITE:  3506.4MB/s   3442.5MB/s  5519.3MB/s
> READ:   1798.2MB/s   1746.5MB/s  1935.8MB/s
> WRITE:  1792.7MB/s   1740.7MB/s  1929.1MB/s
> READ:   1727.6MB/s   1658.2MB/s  1917.3MB/s
> WRITE:  1726.5MB/s   1657.2MB/s  1916.6MB/s
> jobs6
> READ:   21017MB/s    20922MB/s   21162MB/s
> READ:   19022MB/s    19140MB/s   18770MB/s
> WRITE:  3968.2MB/s   4037.7MB/s  6620.8MB/s
> WRITE:  3643.5MB/s   3590.2MB/s  6027.5MB/s
> READ:   1871.8MB/s   1880.5MB/s  2049.9MB/s
> WRITE:  1867.8MB/s   1877.2MB/s  2046.2MB/s
> READ:   1755.8MB/s   1710.3MB/s  1964.7MB/s
> WRITE:  1750.5MB/s   1705.9MB/s  1958.8MB/s
> jobs7
> READ:   21103MB/s    20677MB/s   21482MB/s
> READ:   18522MB/s    18379MB/s   19443MB/s
> WRITE:  4022.5MB/s   4067.4MB/s  6755.9MB/s
> WRITE:  

[PATCH 2/2] zram: user per-cpu compression streams

2016-04-28 Thread Sergey Senozhatsky
Remove the idle streams list and keep compression streams in per-cpu
data. This removes two contended spin_lock()/spin_unlock() calls
from the write path and also prevents a write OP from being preempted
while holding the compression stream, which can cause slowdowns.

For instance, let's assume that we have N CPUs and N-2
max_comp_streams. TASK1 owns the last idle stream, and TASK2-TASK3 come
in with write requests:

  TASK1            TASK2              TASK3
 zram_bvec_write()
  spin_lock
  find stream
  spin_unlock

  compress

  <<preempted>>    zram_bvec_write()
                    spin_lock
                    find stream
                    spin_unlock
                      no_stream
                        schedule
                                     zram_bvec_write()
                                      spin_lock
                                      find_stream
                                      spin_unlock
                                        no_stream
                                          schedule
  spin_lock
  release stream
  spin_unlock
    wake up TASK2

not only will TASK2 and TASK3 not get the stream, TASK1 will be
preempted in the middle of its operation, while we would prefer
it to finish compression and release the stream.
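
(a minimal sketch of the per-cpu pair this patch introduces:
zcomp_strm_find() matches the hunk quoted earlier in this digest, while
zcomp_strm_release() is a reconstruction -- put_cpu_ptr() simply
re-enables preemption:)

/* find: pin this CPU's stream; preemption stays off until release */
struct zcomp_strm *zcomp_strm_find(struct zcomp *comp)
{
	return *get_cpu_ptr(comp->stream);
}

/* release: drop the per-cpu reference, re-enabling preemption */
void zcomp_strm_release(struct zcomp *comp, struct zcomp_strm *zstrm)
{
	put_cpu_ptr(comp->stream);
}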

Test environment: x86_64, 4 CPU box, 3G zram, lzo

The following fio tests were executed:
  read, randread, write, randwrite, rw, randrw
with the increasing number of jobs from 1 to 10.

4 streams    8 streams   per-cpu
===
jobs1
READ:   2520.1MB/s   2566.5MB/s  2491.5MB/s
READ:   2102.7MB/s   2104.2MB/s  2091.3MB/s
WRITE:  1355.1MB/s   1320.2MB/s  1378.9MB/s
WRITE:  1103.5MB/s   1097.2MB/s  1122.5MB/s
READ:   434013KB/s   435153KB/s  439961KB/s
WRITE:  433969KB/s   435109KB/s  439917KB/s
READ:   403166KB/s   405139KB/s  403373KB/s
WRITE:  403223KB/s   405197KB/s  403430KB/s
jobs2
READ:   7958.6MB/s   8105.6MB/s  8073.7MB/s
READ:   6864.9MB/s   6989.8MB/s  7021.8MB/s
WRITE:  2438.1MB/s   2346.9MB/s  3400.2MB/s
WRITE:  1994.2MB/s   1990.3MB/s  2941.2MB/s
READ:   981504KB/s   973906KB/s  1018.8MB/s
WRITE:  981659KB/s   974060KB/s  1018.1MB/s
READ:   937021KB/s   938976KB/s  987250KB/s
WRITE:  934878KB/s   936830KB/s  984993KB/s
jobs3
READ:   13280MB/s    13553MB/s   13553MB/s
READ:   11534MB/s    11785MB/s   11755MB/s
WRITE:  3456.9MB/s   3469.9MB/s  4810.3MB/s
WRITE:  3029.6MB/s   3031.6MB/s  4264.8MB/s
READ:   1363.8MB/s   1362.6MB/s  1448.9MB/s
WRITE:  1361.9MB/s   1360.7MB/s  1446.9MB/s
READ:   1309.4MB/s   1310.6MB/s  1397.5MB/s
WRITE:  1307.4MB/s   1308.5MB/s  1395.3MB/s
jobs4
READ:   20244MB/s    20177MB/s   20344MB/s
READ:   17886MB/s    17913MB/s   17835MB/s
WRITE:  4071.6MB/s   4046.1MB/s  6370.2MB/s
WRITE:  3608.9MB/s   3576.3MB/s  5785.4MB/s
READ:   1824.3MB/s   1821.6MB/s  1997.5MB/s
WRITE:  1819.8MB/s   1817.4MB/s  1992.5MB/s
READ:   1765.7MB/s   1768.3MB/s  1937.3MB/s
WRITE:  1767.5MB/s   1769.1MB/s  1939.2MB/s
jobs5
READ:   18663MB/s    18986MB/s   18823MB/s
READ:   16659MB/s    16605MB/s   16954MB/s
WRITE:  3912.4MB/s   3888.7MB/s  6126.9MB/s
WRITE:  3506.4MB/s   3442.5MB/s  5519.3MB/s
READ:   1798.2MB/s   1746.5MB/s  1935.8MB/s
WRITE:  1792.7MB/s   1740.7MB/s  1929.1MB/s
READ:   1727.6MB/s   1658.2MB/s  1917.3MB/s
WRITE:  1726.5MB/s   1657.2MB/s  1916.6MB/s
jobs6
READ:   21017MB/s    20922MB/s   21162MB/s
READ:   19022MB/s    19140MB/s   18770MB/s
WRITE:  3968.2MB/s   4037.7MB/s  6620.8MB/s
WRITE:  3643.5MB/s   3590.2MB/s  6027.5MB/s
READ:   1871.8MB/s   1880.5MB/s  2049.9MB/s
WRITE:  1867.8MB/s   1877.2MB/s  2046.2MB/s
READ:   1755.8MB/s   1710.3MB/s  1964.7MB/s
WRITE:  1750.5MB/s   1705.9MB/s  1958.8MB/s
jobs7
READ:   21103MB/s    20677MB/s   21482MB/s
READ:   18522MB/s    18379MB/s   19443MB/s
WRITE:  4022.5MB/s   4067.4MB/s  6755.9MB/s
WRITE:  3691.7MB/s   3695.5MB/s  5925.6MB/s
READ:   1841.5MB/s   1933.9MB/s  2090.5MB/s
WRITE:  1842.7MB/s   1935.3MB/s  2091.9MB/s
READ:   1832.4MB/s   1856.4MB/s  1971.5MB/s
WRITE:  1822.3MB/s   1846.2MB/s  1960.6MB/s
jobs8
READ:   20463MB/s
