Re: [PATCH] Reduce memory usage

2021-12-24 Thread Johannes Altmanninger via rsync
On Thu, Dec 23, 2021 at 07:55:00PM +0100, Roland via rsync wrote:
> hello,
> 
> it's fantastic to see that such optimizations are still being found.
> 
> out of curiosity - what is the status of this?  will it get merged ?

Yes, it was merged after being resubmitted on GitHub, see commit
ae1f0029 (Reduce memory usage (#235), 2021-10-01)
in https://github.com/WayneD/rsync

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] Reduce memory usage

2021-12-23 Thread Roland via rsync

hello,

it's fantastic to see that such optimizations are still being found.

out of curiosity - what is the status of this?  will it get merged ?

roland



Re: [PATCH] Reduce memory usage

2021-10-03 Thread Rupert Gallagher via rsync
 Original Message 
On Oct 2, 2021, 15:04, Jindřich Makovička < makov...@gmail.com> wrote:
Just note this patch has nothing to do with memory consumption vs performance. 
It just avoids allocating memory that was left unused anyway.

I can read


Re: Aw: Re: [PATCH] Reduce memory usage

2021-10-03 Thread Rupert Gallagher via rsync
 Original Message 
On Oct 2, 2021, 12:36, < devz...@web.de> wrote:
>>In the exchange I argued that proper use of ram as a buffer would have cut 
>>down backup time to minutes instead of days.

>could you give an example where rsync is slowing things down so much due to 
>ram constraints or inefficient ram use?

You find it all in the exchange I already referred to.

>please mind that disk bandwidth and file copy bandwidth is not the same. random
>i/o and seek time is the culprit.

I am glad you did your homework.

>why should rsync use ram for buffering data it copies, if the linux kernel / 
>vm subsystem already does this?

Because my server is not using linux?


Re: Re: [PATCH] Reduce memory usage

2021-10-02 Thread Jindřich Makovička via rsync
Just note this patch has nothing to do with memory consumption vs
performance. It just avoids allocating memory that was left unused anyway.

On Sat, Oct 2, 2021, 12:51 devzero--- via rsync 
wrote:

> >In the exchange I argued that proper use of ram as a buffer would have
> cut down backup time to minutes instead of days.
>
> could you give an example where rsync is slowing things down so much due
> to ram constraints or inefficient ram use?
>
> please mind that disk bandwidth and file copy bandwidth is not the same.
> random i/o and seek time is the culprit.
>
> why should rsync use ram for buffering data it copies, if the linux kernel
> / vm subsystem already does this?
>
> roland
>
> *Sent:* Saturday, 02 October 2021 at 12:07
> *From:* "Rupert Gallagher via rsync"
> *To:* makov...@gmail.com, rsync@lists.samba.org
> *Subject:* Re: [PATCH] Reduce memory usage
> If you look at my previous exchange in the list, I raised the need for
> more ram usage via a tool option. In the exchange I argued that proper use
> of ram as a buffer would have cut down backup time to minutes instead of
> days. At the time, my proposal was dismissed by someone saying that rsync
> uses as much ram as it needs. I still feel the need to free rsync from this
> mindless constraint, while also welcoming ram usage optimisations such as
> yours in this patch. How hard can it be to allow rsync to use 1GB of ram
> instead of 100MB? The benefit would be huge. In my case, where a supermicro
> server uses a shared bus to transfer data from two disks, the overhead
> caused by frequent small buffer IO is so high that backup time is still
> huge. And I am using server hardware! PC and laptops are even worse.
>
> RG
>
>
>
>
>  Original Message 
> On Sep 26, 2021, 13:54, Jindřich Makovička via rsync <
> rsync@lists.samba.org> wrote:
> Hi,
>
> When using rsync to back up the file system on my laptop, containing a
> pretty much default linux desktop, I was wondering how rsync uses over
> 100MB of RAM it allocates.
>
> It turned out that most of the memory is used for the arrays of
> file_struct pointers, most of which end up unused - much more than the
> actual file_struct entries. In my case, the peak usage was 135MB of
> pointers, and just 1.5MB of the file_struct entries themselves.
>
> The problem seems to be that the default file_list allocation parameters
> predate the incremental recursion, which allocates a huge number of small
> file lists, while AFAICS originally rsync allocated just one large list.
>
> Applying the attached patch, which reduces the default allocation to 32
> pointers, and preallocates 32K pointers only for the main file lists in
> send_file_list and recv_file_list, reduces the peak memory usage in my case
> from 142MB to 12MB.
>
> Regards,
> --
> Jindřich Makovička


Aw: Re: [PATCH] Reduce memory usage

2021-10-02 Thread devzero--- via rsync
>In the exchange I argued that proper use of ram as a buffer would have cut down backup time to minutes instead of days.

could you give an example where rsync is slowing things down so much due to ram constraints or inefficient ram use?


please mind that disk bandwidth and file copy bandwidth is not the same. random i/o and seek time is the culprit.

why should rsync use ram for buffering data it copies, if the linux kernel / vm subsystem already does this?

roland
 

Sent: Saturday, 02 October 2021 at 12:07
From: "Rupert Gallagher via rsync"
To: makov...@gmail.com, rsync@lists.samba.org
Subject: Re: [PATCH] Reduce memory usage

If you look at my previous exchange in the list, I raised the need for more ram usage via a tool option. In the exchange I argued that proper use of ram as a buffer would have cut down backup time to minutes instead of days. At the time, my proposal was dismissed by someone saying that rsync uses as much ram as it needs. I still feel the need to free rsync from this mindless constraint, while also welcoming ram usage optimisations such as yours in this patch. How hard can it be to allow rsync to use 1GB of ram instead of 100MB? The benefit would be huge. In my case, where a supermicro server uses a shared bus to transfer data from two disks, the overhead caused by frequent small buffer IO is so high that backup time is still huge. And I am using server hardware! PC and laptops are even worse.

RG




 Original Message 
On Sep 26, 2021, 13:54, Jindřich Makovička via rsync < rsync@lists.samba.org> wrote:
Hi,

When using rsync to back up the file system on my laptop, containing a pretty much default linux desktop, I was wondering how rsync uses over 100MB of RAM it allocates.

It turned out that most of the memory is used for the arrays of file_struct pointers, most of which end up unused - much more than the actual file_struct entries. In my case, the peak usage was 135MB of pointers, and just 1.5MB of the file_struct entries themselves.

The problem seems to be that the default file_list allocation parameters predate the incremental recursion, which allocates a huge number of small file lists, while AFAICS originally rsync allocated just one large list.

Applying the attached patch, which reduces the default allocation to 32 pointers, and preallocates 32K pointers only for the main file lists in send_file_list and recv_file_list, reduces the peak memory usage in my case from 142MB to 12MB.

Regards,
--
Jindřich Makovička


Re: [PATCH] Reduce memory usage

2021-10-02 Thread Rupert Gallagher via rsync
If you look at my previous exchange in the list, I raised the need for more ram 
usage via a tool option. In the exchange I argued that proper use of ram as a 
buffer would have cut down backup time to minutes instead of days. At the time, 
my proposal was dismissed by someone saying that rsync uses as much ram as it 
needs. I still feel the need to free rsync from this mindless constraint, while 
also welcoming ram usage optimisations such as yours in this patch. How hard 
can it be to allow rsync to use 1GB of ram instead of 100MB? The benefit would 
be huge. In my case, where a supermicro server uses a shared bus to transfer 
data from two disks, the overhead caused by frequent small buffer IO is so high 
that backup time is still huge. And I am using server hardware! PC and laptops 
are even worse.

RG

 Original Message 
On Sep 26, 2021, 13:54, Jindřich Makovička via rsync < rsync@lists.samba.org> 
wrote:
Hi,

When using rsync to back up the file system on my laptop, containing a pretty 
much default linux desktop, I was wondering how rsync uses over 100MB of RAM it 
allocates.

It turned out that most of the memory is used for the arrays of file_struct 
pointers, most of which end up unused - much more than the actual file_struct 
entries. In my case, the peak usage was 135MB of pointers, and just 1.5MB of 
the file_struct entries themselves.

The problem seems to be that the default file_list allocation parameters 
predate the incremental recursion, which allocates a huge number of small file 
lists, while AFAICS originally rsync allocated just one large list.

Applying the attached patch, which reduces the default allocation to 32 
pointers, and preallocates 32K pointers only for the main file lists in 
send_file_list and recv_file_list, reduces the peak memory usage in my case 
from 142MB to 12MB.

Regards,
--
Jindřich Makovička


Re: [PATCH] Reduce memory usage

2021-09-28 Thread Johannes Altmanninger via rsync
Looks awesome, really nice catch!



[PATCH] Reduce memory usage

2021-09-28 Thread Jindřich Makovička via rsync
In 2004, an allocation optimization was added to the file list
handling code that preallocates 32k file_struct pointers in a
file_list. This optimization predates the incremental recursion
feature, for which it is no longer appropriate. When copying a
tree containing a large number of small directories using
incremental recursion, rsync allocates many short file_lists,
and the unused file_struct pointers can easily take 90-95% of
the memory allocated by rsync.

This can be easily reproduced by using

valgrind --tool=massif ./rsync -anx /usr /tmp/

and checking the memory profile of the first (sender) process.

This patch changes the flist functions to start only with 32 entries
for the partial file lists, instead of 32 * 1024.

It also modifies the condition for the debug notification that the
allocated memory moved, so it does not depend on the initial allocation
size. Instead, it now simply checks if the file list has been already
allocated.
---
 flist.c | 9 +++--
 rsync.h | 5 +++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/flist.c b/flist.c
index 3442d868..530d336e 100644
--- a/flist.c
+++ b/flist.c
@@ -305,7 +305,7 @@ static void flist_expand(struct file_list *flist, int extra)
 
 	new_ptr = realloc_array(flist->files, struct file_struct *, flist->malloced);
 
-	if (DEBUG_GTE(FLIST, 1) && flist->malloced != FLIST_START) {
+	if (DEBUG_GTE(FLIST, 1) && flist->files) {
 		rprintf(FCLIENT, "[%s] expand file_list pointer array to %s bytes, did%s move\n",
 		who_am_i(),
 		big_num(sizeof flist->files[0] * flist->malloced),
@@ -2186,8 +2186,10 @@ struct file_list *send_file_list(int f, int argc, char *argv[])
 #endif
 
 	flist = cur_flist = flist_new(0, "send_file_list");
+	flist_expand(flist, FLIST_START_LARGE);
 	if (inc_recurse) {
 		dir_flist = flist_new(FLIST_TEMP, "send_file_list");
+		flist_expand(dir_flist, FLIST_START_LARGE);
 		flags |= FLAG_DIVERT_DIRS;
 	} else
 		dir_flist = cur_flist;
@@ -2541,10 +2543,13 @@ struct file_list *recv_file_list(int f, int dir_ndx)
 #endif
 
 	flist = flist_new(0, "recv_file_list");
+	flist_expand(flist, FLIST_START_LARGE);
 
 	if (inc_recurse) {
-		if (flist->ndx_start == 1)
+		if (flist->ndx_start == 1) {
 			dir_flist = flist_new(FLIST_TEMP, "recv_file_list");
+			flist_expand(dir_flist, FLIST_START_LARGE);
+		}
 		dstart = dir_flist->used;
 	} else {
 		dir_flist = flist;
diff --git a/rsync.h b/rsync.h
index 88319732..f8fcbffb 100644
--- a/rsync.h
+++ b/rsync.h
@@ -918,8 +918,9 @@ extern int xattrs_ndx;
  * Start the flist array at FLIST_START entries and grow it
  * by doubling until FLIST_LINEAR then grow by FLIST_LINEAR
  */
-#define FLIST_START	(32 * 1024)
-#define FLIST_LINEAR	(FLIST_START * 512)
+#define FLIST_START	(32)
+#define FLIST_START_LARGE	(32 * 1024)
+#define FLIST_LINEAR	(FLIST_START_LARGE * 512)
 
 /*
  * Extent size for allocation pools: A minimum size of 128KB
-- 
2.33.0




Re: [PATCH] Reduce memory usage

2021-09-28 Thread Jindřich Makovička via rsync
On Mon, 27 Sep 2021 22:03:01 +0200
Johannes Altmanninger  wrote:

> > On Sun, Sep 26, 2021 at 01:54:13PM +0200, Jindřich Makovička via
> > rsync wrote:
> > 
> > Applying the attached patch, which reduces the default allocation
> > to 32 pointers, and preallocates 32K pointers only for the main
> > file lists in send_file_list and recv_file_list, reduces the peak
> > memory usage in my case from 142MB to 12MB.  
> 
> The patch looks very reasonable from what I can tell.
> Out of curiosity, how did you measure peak memory?
> I used "/bin/time -v rsync", and already get (very small)
> improvements in maximum RSS when copying the rsync source tree :)

Thanks for feedback. I profiled the memory usage using valgrind, simply
by running

valgrind --tool=massif ./rsync -anx /usr /tmp/

This issue affects only large trees with lots of small subdirectories;
you will not see much improvement with just the couple of directories the
rsync tree has. The outputs below are generated from the sender process
output by ms_print.

Before the fix:


Command:            ./rsync -anx /usr /tmp/
Massif arguments:   (none)
ms_print arguments: massif.out.199013

[ms_print heap graph omitted: heap size in MB (peak 127.6) against instructions executed (0 to 4.574 Gi)]


  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
 47  2,853,846,986      133,806,904      133,270,491       536,413            0
99.60% (133,270,491B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->99.12% (132,629,814B) 0x1329CC: my_alloc (in /home/henry/build/rsync/rsync)
| ->95.80% (128,188,416B) 0x11383E: flist_expand (in /home/henry/build/rsync/rsync)
| | ->95.80% (128,188,416B) 0x117AAB: send_file_name (in /home/henry/build/rsync/rsync)
| | | ->95.80% (128,188,416B) 0x1184B4: send_directory (in /home/henry/build/rsync/rsync)
| | | | ->95.80% (128,188,416B) 0x118CD1: send1extra (in /home/henry/build/rsync/rsync)
| | | |   ->95.80% (128,188,416B) 0x1190F4: send_extra_file_list (in /home/henry/build/rsync/rsync)
| | | | ->95.80% (128,188,416B) 0x12A17A: send_files (in /home/henry/build/rsync/rsync)
| | | | | ->95.80% (128,188,416B) 0x1362C5: client_run (in /home/henry/build/rsync/rsync)
| | | | |   ->95.80% (128,188,416B) 0x136F2D: start_client (in /home/henry/build/rsync/rsync)
| | | | | ->95.80% (128,188,416B) 0x137604: main (in /home/henry/build/rsync/rsync)
| | | | |   
| | | | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
| | | | 
| | | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
| | | 
| | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
| | 
| ->02.16% (2,883,584B) 0x16E977: pool_alloc (in /home/henry/build/rsync/rsync)
| | ->02.16% (2,883,584B) 0x116E12: make_file (in /home/henry/build/rsync/rsync)
| |   ->02.16% (2,883,584B) 0x11747F: send_file_name (in /home/henry/build/rsync/rsync)
| | ->02.06% (2,752,512B) 0x1184B4: send_directory (in /home/henry/build/rsync/rsync)

Re: [PATCH] Reduce memory usage

2021-09-27 Thread Johannes Altmanninger via rsync
> On Sun, Sep 26, 2021 at 01:54:13PM +0200, Jindřich Makovička via rsync wrote:
> 
> Applying the attached patch, which reduces the default allocation to 32
> pointers, and preallocates 32K pointers only for the main file lists in
> send_file_list and recv_file_list, reduces the peak memory usage in my case
> from 142MB to 12MB.

The patch looks very reasonable from what I can tell.
Out of curiosity, how did you measure peak memory?
I used "/bin/time -v rsync", and already get (very small) improvements in
maximum RSS when copying the rsync source tree :)

On Mon, Sep 27, 2021 at 04:42:25PM +0200, Jindřich Makovička via rsync wrote:

> Reduce memory usage
>  
> Start only with 32 entries for the partial file lists, instead of 32k.
>

The log message could be a bit more detailed. You already mentioned that
the big 32k allocation predates the recursive version, that's very useful
information to add here.
Maybe even explain why we change the first check from "flist->malloced !=
FLIST_START" to "flist->files".
(Also I'd use "git send-email" to send patches inline but I'm not sure what's
the convention here)

> diff --git a/flist.c b/flist.c
> index 3442d868..0f7a64e6 100644
> --- a/flist.c
> +++ b/flist.c
> @@ -303,11 +303,11 @@ static void flist_expand(struct file_list *flist, int extra)
>   if (flist->malloced < flist->used + extra)
>   flist->malloced = flist->used + extra;
>  
>   new_ptr = realloc_array(flist->files, struct file_struct *, flist->malloced);
>  
>  
> - if (DEBUG_GTE(FLIST, 1) && flist->malloced != FLIST_START) {
> + if (DEBUG_GTE(FLIST, 1) && flist->files) {

Yep, the new check makes more sense because now it's more obvious that the
debug message is only printed when flist->files is realloc'd, and not
when it's allocated for the first time.

>   rprintf(FCLIENT, "[%s] expand file_list pointer array to %s bytes, did%s move\n",
>   who_am_i(),
>   big_num(sizeof flist->files[0] * flist->malloced),
>   (new_ptr == flist->files) ? " not" : "");
>   }
> @@ -2184,10 +2184,11 @@ struct file_list *send_file_list(int f, int argc, char *argv[])
>   if (preserve_hard_links && protocol_version >= 30 && !cur_flist)
>   init_hard_links();
>  #endif
>  
>   flist = cur_flist = flist_new(0, "send_file_list");
> + flist_expand(flist, FLIST_START_LARGE);
>   if (inc_recurse) {
>   dir_flist = flist_new(FLIST_TEMP, "send_file_list");

This probably wants an flist_expand(dir_flist, FLIST_START_LARGE), because
dir_flist is a global.  I think the idea is that all flist_new() inside
loops/recursive calls should start small, but lists that are only ever
allocated once should stay at 32k.

>   flags |= FLAG_DIVERT_DIRS;
>   } else
>   dir_flist = cur_flist;
> @@ -2539,10 +2540,11 @@ struct file_list *recv_file_list(int f, int dir_ndx)
>   if (preserve_hard_links && !first_flist)
>   init_hard_links();
>  #endif
>  
>   flist = flist_new(0, "recv_file_list");
> + flist_expand(flist, FLIST_START_LARGE);
>  
>   if (inc_recurse) {
>   if (flist->ndx_start == 1)
>   dir_flist = flist_new(FLIST_TEMP, "recv_file_list");

Same here I guess. Maybe we should add an "initial size" parameter to
flist_new(), so it can call flist_expand() automatically?

>   dstart = dir_flist->used;
> diff --git a/rsync.h b/rsync.h
> index 88319732..17f8700e 100644
> --- a/rsync.h
> +++ b/rsync.h
> @@ -916,12 +916,13 @@ extern int xattrs_ndx;
>  
>  /*
>   * Start the flist array at FLIST_START entries and grow it
>   * by doubling until FLIST_LINEAR then grow by FLIST_LINEAR
>   */
> -#define FLIST_START	(32 * 1024)
> -#define FLIST_LINEAR	(FLIST_START * 512)
> +#define FLIST_START	(32)
> +#define FLIST_START_LARGE	(32 * 1024)
> +#define FLIST_LINEAR	(FLIST_START_LARGE * 512)

Probably these should remain aligned (I'm assuming tab has width 8)



Re: [PATCH] Reduce memory usage

2021-09-27 Thread Jindřich Makovička via rsync
On Mon, 27 Sep 2021 13:38:22 +0200
Jindřich Makovička  wrote:

> On Sun, 26 Sep 2021 13:54:13 +0200
> Jindřich Makovička  wrote:
> > Hi,
> > 
> > ...
> >
> > Applying the attached patch, which reduces the default allocation to
> > 32 pointers, and preallocates 32K pointers only for the main file
> > lists in send_file_list and recv_file_list, reduces the peak memory
> > usage in my case from 142MB to 12MB.  
> 
> The original patch breaks the testsuite due to extra messages in the
> output.

Actually it makes more sense to change the debug print to check that
the original pointer is non-null and leave the testsuite as is.

-- 
Jindrich Makovicka
From bdfdef1c5a4437e2492da148b824d39ba235704e Mon Sep 17 00:00:00 2001
From: Jindřich Makovička
Date: Sun, 26 Sep 2021 12:01:21 +0200
Subject: [PATCH] Reduce memory usage

Start only with 32 entries for the partial file lists, instead of 32k.
---
 flist.c | 4 +++-
 rsync.h | 5 +++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/flist.c b/flist.c
index 3442d868..0f7a64e6 100644
--- a/flist.c
+++ b/flist.c
@@ -305,7 +305,7 @@ static void flist_expand(struct file_list *flist, int extra)
 
 	new_ptr = realloc_array(flist->files, struct file_struct *, flist->malloced);
 
-	if (DEBUG_GTE(FLIST, 1) && flist->malloced != FLIST_START) {
+	if (DEBUG_GTE(FLIST, 1) && flist->files) {
 		rprintf(FCLIENT, "[%s] expand file_list pointer array to %s bytes, did%s move\n",
 		who_am_i(),
 		big_num(sizeof flist->files[0] * flist->malloced),
@@ -2186,6 +2186,7 @@ struct file_list *send_file_list(int f, int argc, char *argv[])
 #endif
 
 	flist = cur_flist = flist_new(0, "send_file_list");
+	flist_expand(flist, FLIST_START_LARGE);
 	if (inc_recurse) {
 		dir_flist = flist_new(FLIST_TEMP, "send_file_list");
 		flags |= FLAG_DIVERT_DIRS;
@@ -2541,6 +2542,7 @@ struct file_list *recv_file_list(int f, int dir_ndx)
 #endif
 
 	flist = flist_new(0, "recv_file_list");
+	flist_expand(flist, FLIST_START_LARGE);
 
 	if (inc_recurse) {
 		if (flist->ndx_start == 1)
diff --git a/rsync.h b/rsync.h
index 88319732..17f8700e 100644
--- a/rsync.h
+++ b/rsync.h
@@ -918,8 +918,9 @@ extern int xattrs_ndx;
  * Start the flist array at FLIST_START entries and grow it
  * by doubling until FLIST_LINEAR then grow by FLIST_LINEAR
  */
-#define FLIST_START	(32 * 1024)
-#define FLIST_LINEAR	(FLIST_START * 512)
+#define FLIST_START	(32)
+#define FLIST_START_LARGE	(32 * 1024)
+#define FLIST_LINEAR	(FLIST_START_LARGE * 512)
 
 /*
  * Extent size for allocation pools: A minimum size of 128KB
-- 
2.33.0



Re: [PATCH] Reduce memory usage

2021-09-27 Thread Jindřich Makovička via rsync
On Sun, 26 Sep 2021 13:54:13 +0200
Jindřich Makovička  wrote:
> Hi,
> 
> ...
>
> Applying the attached patch, which reduces the default allocation to
> 32 pointers, and preallocates 32K pointers only for the main file
> lists in send_file_list and recv_file_list, reduces the peak memory
> usage in my case from 142MB to 12MB.

The original patch breaks the testsuite due to extra messages in the
output.

Fix attached.

Regards,
-- 
Jindrich Makovicka
From aa907eabe701550cd5649b6e3da2b22b79b38a06 Mon Sep 17 00:00:00 2001
From: Jindřich Makovička
Date: Sun, 26 Sep 2021 12:01:21 +0200
Subject: [PATCH] Reduce memory usage

Start only with 32 entries for the partial file lists, instead of 32k.
---
 flist.c | 2 ++
 rsync.h | 5 +++--
 testsuite/rsync.fns | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/flist.c b/flist.c
index 3442d868..37f70b69 100644
--- a/flist.c
+++ b/flist.c
@@ -2186,6 +2186,7 @@ struct file_list *send_file_list(int f, int argc, char *argv[])
 #endif
 
 	flist = cur_flist = flist_new(0, "send_file_list");
+	flist_expand(flist, FLIST_START_LARGE);
 	if (inc_recurse) {
 		dir_flist = flist_new(FLIST_TEMP, "send_file_list");
 		flags |= FLAG_DIVERT_DIRS;
@@ -2541,6 +2542,7 @@ struct file_list *recv_file_list(int f, int dir_ndx)
 #endif
 
 	flist = flist_new(0, "recv_file_list");
+	flist_expand(flist, FLIST_START_LARGE);
 
 	if (inc_recurse) {
 		if (flist->ndx_start == 1)
diff --git a/rsync.h b/rsync.h
index 88319732..17f8700e 100644
--- a/rsync.h
+++ b/rsync.h
@@ -918,8 +918,9 @@ extern int xattrs_ndx;
  * Start the flist array at FLIST_START entries and grow it
  * by doubling until FLIST_LINEAR then grow by FLIST_LINEAR
  */
-#define FLIST_START	(32 * 1024)
-#define FLIST_LINEAR	(FLIST_START * 512)
+#define FLIST_START	(32)
+#define FLIST_START_LARGE	(32 * 1024)
+#define FLIST_LINEAR	(FLIST_START_LARGE * 512)
 
 /*
  * Extent size for allocation pools: A minimum size of 128KB
diff --git a/testsuite/rsync.fns b/testsuite/rsync.fns
index 1e2b399f..220b9e21 100644
--- a/testsuite/rsync.fns
+++ b/testsuite/rsync.fns
@@ -89,6 +89,7 @@ v_filt() {
 	-e '/^total: /d' \
 	-e '/^client charset: /d' \
 	-e '/^server charset: /d' \
+	-e '/ expand file_list pointer array /d' \
 	-e '/^$/,$d'
 }
 
-- 
2.33.0



[PATCH] Reduce memory usage

2021-09-26 Thread Jindřich Makovička via rsync
Hi,

When using rsync to back up the file system on my laptop, containing a
pretty much default linux desktop, I was wondering how rsync uses over
100MB of RAM it allocates.

It turned out that most of the memory is used for the arrays of file_struct
pointers, most of which end up unused - much more than the actual
file_struct entries. In my case, the peak usage was 135MB of pointers, and
just 1.5MB of the file_struct entries themselves.

The problem seems to be that the default file_list allocation parameters
predate the incremental recursion, which allocates a huge number of small
file lists, while AFAICS originally rsync allocated just one large list.

Applying the attached patch, which reduces the default allocation to 32
pointers, and preallocates 32K pointers only for the main file lists in
send_file_list and recv_file_list, reduces the peak memory usage in my case
from 142MB to 12MB.

Regards,
--
Jindřich Makovička
From ef169c9157d312c63bad00e3bfc1d8eb70d56ccd Mon Sep 17 00:00:00 2001
From: Jindřich Makovička
Date: Sun, 26 Sep 2021 12:01:21 +0200
Subject: [PATCH] Reduce memory usage

Start only with 32 entries for the partial file lists, instead of 32k.
---
 flist.c | 2 ++
 rsync.h | 5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/flist.c b/flist.c
index 3442d868..37f70b69 100644
--- a/flist.c
+++ b/flist.c
@@ -2186,6 +2186,7 @@ struct file_list *send_file_list(int f, int argc, char *argv[])
 #endif
 
 	flist = cur_flist = flist_new(0, "send_file_list");
+	flist_expand(flist, FLIST_START_LARGE);
 	if (inc_recurse) {
 		dir_flist = flist_new(FLIST_TEMP, "send_file_list");
 		flags |= FLAG_DIVERT_DIRS;
@@ -2541,6 +2542,7 @@ struct file_list *recv_file_list(int f, int dir_ndx)
 #endif
 
 	flist = flist_new(0, "recv_file_list");
+	flist_expand(flist, FLIST_START_LARGE);
 
 	if (inc_recurse) {
 		if (flist->ndx_start == 1)
diff --git a/rsync.h b/rsync.h
index 2f674bc5..708fd244 100644
--- a/rsync.h
+++ b/rsync.h
@@ -917,8 +917,9 @@ extern int xattrs_ndx;
  * Start the flist array at FLIST_START entries and grow it
  * by doubling until FLIST_LINEAR then grow by FLIST_LINEAR
  */
-#define FLIST_START	(32 * 1024)
-#define FLIST_LINEAR	(FLIST_START * 512)
+#define FLIST_START	(32)
+#define FLIST_START_LARGE	(32 * 1024)
+#define FLIST_LINEAR	(FLIST_START_LARGE * 512)
 
 /*
  * Extent size for allocation pools: A minimum size of 128KB
-- 
2.33.0
