Re: Is there a better way to transfer data that doesn't use so much cache?

2022-08-08 Thread Rob Campbell via rsync
I've decided to rewrite the script and use cp and mv rather than rsync.  In
the past, I've had some lost data using just cp and mv which is why I moved
to rsync to put the data into a staging directory.  Now that I've been
creating more data (newer cameras with higher megapixel files and more
files), rsync doesn't work as well as it used to.  Getting nocache or
something similar to work seemed like it would take more time than
rewriting the script.
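
For the cp/mv route, one way to guard against the kind of data loss
described above is to delete a source file only after its copy has been
compared with the original.  A minimal sketch, assuming bash and the
standard cp, cmp, mv and rm utilities; move_verified and the .partial
suffix are hypothetical names, not part of the original script:

```shell
#!/usr/bin/env bash
# move_verified SRC DEST_DIR  (hypothetical helper, not from the original script)
# Copy SRC into DEST_DIR, compare the copy byte for byte, and only then
# delete SRC, so a failed or truncated copy never costs the original.
move_verified() {
    local src="$1" dest_dir="$2"
    local dst tmp
    dst="$dest_dir/$(basename -- "$src")"
    tmp="$dst.partial"

    # -p keeps the modification time, much like rsync -t does.
    cp -p -- "$src" "$tmp" || return 1

    if cmp -s -- "$src" "$tmp"; then
        # Rename within the destination directory, then drop the source.
        mv -- "$tmp" "$dst" && rm -- "$src"
    else
        # The copy differs from the source: discard it and keep the source.
        rm -f -- "$tmp"
        return 1
    fi
}
```

Note that cmp is likely to read the fresh copy back from the page cache
rather than from the disk, so this catches failed or truncated copies but
not every hardware fault.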

Thanks all for your assistance and suggestions.

~
In all things, Be Intentional.


On Fri, Aug 5, 2022 at 1:22 AM Wayne Davison via rsync <
rsync@lists.samba.org> wrote:

> On Wed, Aug 3, 2022 at 7:10 PM Dan Stromberg wrote:
>
>> However, if you transfer a large amount of data and do not intend to
>> retransmit that data any time soon, then the memory isn't really put to
>> good use, and can actually cause your system to slow down significantly -
>> particularly if there's a lot of such data transferred.
>>
>
> I have always rejected overcomplicating rsync with cache control code (the
> complexity of a --drop-cache patch I saw was quite horrifying).  In the
> past I pointed people towards https://github.com/Feh/nocache as one way
> to get posix_fadvise used by an rsync copy.  That project now apparently
> suggests creating a memory-bounded cgroup, which sounds interesting.
>
> ..wayne..
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Is there a better way to transfer data that doesn't use so much cache?

2022-08-04 Thread Wayne Davison via rsync
On Wed, Aug 3, 2022 at 7:10 PM Dan Stromberg wrote:

> However, if you transfer a large amount of data and do not intend to
> retransmit that data any time soon, then the memory isn't really put to
> good use, and can actually cause your system to slow down significantly -
> particularly if there's a lot of such data transferred.
>

I have always rejected overcomplicating rsync with cache control code (the
complexity of a --drop-cache patch I saw was quite horrifying).  In the
past I pointed people towards https://github.com/Feh/nocache as one way to
get posix_fadvise used by an rsync copy.  That project now apparently
suggests creating a memory-bounded cgroup, which sounds interesting.
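
A rough sketch of the memory-bounded cgroup idea, assuming a systemd-based
Linux system using cgroup v2, where page cache filled by a process is
charged to its cgroup.  The helper name run_bounded, the DRY_RUN switch and
the 500M figure are illustrative assumptions; creating a system scope like
this typically needs root or polkit authorization:

```shell
#!/usr/bin/env bash
# run_bounded LIMIT COMMAND [ARGS...]  (hypothetical helper)
# Run COMMAND in a transient systemd scope whose memory, page cache
# included, is capped at LIMIT (for example 500M).
run_bounded() {
    local limit="$1"
    shift
    # Prepend the systemd-run invocation to the command's own arguments.
    set -- systemd-run --scope -p "MemoryMax=$limit" -- "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Assumed convenience switch: print the command instead of running it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Something like  run_bounded 500M rsync -av SRC/ DEST/  would then run the
transfer inside the capped scope.  MemoryHigh= could be used in place of
MemoryMax= for a softer limit that throttles rather than hard-caps.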

..wayne..


Re: Is there a better way to transfer data that doesn't use so much cache?

2022-08-03 Thread Dan Stromberg via rsync
On Wed, Aug 3, 2022 at 5:41 PM Robin Lee Powell via rsync <
rsync@lists.samba.org> wrote:

> On Wed, Aug 03, 2022 at 02:04:22PM -0400, Rob Campbell via rsync wrote:
> > The problem isn't that there are many syncs because the problem happens on
> > the first one that runs.
>
> You didn't actually say what the problem *is*.
>
> I can infer from the subject that you think it's bad that rsync is
> using a bunch of disk/buffer cache, but that's not rsync, that's
> Linux, and it's by design; Linux uses as much RAM as it possibly can
> for disk cache, always.  This improves performance.  In a
> well-performing Linux system, the "free" column of "free -h" is very
> low, and the "available" column is very high.
>

Linux does indeed try to put your RAM to good use, and often that means
caching data from disk in RAM.

However, if you transfer a large amount of data and do not intend to
retransmit that data any time soon, then the memory isn't really put to
good use, and can actually cause your system to slow down significantly -
particularly if there's a lot of such data transferred.

It is, however, theoretically possible to skip the buffer cache using
O_DIRECT, but that requires your application to have O_DIRECT support, or
to use something like https://stromberg.dnsalias.org/~strombrg/libodirect/
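
For a one-off copy, GNU dd can open files with O_DIRECT itself via
iflag=direct and oflag=direct.  A minimal sketch, assuming GNU coreutils;
copy_direct is a hypothetical helper, and the fallback to plain cp covers
filesystems or dd builds that reject O_DIRECT:

```shell
#!/usr/bin/env bash
# copy_direct SRC DST  (hypothetical helper)
# Copy one file while asking the kernel to bypass the page cache.
copy_direct() {
    local src="$1" dst="$2"
    # iflag=direct / oflag=direct make GNU dd open both files with O_DIRECT.
    if ! dd if="$src" of="$dst" bs=1M iflag=direct oflag=direct \
            status=none 2>/dev/null; then
        # O_DIRECT is not available here (for example on some tmpfs mounts,
        # or with a non-GNU dd), so fall back to an ordinary buffered copy.
        cp -- "$src" "$dst"
    fi
}
```

Unlike rsync, this copies a single file and does no filtering or delta
transfer, so it only illustrates the O_DIRECT part of the suggestion.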

HTH.


Re: Is there a better way to transfer data that doesn't use so much cache?

2022-08-03 Thread Robin Lee Powell via rsync



On Wed, Aug 03, 2022 at 02:04:22PM -0400, Rob Campbell via rsync wrote:
> I've created a script that syncs (and removes) data from as many as 4
> places and puts them all in one of 2 directories.  The commands are:
> 
> [rsync commands trimmed; they appear in full in the original message below]
> 
> The problem isn't that there are many syncs because the problem happens on
> the first one that runs.

You didn't actually say what the problem *is*.

I can infer from the subject that you think it's bad that rsync is
using a bunch of disk/buffer cache, but that's not rsync, that's
Linux, and it's by design; Linux uses as much RAM as it possibly can
for disk cache, always.  This improves performance.  In a
well-performing Linux system, the "free" column of "free -h" is very
low, and the "available" column is very high.
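
The same distinction can be read straight from /proc/meminfo, which is
where free gets its numbers.  A small sketch, assuming a Linux kernel
recent enough to report MemAvailable (3.14 or later); mem_summary is a
hypothetical helper name:

```shell
#!/usr/bin/env bash
# mem_summary [FILE]  (hypothetical helper)
# Print free, available and cached memory in MiB from a meminfo-style
# file, defaulting to /proc/meminfo.
mem_summary() {
    awk '$1 == "MemFree:" || $1 == "MemAvailable:" || $1 == "Cached:" {
             # Values in /proc/meminfo are given in kB.
             printf "%-13s %6d MiB\n", $1, $2 / 1024
         }' "${1:-/proc/meminfo}"
}
```

On a healthy system MemFree is usually small while MemAvailable stays
large, because most of Cached can be reclaimed on demand.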

> Before any of them run I run:
> 
> sudo free -w -h;sync && echo 1 > /proc/sys/vm/drop_caches;free -w -h
> 
> I do not run this before each one because writing to
> /proc/sys/vm/drop_caches sometimes takes a while.

That's a great way to substantially reduce performance; why are you
doing that?

> Is there something in the logic that can be done to make this perform
> better or should I use something other than rsync or is what I am getting
> as good as it will get regardless of what I use?
> 
> Some of these directories can be over a gig.  Most of these are media files
> with EXIF data that includes the timestamp, so maybe I can get rid of -t.
> It is easier to keep the file's timestamp than to run exiftool and "touch"
> each file with its create date, but maybe exiftool is the faster way?
> 
> ~
> In all things, Be Intentional.



Is there a better way to transfer data that doesn't use so much cache?

2022-08-03 Thread Rob Campbell via rsync
I've created a script that syncs (and removes) data from as many as 4
places and puts them all in one of 2 directories.  The commands are:

rsync -avt --progress --remove-source-files --info=progress2 \
    -f'+ *.nef' -f'+ *.jp*g' -f'+ *.tif' -f'+ *.xmp' -f'+ /*' -f'- *' \
    "$D850/DCIM/100ND850/" $STAGINGP/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 \
    -f'+ *.nef' -f'+ *.jp*g' -f'+ *.tif' -f'+ *.xmp' -f'+ /*' -f'- *' \
    "$Z9/DCIM/100NCZ_9/" $STAGINGP/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 \
    -f'+ *.mp4' -f'+ /*' -f'- *' \
    "$DASHCAM/CARDV/VIDEO/" $STAGINGV/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 \
    -f'- Screenshots/' -f'+ *.nef' -f'+ *.jpg' -f'+ *.jp*g' -f'+ *.png' \
    -f'+ *.dng' -f'+ *.gif' -f'- *.thumbnails' -f'- *.android' -f'+ */' \
    -f'+ DCIM/*' -f'+ Snapbridge/*' -f'+ Pictures/*' -f'+ Download/*' \
    -f'+ Textgram/*' -f'- *' \
    $PHONE/ $STAGINGP/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 \
    -f'+ *.mp4' -f'+ *.avi' -f'+ *.mov' -f'+ *.mp*g' -f'+ *.3gp' -f'- *' \
    --files-from=<(find $PHONE -type f ! -path "*Download*" \
        ! -path "*.trashed*" ! -iname .mp4 ! -iname '*.mp4\.*') \
    / $STAGINGV/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 \
    -f'+ *.mp4' -f'+ *.avi' -f'+ *.mov' -f'+ *.mp*g' -f'+ *.3gp' \
    -f'+ Movies/*' -f'+ *Recordings/*' -f'+ DCIM/*' -f'+ Snapbridge/*' \
    -f'- */' -f'- *' \
    $PHONE/ $STAGINGV/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 \
    -f'+ *.mp4' -f'- *' \
    --files-from=<(find $PHONE -iname .mp4) / $STAGINGV/TIKTOK/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 -f'+ *' \
    $PHONE/Downloads/ $COMPUTER/Downloads/

rsync -avt --progress --remove-source-files --info=progress2 \
    -f'- screenshot*' -f'- Screenshot*' -f'- Boondocks/' -f'- Dilbert/' \
    -f'+ *.png' -f'+ *.jp*g' -f'+ *.dng' -f'+ *.gif' -f'- *20*/' -f'- *' \
    -f'+ */' -f'- $STAGINGP/' \
    $MYPICS/ $STAGINGP/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 \
    -f'+ Screenshot*.png' -f'- Staging/' -f'- *' \
    $MYPICS/ $STAGINGP/Screenshots/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 \
    -f'+ *.3gpp' -f'+ *.mp4' -f'+ *.mp*g' -f'+ *.avi' -f'+ *.asf' \
    -f'+ *.wmv' -f'- *' \
    $HOME/Downloads $STAGINGV/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 \
    -f'+ *.mp4' -f'+ *.mp*g' -f'+ *.avi' -f'+ *.asf' -f'+ *.wmv' \
    -f'+ *.3gpp' -f'- *' \
    $MYVIDEOS/ $STAGINGV/ | tee -a $LOG


The problem isn't that there are many syncs because the problem happens on
the first one that runs.  Before any of them run I run:

sudo free -w -h;sync && echo 1 > /proc/sys/vm/drop_caches;free -w -h

I do not run this before each one because writing to
/proc/sys/vm/drop_caches sometimes takes a while.
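
One detail of the command above: sudo applies only to the first free, and
the > redirection is opened by the calling shell, so unless the whole
script already runs as root the write to drop_caches is likely to fail with
a permission error.  If the cache really must be dropped, a sketch along
these lines avoids that; drop_page_cache and the SUDO variable are
hypothetical names:

```shell
#!/usr/bin/env bash
# drop_page_cache [TARGET]  (hypothetical helper)
# Flush dirty pages, then ask the kernel to drop the clean page cache.
# TARGET defaults to the real control file; the parameter exists only so
# the helper can be tried against an ordinary file.
drop_page_cache() {
    local target="${1:-/proc/sys/vm/drop_caches}"
    # Dirty pages cannot be dropped, so write them out first.
    sync
    # A ">" redirection would be opened by the unprivileged shell before
    # sudo starts; piping into a privileged tee performs the write as root.
    # Set SUDO to an empty string when already running as root.
    echo 1 | ${SUDO-sudo} tee "$target" > /dev/null
}
```

Writing 1 drops only the page cache; dentries and inodes are left alone.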

Is there something in the logic that can be done to make this perform
better or should I use something other than rsync or is what I am getting
as good as it will get regardless of what I use?

Some of these directories can be over a gig.  Most of these are media files
with EXIF data that includes the timestamp, so maybe I can get rid of -t.
It is easier to keep the file's timestamp than to run exiftool and "touch"
each file with its create date, but maybe exiftool is the faster way?

~
In all things, Be Intentional.