Re: Is there a better way to transfer data that doesn't use so much cache?
I've decided to rewrite the script to use cp and mv rather than rsync. In the past I lost some data using just cp and mv, which is why I moved to rsync to put the data into a staging directory. Now that I'm creating more data (newer cameras with higher-megapixel files, and more files), rsync doesn't work as well as it used to. Getting nocache or something similar to work looked like it would take more time than rewriting the script.

Thanks, all, for your assistance and suggestions.

~
In all things, Be Intentional.

On Fri, Aug 5, 2022 at 1:22 AM Wayne Davison via rsync <rsync@lists.samba.org> wrote:
> [...] In the past I pointed people towards https://github.com/Feh/nocache
> as one way to get posix_fadvise used by an rsync copy. That project now
> apparently suggests creating a memory-bounded cgroup, which sounds
> interesting.
>
> ..wayne..

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
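Rob's worry about losing data with plain cp and mv can be addressed by deleting a source file only after its copy has been checked. What follows is a minimal sketch of that idea, not the script Rob actually wrote: the function name `safe_move`, the skip-if-already-staged policy, and the use of `cmp` for verification are all assumptions, and it expects GNU coreutils (for `sync FILE`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): copy files into a staging
# directory with cp, verify each copy, and only then delete the source,
# roughly what rsync --remove-source-files does after a successful transfer.
safe_move() {
    local dest=$1
    shift
    local src target
    mkdir -p -- "$dest" || return 1
    for src in "$@"; do
        target="$dest/$(basename -- "$src")"
        # Never overwrite something that is already in staging
        if [[ -e $target ]]; then
            echo "skip (already exists): $target" >&2
            continue
        fi
        # -p preserves mode and timestamps (like rsync -t does for mtimes)
        if ! cp -p -- "$src" "$target"; then
            echo "copy failed, keeping source: $src" >&2
            rm -f -- "$target"
            continue
        fi
        # Flush the new file to disk before the original goes away
        # (GNU sync accepts file arguments)
        sync -- "$target"
        # Delete the source only if the copy is byte-for-byte identical
        if cmp -s -- "$src" "$target"; then
            rm -- "$src"
        else
            echo "verify failed, keeping source: $src" >&2
            rm -f -- "$target"
        fi
    done
}
```

For example, `safe_move "$STAGINGP" "$D850"/DCIM/100ND850/*.nef`. Note that a `cmp` run right after the copy mostly reads both files from the page cache, so it catches failed or truncated copies rather than media errors.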
Re: Is there a better way to transfer data that doesn't use so much cache?
On Wed, Aug 3, 2022 at 7:10 PM Dan Stromberg wrote:
> However, if you transfer a large amount of data and do not intend to
> retransmit that data any time soon, then the memory isn't really put to
> good use, and can actually cause your system to slow down significantly -
> particularly if there's a lot of such data transferred.

I have always rejected overcomplicating rsync with cache control code (the complexity of a --drop-cache patch I saw was quite horrifying). In the past I pointed people towards https://github.com/Feh/nocache as one way to get posix_fadvise used by an rsync copy. That project now apparently suggests creating a memory-bounded cgroup, which sounds interesting.

..wayne..
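Both approaches Wayne mentions can be tried without changing rsync itself. The wrapper below is a sketch under assumptions: it presumes `nocache` from the project above may be on the `PATH`, falls back to a systemd transient scope with a memory cap (this usually needs root, and the `512M` default is arbitrary), and otherwise runs the command unchanged. `run_cache_limited` and `MEMORY_MAX` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: run a command (e.g. one of the rsync lines) so that it leaves less
# data behind in the page cache. Assumes nocache (github.com/Feh/nocache)
# and/or systemd-run may be installed; the 512M default is only an example.
run_cache_limited() {
    local memory_max=${MEMORY_MAX:-512M}
    if command -v nocache >/dev/null 2>&1; then
        # nocache preloads a library that calls
        # posix_fadvise(POSIX_FADV_DONTNEED) on files the command closes
        nocache "$@"
    elif command -v systemd-run >/dev/null 2>&1; then
        # Run inside a transient cgroup scope: page cache created by the
        # command is charged to that cgroup and reclaimed once it hits the cap.
        # MemoryMax= is the cgroup v2 property (cgroup v1 used MemoryLimit=).
        systemd-run --scope --quiet -p "MemoryMax=$memory_max" -- "$@"
    else
        # Neither tool is available: just run the command normally
        "$@"
    fi
}
```

Usage would look like `run_cache_limited rsync -av --remove-source-files "$D850/DCIM/100ND850/" "$STAGINGP/"`.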
Re: Is there a better way to transfer data that doesn't use so much cache?
On Wed, Aug 3, 2022 at 5:41 PM Robin Lee Powell via rsync <rsync@lists.samba.org> wrote:
> On Wed, Aug 03, 2022 at 02:04:22PM -0400, Rob Campbell via rsync wrote:
> > The problem isn't that there are many syncs because the problem happens on
> > the first one that runs.
>
> You didn't actually say what the problem *is*.
>
> I can infer from the subject that you think it's bad that rsync is
> using a bunch of disk/buffer cache, but that's not rsync, that's
> Linux, and it's by design; Linux uses as much RAM as it possibly can
> for disk cache, always. This improves performance. In a
> well-performing Linux system, the "free" column of "free -h" is very
> low, and the "available" column is very high.

Linux does indeed try to put your RAM to good use, and often that means caching data from disk in RAM. However, if you transfer a large amount of data and do not intend to retransmit that data any time soon, then the memory isn't really put to good use, and can actually cause your system to slow down significantly - particularly if there's a lot of such data transferred.

It is, however, theoretically possible to skip the buffer cache using O_DIRECT, but that requires your application to have O_DIRECT support, or to use something like https://stromberg.dnsalias.org/~strombrg/libodirect/

HTH.
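O_DIRECT is not the only way to keep a one-off copy out of the cache. As a hedged illustration, GNU dd exposes both options: `oflag=direct` for O_DIRECT, and `iflag=nocache`/`oflag=nocache`, which use posix_fadvise the way nocache does. The sketch below takes the posix_fadvise route, because O_DIRECT imposes buffer-alignment restrictions and is not supported on every filesystem; `copy_nocache` is a hypothetical name, and GNU coreutils is assumed.

```shell
#!/usr/bin/env bash
# Sketch: copy one file while asking the kernel to drop its cached pages.
# Uses GNU dd's nocache flags (posix_fadvise POSIX_FADV_DONTNEED), not O_DIRECT.
copy_nocache() {
    local src=$1 dst=$2
    # Copy; iflag=nocache drops the source's pages from the cache as they are read
    dd if="$src" of="$dst" bs=4M iflag=nocache status=none || return 1
    # Dirty pages cannot be discarded, so flush the output file and then drop
    # all of its cached pages (the count=0 idiom from the coreutils manual)
    dd of="$dst" oflag=nocache conv=notrunc,fdatasync count=0 status=none </dev/null || return 1
    # Keep the source's timestamps, as rsync -t or cp -p would
    touch -r "$src" -- "$dst"
}
```

For example, `copy_nocache "$D850/DCIM/100ND850/DSC_0001.NEF" "$STAGINGP/DSC_0001.NEF"` (the file name here is made up).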
Re: Is there a better way to transfer data that doesn't use so much cache?
On Wed, Aug 03, 2022 at 02:04:22PM -0400, Rob Campbell via rsync wrote:
> I've created a script that syncs (and removes) data from as many as 4
> places and puts them all in one of 2 directories. The commands are:
>
> [... rsync commands snipped ...]
>
> The problem isn't that there are many syncs because the problem happens on
> the first one that runs.

You didn't actually say what the problem *is*.

I can infer from the subject that you think it's bad that rsync is using a bunch of disk/buffer cache, but that's not rsync, that's Linux, and it's by design; Linux uses as much RAM as it possibly can for disk cache, always. This improves performance. In a well-performing Linux system, the "free" column of "free -h" is very low, and the "available" column is very high.

> Before any of them run I run:
>
> sudo free -w -h;sync && echo 1 > /proc/sys/vm/drop_caches;free -w -h
>
> I do not run this before each one because it sometimes takes a while to
> /proc/sys/vm/drop_caches

That's a great way to substantially reduce performance; why are you doing that?

> Is there something in the logic that can be done to make this perform
> better or should I use something other than rsync or is what I am getting
> as good as it will get regardless of what I use?
> [...]
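Rather than writing to /proc/sys/vm/drop_caches, the numbers Robin refers to can simply be checked. This sketch (Linux only; `mem_summary` is a hypothetical name) prints the /proc/meminfo fields that free's "free" and "available" columns come from, plus Cached, which makes up most of what free reports as cache. A low MemFree alongside a high MemAvailable is the healthy pattern he describes.

```shell
#!/usr/bin/env bash
# Sketch: read the numbers free(1) reports straight from /proc/meminfo
# (Linux only; the kernel reports them in kB, converted to MiB here).
#   MemFree      - RAM not used for anything; expected to be small
#   MemAvailable - kernel's estimate of RAM available to new programs,
#                  counting page cache that can be reclaimed
#   Cached       - RAM currently holding file data (the page cache)
mem_summary() {
    awk '/^(MemFree|MemAvailable|Cached):/ { printf "%-14s %8d MiB\n", $1, $2 / 1024 }' /proc/meminfo
}
```

Running `mem_summary` before and after one of the rsync commands shows whether Cached grew at the expense of MemAvailable or only of MemFree.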
Is there a better way to transfer data that doesn't use so much cache?
I've created a script that syncs (and removes) data from as many as 4 places and puts it all into one of 2 directories. The commands are:

rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.nef' -f'+ *.jp*g' -f'+ *.tif' -f'+ *.xmp' -f'+ /*' -f'- *' "$D850/DCIM/100ND850/" $STAGINGP/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.nef' -f'+ *.jp*g' -f'+ *.tif' -f'+ *.xmp' -f'+ /*' -f'- *' "$Z9/DCIM/100NCZ_9/" $STAGINGP/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.mp4' -f'+ /*' -f'- *' "$DASHCAM/CARDV/VIDEO/" $STAGINGV/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 -f'- Screenshots/' -f'+ *.nef' -f'+ *.jpg' -f'+ *.jp*g' -f'+ *.png' -f'+ *.dng' -f'+ *.gif' -f'- *.thumbnails' -f'- *.android' -f'+ */' -f'+ DCIM/*' -f'+ Snapbridge/*' -f'+ Pictures/*' -f'+ Download/*' -f'+ Textgram/*' -f'- *' $PHONE/ $STAGINGP/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.mp4' -f'+ *.avi' -f'+ *.mov' -f'+ *.mp*g' -f'+ *.3gp' -f'- *' --files-from=<(find $PHONE -type f ! -path "*Download*" ! -path "*.trashed*" ! -iname .mp4 ! -iname '*.mp4\.*') / $STAGINGV/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.mp4' -f'+ *.avi' -f'+ *.mov' -f'+ *.mp*g' -f'+ *.3gp' -f'+ Movies/*' -f'+ *Recordings/*' -f'+ DCIM/*' -f'+ Snapbridge/*' -f'- */' -f'- *' $PHONE/ $STAGINGV/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.mp4' -f'- *' --files-from=<(find $PHONE -iname .mp4) / $STAGINGV/TIKTOK/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 -f'+ *' $PHONE/Downloads/ $COMPUTER/Downloads/

rsync -avt --progress --remove-source-files --info=progress2 -f'- screenshot*' -f'- Screenshot*' -f'- Boondocks/' -f'- Dilbert/' -f'+ *.png' -f'+ *.jp*g' -f'+ *.dng' -f'+ *.gif' -f'- *20*/' -f'- *' -f'+ */' -f'- $STAGINGP/' $MYPICS/ $STAGINGP/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 -f'+ Screenshot*.png' -f'- Staging/' -f'- *' $MYPICS/ $STAGINGP/Screenshots/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.3gpp' -f'+ *.mp4' -f'+ *.mp*g' -f'+ *.avi' -f'+ *.asf' -f'+ *.wmv' -f'- *' $HOME/Downloads $STAGINGV/ | tee -a $LOG

rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.mp4' -f'+ *.mp*g' -f'+ *.avi' -f'+ *.asf' -f'+ *.wmv' -f'+ *.3gpp' -f'- *' $MYVIDEOS/ $STAGINGV/ | tee -a $LOG

The problem isn't that there are many syncs, because the problem happens on the first one that runs. Before any of them run, I run:

sudo free -w -h;sync && echo 1 > /proc/sys/vm/drop_caches;free -w -h

I do not run this before each one because writing to /proc/sys/vm/drop_caches sometimes takes a while.

Is there something in the logic that can be changed to make this perform better? Should I use something other than rsync, or is what I'm getting as good as it will get regardless of what I use?

Some of these directories can be over a gigabyte. Most of these are media files whose EXIF data should contain the timestamp, so maybe I can get rid of -t. It is easier to keep the file's timestamp than to run exiftool to "touch" the file with the create date, but maybe exiftool is the faster way?

~
In all things, Be Intentional.
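Since every command above shares the same options, the script could keep them in one place. The following is a hypothetical refactor, not something proposed in the thread: `RSYNC_COMMON` and `stage` are made-up names, `-t` is dropped only because `-a` already implies it, and the function returns rsync's exit status rather than `tee`'s so that a failed transfer is not silently ignored.

```shell
#!/usr/bin/env bash
# Hypothetical refactor: options shared by every command live in one array
# (-a already implies -t, so -avt becomes -av with no change in behavior).
RSYNC_COMMON=(-av --progress --remove-source-files --info=progress2)

# stage SRC DEST [extra rsync args such as filter rules...]
stage() {
    local src=$1 dest=$2
    shift 2
    rsync "${RSYNC_COMMON[@]}" "$@" "$src" "$dest" | tee -a "${LOG:-/dev/null}"
    # Report rsync's exit status, not tee's, so failures are not hidden
    return "${PIPESTATUS[0]}"
}
```

For example, the first command would become `stage "$D850/DCIM/100ND850/" "$STAGINGP/" -f'+ *.nef' -f'+ *.jp*g' -f'+ *.tif' -f'+ *.xmp' -f'+ /*' -f'- *'`.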