I had to try a dozen backed up versions of a file one after another to find a 
lost edit and I was also greatly annoyed by the delay (~6 minutes with a fast 
USB3 disk). So I noticed this report and also noted that  duplicity indeed does 
many tenths of thousands to half a million sys-calls to stat or open 
/etc/localtime, /etc/group and /etc/passwd. That seems a bit unreasonable to me 
but I'm NOT AN EXPERT AT ALL. The only thing I can tell for sure is that 6 
minutes per restore is some times toooooo long*. In my case it was painful.
 
To test I started the restore of one small file from the GUI and then run 
strace as follows on various python processes that I was noticing with `ps 
ax|grep python`. Before the final duplicity python process started I recorded 
about half a million stat("/etc/localtime") syscalls (you may be interested in 
[1] regarding this behaviour). I then recorded ~100,000 file related syscalls 
to open /etc/passwd and /etc/group. Here are my commands:

# strace  -e trace=file -p 3827  2> /tmp/foo
...
# wc -l /tmp/foo                                                                
    
100892 /tmp/lala4         
# sort /tmp/foo|sed -e 's/", .*/"/'|uniq -c|sort -n|tail  -2
  49912 open("/etc/group"
  49912 open("/etc/passwd"

What seems interesting are the duplicity processes I show starting. I
noticed the same commands running more than once and this doesn't seem
normal at all. Especially the fact that duplicity list-current-files
runs twice with exactly the same options and takes ~2 minutes each time
seems troubling**.

  1  6855 0:24 python2 /usr/bin/duplicity collection-status --gio 
file:///bak/ndbak2 --no-encryption --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=19               
                                       
  2  7686 0:23 python2 /usr/bin/duplicity collection-status --gio 
file:///bak/ndbak2 --no-encryption --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=19               
                                       
  3  8110 2:26 python2 /usr/bin/duplicity list-current-files --gio 
file:///bak/ndbak2 --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=19               
                                                      
  4 10886 0:21 python2 /usr/bin/duplicity restore --gio 
--file-to-restore=home/nd/test --time=2018-06-06T09:23:45Z --force 
file:///bak/ndbak2 /home/nd/test --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=1
    9                                                                           
                                                                                
                                                                                
                       
  5 11241 0:21 python2 /usr/bin/duplicity collection-status --gio 
file:///bak/ndbak2 --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=21               
                                                       
  6 11535 2:03 python2 /usr/bin/duplicity list-current-files --gio 
file:///bak/ndbak2 --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=20               
                                                      
  7 13659 0:29 python2 /usr/bin/duplicity restore --gio 
--file-to-restore=home/nd/test --time=2018-06-06T09:23:45Z --force 
file:///bak/ndbak2 /home/nd/test --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=2
    0         

Here's the output of strace  -c -f -p 11535 (duplicity list-current-
files) where you can see that 34.68% of syscalls time is spent on stat
(I _guess_ on stat('/etc/timezone'). Roughly the same times I show on
the other invocation of duplicity list-current-files.

strace: Process 11535 attached with 3 threads
strace: [ Process PID=11535 runs in x32 mode. ]
strace: [ Process PID=11535 runs in 64 bit mode. ]
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 48.67    0.243360           0    632709           write
 34.68    0.173416           1    316355           stat
 15.26    0.076308           1    128272           read
  0.78    0.003885        3885         1           brk
  0.58    0.002923           3       952           munmap
  0.01    0.000046          23         2           unlink
  0.01    0.000039          39         1           mkdir
  0.00    0.000023          23         1           rmdir
  0.00    0.000003           3         1           fstat
  0.00    0.000003           3         1           rt_sigaction
  0.00    0.000000           0         2           close
  0.00    0.000000           0         2           lseek
------ ----------- ----------- --------- --------- ----------------
100.00    0.500006               1078299           total


______________

[1] https://blog.packagecloud.io/eng/2017/02/21/set-environment-
variable-save-thousands-of-system-calls/

_______________
* But don't get me wrong -- I'm still really thankful for all the free fish :-)
** just guessing but this might be a glitch in the way the GUI calls duplicity

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to duplicity in Ubuntu.
https://bugs.launchpad.net/bugs/1090725

Title:
  slow scanning because too many times opens /etc/groups and /etc/passwd

Status in duplicity package in Ubuntu:
  Expired

Bug description:
  Scaning for which files need to backup is a little bit slow.
  May this is because 'deja-dup' too many times read /etc/groups (which remains 
unchanged all time programm works).

  Steps to reproduce:

  1. Run
    $ deja-dup --backup
    'Back Up' window is appears

  2. Wait for 'Scanning...' stage (when in log appears lot of 'Scanning:
  ....' lines.

  3. Get PID of '/usr/bin/python /usr/bin/duplicity'  process  and  run
  strace -p <PID>

  
  Actual result:

  A lot of sequences like this:

  <pre>
  open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 18
  _llseek(18, 0, [0], SEEK_CUR)           = 0
  fstat64(18, {st_mode=S_IFREG|0644, st_size=1983, ...}) = 0
  mmap2(NULL, 1983, PROT_READ, MAP_SHARED, 18, 0) = 0xb772f000
  _llseek(18, 1983, [1983], SEEK_SET)     = 0
  munmap(0xb772f000, 1983)                = 0
  close(18)                               = 0
  open("/etc/group", O_RDONLY|O_CLOEXEC)  = 18
  _llseek(18, 0, [0], SEEK_CUR)           = 0
  fstat64(18, {st_mode=S_IFREG|0644, st_size=1019, ...}) = 0
  mmap2(NULL, 1019, PROT_READ, MAP_SHARED, 18, 0) = 0xb772f000
  _llseek(18, 1019, [1019], SEEK_SET)     = 0
  munmap(0xb772f000, 1019)                = 0
  close(18)    
  </pre>


  alex@hp-635:~$ grep 'open' duplicity_strace.log | wc -l
  2101
  alex@hp-635:~$ grep 'open' duplicity_strace.log | grep -v '/etc/' | wc -l
  49


  Expected result:

  Do not read /etc/passwd and /etc/groups again and again- it doesn't
  changes in time when programm works.

  ProblemType: Bug
  DistroRelease: Ubuntu 12.04
  Package: deja-dup 22.0-0ubuntu2
  ProcVersionSignature: Ubuntu 3.2.0-35.55-generic 3.2.34
  Uname: Linux 3.2.0-35-generic i686
  NonfreeKernelModules: fglrx
  ApportVersion: 2.0.1-0ubuntu17
  Architecture: i386
  Date: Sat Dec 15 21:02:58 2012
  InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Release i386 
(20120423)
  MarkForUpload: True
  SourcePackage: deja-dup
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/duplicity/+bug/1090725/+subscriptions

-- 
Mailing list: https://launchpad.net/~desktop-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~desktop-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to