Hi all,

For improved stat() performance, the Lustre filesystem uses entirely empty 
sparse files on its metadata target (MDT). With hundreds of millions of 
huge sparse files, creating a backup of the MDT using vanilla 
GNU tar is basically impossible, as it needs far too much time to detect 
sparse files.

Attached is a patch that speeds up sparse file detection when the entire file 
is sparse: if the file has no allocated blocks, tar skips reading it and 
treats it as a single hole. All credit goes to my colleague Kit Westneat, as 
it was his idea.

Example: Directory with a single 4 TiB sparse file:

default-tar:

be...@rhel5@bathl:/tmpa/tests$ time tar cvfS sparse.tar sparse
sparse/
^C

real    0m14.710s
user    0m7.100s
sys     0m7.450s

(I aborted with Ctrl-C; tar was running at 99% CPU.)

Improved tar:

be...@rhel5@bathl:/tmpa/tests$ time /tmpa/devel/Lustre/tar/tar-1.22-13.el6/tar-1.22/src/tar cvfS sparse.tar sparse
sparse/
sparse/ost2.img

real    0m0.006s
user    0m0.000s
sys     0m0.000s


The patch still has a TODO comment, as tar_sparse_scan (file, ...) may call
file->optab->scan_block (), and I simply have not figured out yet where the 
scan_block () method is assigned (if at all; dead code?).
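
To make the question concrete: if tar_sparse_scan () follows the usual 
optional-callback pattern (an assumption on my side, I have not verified it 
against every format's optab), then skipping it is harmless whenever 
scan_block is NULL. Below is a minimal, self-contained sketch of that 
pattern. All names in it (sparse_optab, sparse_file, sparse_scan, 
counting_scan_block) are hypothetical stand-ins, not tar's real 
definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scan states, modelled on scan_begin/scan_end above.  */
enum scan_state { SCAN_BEGIN, SCAN_BLOCK, SCAN_END };

struct sparse_file;

/* Hypothetical operations table with one optional method.  */
struct sparse_optab
{
  bool (*scan_block) (struct sparse_file *, enum scan_state, void *);
};

struct sparse_file
{
  const struct sparse_optab *optab;
  int calls;                    /* how often scan_block was invoked */
};

/* Dispatch to the optional method; succeed trivially if it is unset.  */
static bool
sparse_scan (struct sparse_file *file, enum scan_state state, void *block)
{
  if (file->optab->scan_block)
    return file->optab->scan_block (file, state, block);
  return true;
}

/* Example method that only counts its invocations.  */
static bool
counting_scan_block (struct sparse_file *file, enum scan_state state,
                     void *block)
{
  (void) state;
  (void) block;
  file->calls++;
  return true;
}
```

With a table whose scan_block is NULL, sparse_scan () does nothing and 
returns true, so jumping past it changes nothing. With a table that sets 
scan_block, the early exit in the patch would skip the begin/end 
notifications, which is what the TODO is about.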

Any comment is appreciated.


Thanks,
Bernd

PS: I'm used to Linux-style indentation and I'm not sure if I did it the right 
way. If it is wrong, please complain and I will reformat it.

-- 
Bernd Schubert
DataDirect Networks
---
 src/sparse.c |   18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

Index: tar-1.19.sun1/src/sparse.c
===================================================================
--- tar-1.19.sun1.orig/src/sparse.c
+++ tar-1.19.sun1/src/sparse.c
@@ -216,7 +216,7 @@ sparse_scan_file (struct tar_sparse_file
   struct tar_stat_info *st = file->stat_info;
   int fd = file->fd;
   char buffer[BLOCKSIZE];
-  size_t count;
+  size_t count = 0;
   off_t offset = 0;
   struct sp_array sp = {0, 0};
 
@@ -224,7 +224,20 @@ sparse_scan_file (struct tar_sparse_file
     return false;
 
   st->archive_file_size = 0;
-  
+
+#ifdef HAVE_ST_BLOCKS
+  /* if this file has no blocks, it's all sparse */
+  if (ST_NBLOCKS(st->stat) == 0)
+    {
+      /* TODO: Do we need lseek here? Or can we use st->stat.st_size?
+       *       What does "tar_sparse_scan (file, scan_end, NULL)" do? */
+      offset = lseek(fd, 0, SEEK_END);
+      if (offset == -1)
+	return false; /* Weird error */
+      goto finalize;
+    }
+#endif
+
   if (!tar_sparse_scan (file, scan_begin, NULL))
     return false;
 
@@ -255,6 +268,7 @@ sparse_scan_file (struct tar_sparse_file
       offset += count;
     }
 
+finalize:
   if (sp.numbytes == 0)
     sp.offset = offset;
 
