On Fri, Aug 03, 2007 at 15:19:33 -0400, Nathan Stratton Treadway wrote: > However, it seems like even though the timestamp on "proc" isn't > correct, tar shouldn't be creating a snapshot file which it then > considers to be corrupt when reading it. Could it be updated (either > the routines creating the file or those reading it) to handle this > situation more gracefully?
[Copies of my original postings on this topic can be found at (for example) http://www.archivum.info/bug-tar@gnu.org/2007-08/msg00003.html ] I realized that this issue would be difficult to debug on a system where the "nsec" values were all reasonable, so I did a little more digging on my own system (using tar v1.22). As I mentioned originally, the underlying problem seems to be a bug in the Linux kernel, which returns an out-of range value for the /proc (and /sys) filesystem's mtime.nsec field. The "stat" command on my system treats the value as a negative number ("-1857292880" in my earlier email), while the call to the "umaxtostr" on line 1315 of incremen.c (within the write_directory_file_entry function) converts the argument to type uintmax_t and the string representation comes out as a very large number ("18446744071852258736" in my earlier email), which then gets written to the snapshot file. It's this value that caused a fatal error when tar tries to read the snapshot file back in on the next run. I will investigate the kernel issue separately, but in any case seems like tar could be a little more robust in this situation. On the one hand, the message that is currently printed ("Unexpected field value in snapshot file") does not give any indication where in the snapshot file the problem was found. Because of this, it's difficult to know what to investigate next when the error occurs. (This is true even with the --verbose option in effect, and I didn't see any other way to cause tar to give me hints as to what was going wrong.) Looking at the structure of the routines that parse that file, I now see that the directory name doesn't get read from the snapshot file until a few fields after the timespec, and thus that name isn't available to print out in the fatal error message printed within in the "read_timespec" function. So, though it certainly would still be nice for the parse-time fatal error messages to include that information, it seems like a more direct solution to this particular problem is to do more validation as the snapshot file is written. (This has the side benefit of warning the user that something went wrong right away as the snapshot file is created, instead of only later when an attempt is made to read the file back in.) I've attached a patch that shows what I mean. I'm not much of a C programmer, so this is more a proof-of-concept than anything else, but basically my idea was to implement the same checks during the writing process that trigger a fatal error during the parsing process. (My patch only does this for the "nsec" value; for the proof-of-concept, I didn't try to figure out if similar tests could be implemented on other fields.) With this patch in place, I now get a warning message during the first tar run letting me know that something is amiss with the "proc" directory. Let me know if I can provide any other info, or if you'd like me do any additional testing on the machine that has this problem. Thanks. Nathan p.s. My patch also includes (separate) changes to the error message text in read_negative_num and read_unsigned_num so that the contents of the invalid field is displayed. I don't know if there are reasons that info shouldn't be printed, but if it were included that would at least give the user a small hint as to what to look for when this kind of error occurs during parsing. ---------------------------------------------------------------------------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
--- tar-1.22/src/incremen.c 2008-12-29 18:18:00.000000000 -0500 +++ tar-1.22patched/src/incremen.c 2009-03-23 16:41:44.262611461 -0400 @@ -1073,7 +1073,9 @@ errno = 0; *pval = strtoimax (buf, &ep, 10); if (c || errno || *pval < min_val) - FATAL_ERROR ((0, errno, _("Unexpected field value in snapshot file"))); + FATAL_ERROR ((0, errno, + _("Unexpected field value in snapshot file: \"%s\""), + buf)); } /* Read from file FP a nul-terminated string and convert it to @@ -1113,7 +1115,9 @@ errno = 0; *pval = strtoumax (buf, &ep, 10); if (c || errno || max_val < *pval) - FATAL_ERROR ((0, errno, _("Unexpected field value in snapshot file"))); + FATAL_ERROR ((0, errno, + _("Unexpected field value in snapshot file: \"%s\""), + buf)); return c; } @@ -1293,6 +1297,30 @@ free (buf); } +/* Write a timespec out to the snapshot file, issuing a warning if + the values are out of range (i.e. if they would cause an error + when the file is being parsed by read_timespec() ). +*/ +void +write_timespec (FILE *fp, struct timespec ts, const char *dirname ) +{ + char buf[UINTMAX_STRSIZE_BOUND]; + char *s; + + s = (TYPE_SIGNED (time_t) + ? imaxtostr (ts.tv_sec, buf) + : umaxtostr (ts.tv_sec, buf)); + fwrite (s, strlen (s) + 1, 1, fp); + s = umaxtostr (ts.tv_nsec, buf); + /* depending on the data type of tv_nsec, an invalid value could + appear to be negative or too large; either case is an error. */ + if (ts.tv_nsec < 0 || ts.tv_nsec > BILLION -1) + WARN ((0, 0, + _("Invalid timestamp nsec value \"%s\" found for entry \"%s\""), + s, dirname)); + fwrite (s, strlen (s) + 1, 1, fp); +} + /* Output incremental data for the directory ENTRY to the file DATA. Return nonzero if successful, preserving errno on write failure. */ static bool @@ -1308,12 +1336,7 @@ s = DIR_IS_NFS (directory) ? "1" : "0"; fwrite (s, 2, 1, fp); - s = (TYPE_SIGNED (time_t) - ? imaxtostr (directory->mtime.tv_sec, buf) - : umaxtostr (directory->mtime.tv_sec, buf)); - fwrite (s, strlen (s) + 1, 1, fp); - s = umaxtostr (directory->mtime.tv_nsec, buf); - fwrite (s, strlen (s) + 1, 1, fp); + write_timespec(fp, directory->mtime, directory->name); s = umaxtostr (directory->device_number, buf); fwrite (s, strlen (s) + 1, 1, fp); s = umaxtostr (directory->inode_number, buf); @@ -1341,8 +1364,6 @@ write_directory_file (void) { FILE *fp = listed_incremental_stream; - char buf[UINTMAX_STRSIZE_BOUND]; - char *s; if (! fp) return; @@ -1355,12 +1376,7 @@ fprintf (fp, "%s-%s-%d\n", PACKAGE_NAME, PACKAGE_VERSION, TAR_INCREMENTAL_VERSION); - s = (TYPE_SIGNED (time_t) - ? imaxtostr (start_time.tv_sec, buf) - : umaxtostr (start_time.tv_sec, buf)); - fwrite (s, strlen (s) + 1, 1, fp); - s = umaxtostr (start_time.tv_nsec, buf); - fwrite (s, strlen (s) + 1, 1, fp); + write_timespec(fp, start_time, "tar execution start_time"); if (! ferror (fp) && directory_table) hash_do_for_each (directory_table, write_directory_file_entry, fp);