Hi,

As David Ryskalczyk stated, just two printf format specifiers appear to be
causing the havoc. I don't think there is any need to use wgint instead of off_t.
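
To illustrate the idea behind the first patch, here is a minimal sketch. The helper name format_offset is made up for this example and is not part of wget. It only shows the cast-and-%lld pattern, which does not depend on how wide off_t is on a given platform.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not part of wget): write VALUE as decimal text
   into BUF.  The width of off_t differs between platforms, so it is
   cast to long long and printed with %lld instead of being passed
   straight to %ld.  Returns what snprintf returns. */
static int
format_offset (char *buf, size_t size, off_t value)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "%lld", (long long) value);
}
```

Calling format_offset (buf, sizeof buf, ftello (fp)) would then produce the same text regardless of whether long and off_t have the same size.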

@Giuseppe: please apply the appended patches (perhaps squashed into
one commit).
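
The second patch concerns the asprintf return value. asprintf returns the number of bytes written, or -1 on failure, so a test of the form "! asprintf (...)" never detects a failure. A rough sketch of the intended check follows; the helper name length_to_string is hypothetical and not part of wget, and the sketch assumes the GNU/BSD asprintf extension is available.

```c
#define _GNU_SOURCE /* asprintf is a GNU/BSD extension, not ISO C */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of wget): return a freshly allocated
   decimal string for LENGTH, or NULL if asprintf fails.  The caller
   frees the result. */
static char *
length_to_string (long long length)
{
  char *text;

  assert (length >= 0);

  /* asprintf returns -1 on failure.  The pointer is not reliable
     after a failure, so only the return value is tested. */
  if (asprintf (&text, "%lld", length) == -1)
    return NULL;
  return text;
}
```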

@Gijs: Could you check whether these patches fix the issue?

Regards, Tim

On Monday 12 November 2012, Gijs van Tulder wrote:
> Hi,
> 
> There's a somewhat serious issue in the WARC-generating code: on some
> platforms (presumably the ones where off_t is not a 64-bit number) the
> Content-Length header at the top of each WARC record has an incorrect
> length. On these platforms it is sometimes 0, sometimes 1, but never the
> correct length. This makes the whole WARC file unreadable.
> 
> The code works fine on many platforms, but it is apparently a problem on
> some PowerPC and ARM systems, and maybe other systems as well.
> 
> Existing WARC files with this problem can be repaired by replacing the
> value of the Content-Length header with the correct value, for each WARC
> record in the file. The content of the WARC records is there, it's just
> the Content-Length header that is wrong.
> 
> The attached patch fixes the problem in warc.c. It replaces off_t by
> wgint and uses the number_to_static_string function from util.c.
> 
> Regards,
> 
> Gijs

Kind regards

     Tim Rühsen
From d778dae0d2abaf036f415ee0e54c91241cbadee0 Mon Sep 17 00:00:00 2001
From: Tim Ruehsen <[email protected]>
Date: Wed, 14 Nov 2012 11:11:34 +0100
Subject: [PATCH 1/2] fix output of off_t variables

---
 src/warc.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/warc.c b/src/warc.c
index de99bf7..ef82fdf 100644
--- a/src/warc.c
+++ b/src/warc.c
@@ -247,7 +247,7 @@ warc_write_block_from_file (FILE *data_in)
   /* Add the Content-Length header. */
   char *content_length;
   fseeko (data_in, 0L, SEEK_END);
-  if (! asprintf (&content_length, "%ld", ftello (data_in)))
+  if (! asprintf (&content_length, "%lld", (long long)ftello (data_in)))
     {
       warc_write_ok = false;
       return false;
@@ -1231,9 +1231,9 @@ warc_write_cdx_record (const char *url, const char *timestamp_str,
     redirect_location = "-";
 
   /* Print the CDX line. */
-  fprintf (warc_current_cdx_file, "%s %s %s %s %d %s %s - %ld %s %s\n", url,
+  fprintf (warc_current_cdx_file, "%s %s %s %s %d %s %s - %lld %s %s\n", url,
            timestamp_str_cdx, url, mime_type, response_code, checksum,
-           redirect_location, offset, warc_current_filename, response_uuid);
+           redirect_location, (long long)offset, warc_current_filename, response_uuid);
   fflush (warc_current_cdx_file);
 
   return true;
-- 
1.7.10.4

From 45bc8655218d54e5dbac9103f6d7267b9d9126c2 Mon Sep 17 00:00:00 2001
From: Tim Ruehsen <[email protected]>
Date: Wed, 14 Nov 2012 11:14:01 +0100
Subject: [PATCH 2/2] fix checking asprintf return value

---
 src/warc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/warc.c b/src/warc.c
index ef82fdf..85f7ba3 100644
--- a/src/warc.c
+++ b/src/warc.c
@@ -247,7 +247,7 @@ warc_write_block_from_file (FILE *data_in)
   /* Add the Content-Length header. */
   char *content_length;
   fseeko (data_in, 0L, SEEK_END);
-  if (! asprintf (&content_length, "%lld", (long long)ftello (data_in)))
+  if (asprintf (&content_length, "%lld", (long long)ftello (data_in)) == -1)
     {
       warc_write_ok = false;
       return false;
-- 
1.7.10.4

From f75e6d10968fe6004c473a1b8d2f2bddd23d266c Mon Sep 17 00:00:00 2001
From: Tim Ruehsen <[email protected]>
Date: Wed, 14 Nov 2012 11:22:16 +0100
Subject: [PATCH] added ChangeLog entries

---
 src/ChangeLog |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/ChangeLog b/src/ChangeLog
index 2ccfe6f..b7aee5d 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,9 @@
+2012-10-07  Tim Ruehsen  <[email protected]>
+
+	* warc.c (warc_write_block_from_file): fix off_t format string
+	* warc.c (warc_write_block_from_file): fix checking asprintf return value
+	* warc.c (warc_write_cdx_record): fix off_t format string
+
 2012-10-07  Ray Satiro <[email protected]>
 
 	* url.c: Change the functions of a growable string object to null
-- 
1.7.10.4
