Hi,
As David Ryskalczyk stated, just two printf format specifiers appear to be
causing the havoc: both print an off_t value with "%ld". I don't think there
is any need to use wgint instead of off_t.
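
To illustrate the idea, here is a minimal sketch of the pattern: widen the off_t to long long and print it with "%lld". The helper name format_offset is hypothetical and not part of wget; it only shows the cast and the matching format specifier.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not in wget): write VALUE into BUF as decimal text.
   Returns whatever snprintf returns: the number of characters needed,
   or a negative value on an output error.  */
static int
format_offset (char *buf, size_t size, off_t value)
{
  /* off_t and long need not have the same width.  For example, a 32-bit
     platform built with large-file support has a 64-bit off_t and a
     32-bit long, so "%ld" reads the wrong number of bytes there.
     Casting to long long and using "%lld" always matches.  */
  return snprintf (buf, size, "%lld", (long long) value);
}
```

With a 32-byte buffer, format_offset (buf, sizeof buf, 12345) leaves the string "12345" in buf.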
@Giuseppe: please apply the attached patches (perhaps squashed into one
commit).
@Gijs: could you check whether these patches fix the issue?
Regards, Tim
On Monday 12 November 2012, Gijs van Tulder wrote:
> Hi,
>
> There's a somewhat serious issue in the WARC-generating code: on some
> platforms (presumably the ones where off_t is not a 64-bit number) the
> Content-Length header at the top of each WARC record has an incorrect
> length. On these platforms it is sometimes 0, sometimes 1, but never the
> correct length. This makes the whole WARC file unreadable.
>
> The code works fine on many platforms, but it is apparently a problem on
> some PowerPC and ARM systems, and maybe other systems as well.
>
> Existing WARC files with this problem can be repaired by replacing the
> value of the Content-Length header with the correct value, for each WARC
> record in the file. The content of the WARC records is there, it's just
> the Content-Length header that is wrong.
>
> The attached patch fixes the problem in warc.c. It replaces off_t by
> wgint and uses the number_to_static_string function from util.c.
>
> Regards,
>
> Gijs
Kind regards,
Tim Rühsen
From d778dae0d2abaf036f415ee0e54c91241cbadee0 Mon Sep 17 00:00:00 2001
From: Tim Ruehsen <[email protected]>
Date: Wed, 14 Nov 2012 11:11:34 +0100
Subject: [PATCH 1/2] fix output of off_t variables
---
src/warc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/warc.c b/src/warc.c
index de99bf7..ef82fdf 100644
--- a/src/warc.c
+++ b/src/warc.c
@@ -247,7 +247,7 @@ warc_write_block_from_file (FILE *data_in)
/* Add the Content-Length header. */
char *content_length;
fseeko (data_in, 0L, SEEK_END);
- if (! asprintf (&content_length, "%ld", ftello (data_in)))
+ if (! asprintf (&content_length, "%lld", (long long)ftello (data_in)))
{
warc_write_ok = false;
return false;
@@ -1231,9 +1231,9 @@ warc_write_cdx_record (const char *url, const char *timestamp_str,
redirect_location = "-";
/* Print the CDX line. */
- fprintf (warc_current_cdx_file, "%s %s %s %s %d %s %s - %ld %s %s\n", url,
+ fprintf (warc_current_cdx_file, "%s %s %s %s %d %s %s - %lld %s %s\n", url,
timestamp_str_cdx, url, mime_type, response_code, checksum,
- redirect_location, offset, warc_current_filename, response_uuid);
+ redirect_location, (long long)offset, warc_current_filename, response_uuid);
fflush (warc_current_cdx_file);
return true;
--
1.7.10.4
From 45bc8655218d54e5dbac9103f6d7267b9d9126c2 Mon Sep 17 00:00:00 2001
From: Tim Ruehsen <[email protected]>
Date: Wed, 14 Nov 2012 11:14:01 +0100
Subject: [PATCH 2/2] fix checking asprintf return value
---
src/warc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/warc.c b/src/warc.c
index ef82fdf..85f7ba3 100644
--- a/src/warc.c
+++ b/src/warc.c
@@ -247,7 +247,7 @@ warc_write_block_from_file (FILE *data_in)
/* Add the Content-Length header. */
char *content_length;
fseeko (data_in, 0L, SEEK_END);
- if (! asprintf (&content_length, "%lld", (long long)ftello (data_in)))
+ if (asprintf (&content_length, "%lld", (long long)ftello (data_in)) == -1)
{
warc_write_ok = false;
return false;
--
1.7.10.4
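
A note on why the return value check matters: asprintf returns the number of bytes written on success and -1 on failure. Negating the result with "!" therefore never detects an error, because -1 is non-zero. The following is a minimal sketch only; the helper name offset_to_string is hypothetical, and it assumes a platform that provides asprintf (a GNU/BSD extension).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* asprintf is a GNU/BSD extension */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper (not in wget): return a malloc'd decimal string
   for VALUE, or NULL if asprintf fails.  The caller frees the result.  */
static char *
offset_to_string (off_t value)
{
  char *text;

  /* A negative return value signals failure; the contents of TEXT are
     undefined in that case, so do not touch it.  */
  if (asprintf (&text, "%lld", (long long) value) < 0)
    return NULL;

  return text;
}
```

A caller would test the result against NULL before using it and pass it to free afterwards.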
From f75e6d10968fe6004c473a1b8d2f2bddd23d266c Mon Sep 17 00:00:00 2001
From: Tim Ruehsen <[email protected]>
Date: Wed, 14 Nov 2012 11:22:16 +0100
Subject: [PATCH] added ChangeLog entries
---
src/ChangeLog | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/src/ChangeLog b/src/ChangeLog
index 2ccfe6f..b7aee5d 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,9 @@
+2012-10-07 Tim Ruehsen <[email protected]>
+
+ * warc.c (warc_write_block_from_file): fix off_t format string
+ * warc.c (warc_write_block_from_file): fix checking asprintf return value
+ * warc.c (warc_write_cdx_record): fix off_t format string
+
2012-10-07 Ray Satiro <[email protected]>
* url.c: Change the functions of a growable string object to null
--
1.7.10.4