In these enlightened times, when files of 2G and above are no longer
considered large even in the third world, more and more people ask for
the ability to download huge files with Wget.

Wget carefully uses `long' for potentially "large" values, such as
file sizes and offsets, but that has no effect on the most popular
32-bit architectures, where `long' and `int' are both 32-bit
quantities.  (It does help on 16-bit architectures where `int' is
16-bit, and it helps under 64-bit "LP64" environments where `int' is
32-bit, but `long' and `long long' are 64-bit.)

There have been several attempts to fix this:

* The hack called VERY_LONG_TYPE is used to store values that can
  reasonably exceed 2G, such as the sum of all downloads.  However, on
  machines without `long long', VERY_LONG_TYPE falls back to `long'.
  Since it is not used for anything critical, that is not much of a
  problem (and Wget is careful to detect overflow when adding to the
  sum, so bogus values are never printed).

* SuSE incorporated patches that change Wget's use of `long' to
  `unsigned long', which upgraded the limit from 2G to 4G.  Aside from
  all the awkwardness that comes from unsigned arithmetic (checking
  for error conditions with x<0 doesn't work; you have to use x==-1),
  its effect is limited: if I want to download a 3G file today, I'll
  want to download a 5G file tomorrow.

* In its own patches, Debian introduced the use of the large file API
  and `long long'.  While that's perfectly fine for Debian, it is not
  portable.  Neither the large file API nor `long long' is universally
  available, and both need thorough configure checking.

I believe that large numbers and large files are orthogonal.  We need
a large numeric type to represent numbers that *could* be large, be it
the sum of downloaded bytes, remote file sizes, or local file sizes
and offsets.  Independently, we need to use the large file API where
available, to be able to read and write large files locally.

Of those two issues, choosing and using the numeric type is the hard
one.  Autoconf helps only to an extent -- even if you define your own
`large_number_t' typedef, which is either `long' or `long long', the
question remains how to print that number.  Even worse, some systems
have `long long' (because they use gcc), but don't support it in libc,
so printf can't print it.

One way to solve this is to define macros for printing types.  For
example:

#ifdef HAVE_LONG_LONG
  /* 64-bit integer type, printed with "%lld". */
  typedef long long large_number_t;
# define LN_PRINT "lld"
#else
  /* Fall back to double, which represents integers exactly up to
     2^53; printed with "%f". */
  typedef double large_number_t;
# define LN_PRINT "f"
#endif

Then this becomes legal code:

    large_number_t num = 0;
    printf ("The number is: %" LN_PRINT "!\n", num);

Aside from being butt-ugly, this code has two serious problems.

1. Concatenation of adjacent string literals is an ANSI C feature;
   relying on it breaks pre-ANSI compilers.

2. It breaks gettext.  With translation support, the above code would
   look like this:

     large_number_t num = 0;
     printf (_("The number is: %" LN_PRINT "!\n"), num);

   The message snarfer won't be able to process this because it
   expects a string literal inside _(...).  Even if it were taught
   about string concatenation, it wouldn't know what to replace
   LN_PRINT with, unless it ran the preprocessor.  And if it ran the
   preprocessor, it would get a platform-specific result ("lld" or
   "f"), which cannot be stored in the message catalog.

The bottom line is, I really don't know how to solve this portably.
Does anyone know how widely ported software deals with large files?
