I tried to understand the GRASS wiki page on Large File Support (LFS); sorry for being a bit late with that!

Glynn Clements wrote:
Markus Metz wrote:

If the coor file size stored in the topo file is indeed needed to properly process the coor file, the respective variables must be something other than long in order to support coor files larger than 2 GB, maybe long long? The same goes for all intermediate variables in the vector library that store the coor file size. Looking at limits.h, long can be like int or like long long (only on true 64-bit systems). I use 64-bit Linux with 32-bit compatibility; here long is like int. Could someone more familiar with type limits and type declarations on different systems please help?

As you note, long will normally be the largest size which the CPU can
handle natively, while long long (only available in C99 or as a gcc
extension) can be expected to be 64 bits where it exists. FWIW, "int"
can theoretically be 64 bits, but this is rare.

The correct type to use for the size of a file is off_t, which can be
made to be a 64-bit type by adding -D_FILE_OFFSET_BITS=64 to the
compilation switches. This should only be done if $(USE_LARGEFILES) is
non-empty (corresponding to --enable-largefile).

However, that alone isn't sufficient, as you have to explicitly force
offset calculations to be performed using off_t rather than int/long,
e.g.:

        long idx, step;
        ...
        off_t offset = (off_t) idx * step;
or:
        off_t offset = idx * (off_t) step;

Note that:

        off_t offset = idx * step;
and:
        off_t offset = (off_t) (idx * step);

won't work, as the result isn't up-cast until after it has been
truncated.

I think I understand. So, according to the GRASS wiki, the steps to enable large file support would be:

1) add
ifneq ($(USE_LARGEFILES),)
EXTRA_CFLAGS = -D_FILE_OFFSET_BITS=64
endif

to all relevant Makefiles

2) use off_t where appropriate, and take care with type casting. File offsets are used in many different places in the vector library, so getting off_t usage right will take a bit of work.

3) solve the fseek/fseeko and ftell/ftello problem. Get inspiration from libgis and LFS-safe modules? Or as suggested in the grass wiki on LFS, add
extern off_t G_ftell(FILE *fp);
extern int G_fseek(FILE *stream, off_t offset, int whence);
for global use?

4) figure out if coor file size really needs to be stored in coor and topo. coor file size doesn't say a lot about the number of features because coor can contain a high proportion of dead lines (a problem in itself, vector TODO). If it does not need to be stored in coor and topo, how does removing the coor file size info affect reading and writing of coor and topo? Are there hard-coded offsets for reading these files?

It would be great to have LFS support in the vector libs in GRASS 7! I am getting coor files > 2 GB more and more often with v.in.ogr and v.clean, and I suspect that modifying a coor file > 2 GB produces unusable results, even if the module does the work and does not complain. I can now modify a module so that it takes, say, a 1 GB coor file and works on it (e.g. does some cleaning); the coor file grows beyond 2 GB in the process, and at the end of the module only alive lines are written out, so the resulting coor file is again below 2 GB. But in between, some unnoticed errors may have occurred. This is no good.

Regards,

Markus