Hi,

I have run into a problem when our code reads very large files (more than
2^31 bytes) and hits what looks like an integer overflow. The int64_t
members of stat_t (in particular "size") and the return value of llseek are
implicitly down-cast to signed 32-bit ints. Here is how we mount our file
system (slightly simplified for brevity):

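    // createFile(parent, name, properties, canRead, canWrite);
    // 'size' (defined elsewhere) holds the real file size, which may exceed 2^31
    var node = Module.FS.createFile('/', emscriptenPath, null, true, true);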

    node.node_ops = {
        getattr: function(ganode) {
            return {
                dev: 1,
                ino: ganode.id,
                mode: ganode.mode,
                nlink: 1,
                uid: 0,
                gid: 0,
                rdev: ganode.rdev,
                size: size,  // <-- this is a file size > 2^31
                atime: new Date(ganode.timestamp),
                mtime: new Date(ganode.timestamp),
                ctime: new Date(ganode.timestamp),
                blksize: 4096,
                blocks: Math.ceil(size / 4096)
            };
        }        
    };

    node.stream_ops = {
        llseek: function(stream, offset, whence) {
            var position;
            switch (whence) {
                case 0: // SEEK_SET
                    position = offset;
                    break;
                case 1: // SEEK_CUR
                    position = stream.position + offset;
                    break;
                case 2: // SEEK_END
                    position = size + offset;
                    break;
                default:
                    throw new Module.FS.ErrnoError(22); // EINVAL
            }
            if (position < 0) {
                // cannot seek before the start of the file
                throw new Module.FS.ErrnoError(22); // EINVAL
            }
            stream.position = position;
            return stream.position; // <-- can be > 2^31
        },
        read: function(stream, buffer, heapOffset, numberOfBytes, fileOffset) {
            // ...
        }
    };
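
To illustrate the symptom: the handlers above compute sizes and positions as
plain JS Numbers, which represent integers exactly up to 2^53, so the values
are presumably still correct at that point and only go wrong once they are
squeezed into a signed 32-bit slot. The snippet below is an illustration
only; toInt32 is a hypothetical helper, and `x | 0` is the standard JS ToInt32
coercion, which asm.js also uses for C int values.

```javascript
// Illustration only: what a file size above 2^31 - 1 becomes when it is
// coerced to a signed 32-bit integer. `x | 0` applies JS's ToInt32
// conversion, the same coercion asm.js uses for C `int` values.
function toInt32(x) {
  return x | 0;
}

const INT32_MAX = 2 ** 31 - 1; // 2147483647

// A JS Number holds this size exactly (it is below 2^53)...
const size = 3000000000; // hypothetical ~2.8 GiB file
console.log(Number.isSafeInteger(size)); // true
console.log(size > INT32_MAX);           // true

// ...but after a trip through a signed 32-bit slot it wraps around:
console.log(toInt32(size));              // -1294967296

// Above 4 GiB the result can even look plausible while being wrong:
console.log(toInt32(5000000000));        // 705032704
```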

I suspect the issue arises because int64_t has no native counterpart in JS
and is therefore down-cast at the interface between the asm.js code and the
file system code. Is there a quick fix for this? I tried
-s PRECISE_I64_MATH=2, but to no avail. I am also not entirely sure where
exactly the precision is lost. My guess is that it happens in the
__syscallXY functions for fstat and lseek (and probably also for the
arguments passed into read).
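
For reference, a common way to carry a 64-bit integer through an interface
that only has 32-bit ints is to split it into a low and a high word and
recombine them on the other side. A minimal sketch, assuming the value is a
non-negative integer below 2^53 (the largest a JS Number holds exactly) and a
little-endian layout with the low word first; splitI64, joinI64, storeI64 and
loadI64 are hypothetical helpers, not part of Emscripten's API, and an
Int32Array stands in for HEAP32.

```javascript
// Hypothetical helpers: carry a 64-bit size/offset across a 32-bit-only
// interface as two 32-bit words. Not Emscripten's actual syscall glue.
const TWO_32 = 2 ** 32;

// Split a non-negative safe integer into an unsigned low word (bits 0..31)
// and a high word (bits 32..63).
function splitI64(value) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const low = value >>> 0;                     // ToUint32: value mod 2^32
  const high = Math.floor(value / TWO_32) | 0; // remaining upper bits
  return { low, high };
}

// Recombine; `>>> 0` undoes the sign an Int32Array gives a low word >= 2^31.
function joinI64(low, high) {
  return high * TWO_32 + (low >>> 0);
}

// Store/load as an int64_t field would be laid out on a little-endian
// target (low word first). byteOffset is assumed to be 4-byte aligned.
function storeI64(heap32, byteOffset, value) {
  const { low, high } = splitI64(value);
  heap32[byteOffset >> 2] = low;
  heap32[(byteOffset >> 2) + 1] = high;
}

function loadI64(heap32, byteOffset) {
  return joinI64(heap32[byteOffset >> 2], heap32[(byteOffset >> 2) + 1]);
}

// Usage: a 5 GB size survives the round trip through two 32-bit slots.
const heap = new Int32Array(4); // stand-in for HEAP32
storeI64(heap, 8, 5000000000);
console.log(heap[2], heap[3]);  // 705032704 1
console.log(loadI64(heap, 8));  // 5000000000
```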

One idea I had was to patch the syscalls so that they write the int64_t
values as strings onto the heap and pass back a pointer to that string,
both inside the stat_t structure and as the return value of llseek. These
strings would then have to be parsed back into int64_t values inside the
syscalls. Not exactly elegant, but it might work. Or is there a generic
solution?
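
The string idea could look roughly like this on the JS side. A sketch under
stated assumptions: a plain Uint8Array stands in for Emscripten's HEAPU8,
writeDecimalString and readDecimalString are hypothetical helpers, and the
value is written as NUL-terminated ASCII digits so that C code could parse it
with strtoll(). BigInt (available in modern JS engines) is used only so the
read-back can show that no digits are lost.

```javascript
// Hypothetical helpers for the "int64 as a decimal string on the heap" idea.
// heapU8 is a Uint8Array standing in for HEAPU8.

// Write `value` as NUL-terminated decimal digits at `ptr`.
// Returns the number of bytes written, including the terminator.
function writeDecimalString(heapU8, ptr, value) {
  const text = BigInt(value).toString(10);
  if (ptr + text.length + 1 > heapU8.length) {
    throw new RangeError("not enough room on the heap");
  }
  for (let i = 0; i < text.length; i++) {
    heapU8[ptr + i] = text.charCodeAt(i);
  }
  heapU8[ptr + text.length] = 0; // C string terminator
  return text.length + 1;
}

// Read the digits back as a BigInt (what strtoll() would do in C), so values
// up to 2^63 - 1 survive without rounding. Stops at NUL or end of heap.
function readDecimalString(heapU8, ptr) {
  let text = "";
  for (let i = ptr; i < heapU8.length && heapU8[i] !== 0; i++) {
    text += String.fromCharCode(heapU8[i]);
  }
  return BigInt(text);
}

// Usage
const heap = new Uint8Array(64); // stand-in for HEAPU8
const written = writeDecimalString(heap, 16, 5000000000);
console.log(written);                     // 11
console.log(readDecimalString(heap, 16)); // 5000000000n
```

In the real patch the parsing would happen on the C side; the read-back here
only demonstrates that the round trip is lossless.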

Thanks heaps in advance for any suggestions...

Soeren

-- 
You received this message because you are subscribed to the Google Groups 
"emscripten-discuss" group.