The same problem, the stat syscall truncating sizes at 32 bits, exists in native code as well. A stat64 syscall exists specifically for this purpose; it replaces the overflowing 32-bit fields with 64-bit ones. See https://www.ibm.com/support/knowledgecenter/en/ssw_i5_54/apis/stat64.htm . It looks like those 64-bit counterparts of the syscalls are already implemented, so perhaps you can use those instead?
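To make the difference concrete, here is a minimal sketch in plain JavaScript of what a 32-bit store does to a large size and how a 64-bit field could hold it as two 32-bit halves. The helper names writeSize64 and readSize64 are hypothetical, and the little-endian low-word-then-high-word layout is an assumption for illustration; this is not the code Emscripten actually uses.

```javascript
// Hypothetical helpers for illustration only; not Emscripten's implementation.
const TWO_POW_32 = 4294967296;

// Store a non-negative integer size (below 2^53) as two unsigned 32-bit words.
function writeSize64(view, offset, size) {
  const low = size % TWO_POW_32;             // lower 32 bits
  const high = Math.floor(size / TWO_POW_32); // upper 32 bits
  view.setUint32(offset, low, true);          // assumed little-endian layout
  view.setUint32(offset + 4, high, true);
}

// Read the two words back and recombine them into a JS number.
function readSize64(view, offset) {
  const low = view.getUint32(offset, true);
  const high = view.getUint32(offset + 4, true);
  return high * TWO_POW_32 + low;
}

const threeGiB = 3 * 1024 * 1024 * 1024; // needs all 32 bits
const fiveGiB = 5 * 1024 * 1024 * 1024;  // needs more than 32 bits

const signedView = threeGiB | 0;     // coerced to a signed 32-bit integer
const unsignedView = threeGiB >>> 0; // coerced to an unsigned 32-bit integer

const view = new DataView(new ArrayBuffer(8));
writeSize64(view, 0, fiveGiB);
const roundTripped = readSize64(view, 0);

console.log(signedView);   // -1073741824
console.log(unsignedView); // 3221225472
console.log(roundTripped); // 5368709120
```

The 3 GiB value shows the 31-bit limit discussed further down the thread: | 0 turns it negative, while >>> 0 keeps it intact. The 5 GiB value cannot fit in a single 32-bit field either way, which is why a 64-bit st_size is needed for such files.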
2018-03-06 1:02 GMT+02:00 Soeren Balko <[email protected]>:

> Nah, we read these files progressively. ;-)
>
> On Tue, Mar 6, 2018 at 6:45 AM, Alon Zakai <[email protected]> wrote:
>
>> It's probably something customizable in musl, since it can run on 32- and
>> 64-bit systems. Probably for emscripten we defined it as 32-bit since
>> memory is 32-bit anyhow. So if you want to change this, just defining it as
>> 64-bit and fixing up the syscalls would be enough.
>>
>> Do you really use files larger than you can fit in memory all at once,
>> though? :)
>>
>> On Sun, Mar 4, 2018 at 3:05 PM, Sören Balko <[email protected]>
>> wrote:
>>
>>> Thanks, Alon - very helpful! Having unsigned 32-bit ints would help, but
>>> not necessarily a lot. We process video files that can occasionally be
>>> huge, especially when dealing with poorly compressed video streams such as
>>> motion JPEGs. The fact that off_t is declared as a 32-bit int (signed or
>>> not) strikes me as odd. Is that a musl limitation?
>>>
>>> On Monday, 5 March 2018 04:58:55 UTC+10, Alon Zakai wrote:
>>>>
>>>> About the 31 bit issue, there's a chance the issue is that the asm.js
>>>> FFI boundary is treated as signed (an asm.js function returning a 32-bit
>>>> integer will use | 0). In that case, what might be the bug is that when JS
>>>> calls a function returning an unsigned value it should use >>> 0. Another
>>>> possibility is that the loads/stores of that struct value (makeSetValue
>>>> etc.) may need to be marked as unsigned.
>>>>
>>>> On Sun, Mar 4, 2018 at 10:55 AM, Alon Zakai <[email protected]> wrote:
>>>>
>>>>> C_STRUCTS is generated from the headers in gen_struct_info.py,
>>>>> basically by compiling small C programs to see what the offsets are. I
>>>>> believe it does not look at sizes, though (except for __size__ which is
>>>>> computed for the entire struct). The numbers there are the offsets, not
>>>>> the sizes. So st_size is at offset 36.
>>>>>
>>>>> The stat.h says
>>>>>
>>>>>     off_t st_size;
>>>>>     blksize_t st_blksize;
>>>>>
>>>>> I'm not sure how to easily find the definition of off_t, but looking
>>>>> in the offsets, st_size is 36 and st_blksize which is after it is 40, so
>>>>> the size must be 4. So it's not big enough if you need more than 32 bits,
>>>>> off_t would need to be redefined. (Do you really need more than 32 bits,
>>>>> though?)
>>>>>
>>>>> A separate question is if 32-bit values work - I think you said 31
>>>>> bits seems to be the limit. That could be due to treating the value as
>>>>> signed somewhere ( | 0 will do that). If 32 unsigned bits are enough for
>>>>> you, finding that bug might be practical.
>>>>>
>>>>> On Sat, Mar 3, 2018 at 12:31 AM, Soeren Balko <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> In struct_info.compiled.json, the "stat" struct is declared like so:
>>>>>>
>>>>>> "stat": {
>>>>>>   "st_rdev": 28,
>>>>>>   "st_mtim": {
>>>>>>     "tv_sec": 56,
>>>>>>     "tv_nsec": 60,
>>>>>>     "__size__": 8
>>>>>>   },
>>>>>>   "st_blocks": 44,
>>>>>>   "st_atim": {
>>>>>>     "tv_sec": 48,
>>>>>>     "tv_nsec": 52,
>>>>>>     "__size__": 8
>>>>>>   },
>>>>>>   "st_nlink": 16,
>>>>>>   "__st_ino_truncated": 8,
>>>>>>   "st_ctim": {
>>>>>>     "tv_sec": 64,
>>>>>>     "tv_nsec": 68,
>>>>>>     "__size__": 8
>>>>>>   },
>>>>>>   "st_mode": 12,
>>>>>>   "st_blksize": 40,
>>>>>>   "__st_dev_padding": 4,
>>>>>>   "st_dev": 0,
>>>>>>   "st_size": 36,
>>>>>>   "st_gid": 24,
>>>>>>   "__st_rdev_padding": 32,
>>>>>>   "st_uid": 20,
>>>>>>   "st_ino": 72,
>>>>>>   "__size__": 76
>>>>>> }
>>>>>>
>>>>>> I assume the properties are the bit widths of the various fields (?).
>>>>>> According to this, st_size is 36 bits, which is enough to cater even for
>>>>>> very large files.
>>>>>>
>>>>>> Can you please confirm, Alon?
>>>>>>
>>>>>> On Saturday, March 3, 2018 at 6:18:09 PM UTC+10, Soeren Balko wrote:
>>>>>>
>>>>>>> Thanks, Alon! This does indeed seem to be the issue. In
>>>>>>> library_syscall.js, the "st_size" member is considered an i32 (see
>>>>>>> below). I do not yet fully understand how C_STRUCTS is generated. I
>>>>>>> can see that compiler.js receives a JSON object STRUCT_INFO that
>>>>>>> contains the type definitions. Is this generated from the musl headers?
>>>>>>>
>>>>>>> doStat: function(func, path, buf) {
>>>>>>>   try {
>>>>>>>     var stat = func(path);
>>>>>>>   } catch (e) {
>>>>>>>     if (e && e.node && PATH.normalize(path) !== PATH.normalize(FS.getPath(e.node))) {
>>>>>>>       // an error occurred while trying to look up the path; we should just report ENOTDIR
>>>>>>>       return -ERRNO_CODES.ENOTDIR;
>>>>>>>     }
>>>>>>>     throw e;
>>>>>>>   }
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.st_dev, 'stat.dev', 'i32') }}};
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.__st_dev_padding, '0', 'i32') }}};
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.__st_ino_truncated, 'stat.ino', 'i32') }}};
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.st_mode, 'stat.mode', 'i32') }}};
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.st_nlink, 'stat.nlink', 'i32') }}};
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.st_uid, 'stat.uid', 'i32') }}};
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.st_gid, 'stat.gid', 'i32') }}};
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.st_rdev, 'stat.rdev', 'i32') }}};
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.__st_rdev_padding, '0', 'i32') }}};
>>>>>>>   *{{{ makeSetValue('buf', C_STRUCTS.stat.st_size, 'stat.size', 'i32') }}};*
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.st_blksize, '4096', 'i32') }}};
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.st_blocks, 'stat.blocks', 'i32') }}};
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.st_atim.tv_sec, '(stat.atime.getTime() / 1000)|0', 'i32') }}};
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.st_atim.tv_nsec, '0', 'i32') }}};
>>>>>>>   {{{ makeSetValue('buf', C_STRUCTS.stat.st_mtim.tv_sec, '(stat.mtime.getTime() / 1000)|0', 'i32') }}};
>>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "emscripten-discuss" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> You received this message because you are subscribed to a topic in the
>> Google Groups "emscripten-discuss" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/emscripten-discuss/B7sQdO0u6vk/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> Soeren Balko, PhD
> Director and CTO
> clipchamp.com
> [email protected]
