On Thu, Aug 25, 2022 at 3:47 PM Qianqian Fang <fan...@gmail.com> wrote:
> On 8/25/22 12:25, Robert Kern wrote:
>
> I don't quite know what this means. My installed version of `jq`, for
> example, doesn't seem to know what to do with these files.
>
> ❯ jq --version
> jq-1.6
>
> ❯ jq . eye5chunk_bjd_raw.jdb
> parse error: Invalid numeric literal at line 1, column 38
>
> the .jdb files are binary JSON files (specifically BJData) that jq does
> not currently support; to save as text-based JSON, you change the suffix to
> .json or .jdt - it results in ~33% increase compared to the binary due to
> base64

Okay. Given your wording, it looked like you were claiming that the binary
JSON was supported by the whole ecosystem. Rather, it seems like you can
either get binary encoding OR the ecosystem support, but not both at the
same time.

> I think a fundamental problem here is that it looks like each element in
> the array is delimited. I.e. a `float64` value starts with b'D' then the 8
> IEEE-754 bytes representing the number. When we're talking about
> memory-mappability, we are talking about having the on-disk representation
> being exactly what it looks like in-memory, all of the IEEE-754 floats
> contiguous with each other, so we can use the `np.memmap` `ndarray`
> subclass to represent the on-disk data as a first-class array object. This
> spec lets us mmap the binary JSON file and manipulate its contents in-place
> efficiently, but that's not what is being asked for here.
>
> there are several BJData-compliant forms to store the same binary array
> losslessly. The most memory efficient and disk-mmapable (but not
> necessarily disk-efficient) form is to use the ND-array container syntax
> <https://github.com/NeuroJSON/bjdata/blob/Draft_2/Binary_JData_Specification.md#optimized-n-dimensional-array-of-uniform-type>
> that BJData spec extended over UBJSON.

Are any of them supported by a Python BJData implementation? I didn't see
any option to get that done in the `bjdata` package you recommended, for
example.

https://github.com/NeuroJSON/pybj/blob/a46355a0b0df0bec1817b04368a5a573358645ef/bjdata/encoder.py#L200

-- 
Robert Kern
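
To make the memory-mappability point above concrete, here is a minimal sketch comparing the two on-disk layouts discussed in the thread. It is not taken from the `bjdata` package or the BJData spec: the per-element layout follows only the description quoted above (b'D' followed by 8 IEEE-754 bytes), little-endian byte order is an assumption of the sketch, and `read_delimited` is a hypothetical helper written for illustration.

```python
import os
import struct
import tempfile

import numpy as np

values = np.arange(5, dtype="<f8")

# Per-element layout as described in the message: b'D' + 8 IEEE-754 bytes.
# (Little-endian byte order is an assumption of this sketch.)
delimited = b"".join(b"D" + struct.pack("<d", v) for v in values)

# Contiguous layout: the 8-byte floats back to back, nothing in between.
contiguous = values.tobytes()


def read_delimited(buf):
    """Hypothetical helper: decode the marker-per-element layout by copying."""
    out = []
    for offset in range(0, len(buf), 9):
        # Every element must start with the b'D' type marker.
        assert buf[offset:offset + 1] == b"D"
        out.append(struct.unpack_from("<d", buf, offset + 1)[0])
    return np.array(out, dtype="<f8")


decoded = read_delimited(delimited)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "contiguous.bin")
    with open(path, "wb") as f:
        f.write(contiguous)
    # The contiguous file can be mapped directly as an ndarray, no decoding.
    mapped = np.memmap(path, dtype="<f8", mode="r", shape=(5,))
    mapped_copy = np.array(mapped)
    del mapped  # release the mapping before the directory is removed
```

The delimited buffer costs 9 bytes per element and has to be decoded into a new array, while the contiguous buffer is 8 bytes per element and `np.memmap` can expose it as an array without copying.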
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/