On 8/25/22 18:33, Neal Becker wrote:
The loading time (from an NVMe drive; Ubuntu 18.04, Python 3.6.9,
NumPy 1.19.5) for each file is listed below:
0.179s  eye1e4.npy (mmap_mode=None)
0.001s  eye1e4.npy (mmap_mode=r)
0.718s  eye1e4_bjd_raw_ndsyntax.jdb
1.474s  eye1e4_bjd_zlib.jdb
0.635s  eye1e4_bjd_lzma.jdb
Clearly, mmapped loading is unsurprisingly the fastest option. It is
true that the raw BJData file is about 5x slower to load than the .npy
file, but given that the bulk of the data is stored identically in both
(as a contiguous buffer), I suppose that with some optimization of the
decoder the gap between the two can be substantially narrowed. The
longer loading times for zlib/lzma (and the similarly longer saving
times) reflect a trade-off between smaller file sizes and the time
spent on compression/decompression/disk I/O.
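For reference, this kind of load-timing comparison can be sketched with a small self-contained harness (the file name and array size here are stand-ins; the actual benchmark above used a 10000x10000 identity matrix saved as eye1e4.npy):

```python
import os
import tempfile
import time

import numpy as np


def time_load(path, mmap_mode=None):
    """Time a single np.load call and return (array, elapsed seconds)."""
    t0 = time.time()
    arr = np.load(path, mmap_mode=mmap_mode)
    return arr, time.time() - t0


# small stand-in for the 10000x10000 identity matrix used in the benchmark
path = os.path.join(tempfile.mkdtemp(), 'eye_small.npy')
np.save(path, np.eye(200))

full, t_full = time_load(path)                 # reads the whole file
lazy, t_lazy = time_load(path, mmap_mode='r')  # only maps pages; no bulk I/O yet
```

Note that with mmap_mode='r' the timer only measures setting up the memory mapping, not reading the data, which is why that number looks so small.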
I think the load time for mmap may be deceptive; it isn't actually
loading anything, just mapping the file to memory. Maybe a better
benchmark is to actually process the data, e.g., find the mean,
which would require reading the values.
Yes, that is correct; I meant to mention that it wasn't an
apples-to-apples comparison.
The loading times for fully loading the data and printing the mean,
obtained by running the line below,

t = time.time(); newy = jd.load('eye1e4_bjd_raw_ndsyntax.jdb'); print(np.mean(newy)); t1 = time.time() - t; print(t1)

are summarized below (I also added an lz4-compressed BJData/.jdb file via
jd.save(..., {'compression':'lz4'})):
0.236s  eye1e4.npy (mmap_mode=None) - size: 800000128 bytes
0.120s  eye1e4.npy (mmap_mode=r)
0.764s  eye1e4_bjd_raw_ndsyntax.jdb (with C extension _bjdata in sys.path) - size: 800000014 bytes
0.599s  eye1e4_bjd_raw_ndsyntax.jdb (without C extension _bjdata)
1.533s  eye1e4_bjd_zlib.jdb (without C extension _bjdata) - size: 813721 bytes
0.697s  eye1e4_bjd_lzma.jdb (without C extension _bjdata) - size: 113067 bytes
0.918s  eye1e4_bjd_lz4.jdb (without C extension _bjdata) - size: 3371487 bytes
Mmapped loading remains the fastest, but the run-time is now more
realistic. I thought lz4 compression would offer much faster
decompression, but for this particular workload that isn't the case.
It is also interesting to see that bjdata's C extension
<https://github.com/NeuroJSON/pybj/tree/master/src> did not help when
parsing a single large array compared to the native Python parser,
suggesting room for further optimization.
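The size/speed trade-off can be reproduced with the standard-library codecs alone (lz4 requires a third-party package, so it is left out of this sketch; the input mimics the highly compressible identity-matrix payload, which explains the tiny compressed .jdb sizes above):

```python
import time
import zlib
import lzma

import numpy as np

# identity-matrix bytes: almost entirely zeros, hence extremely compressible
buf = np.eye(500).tobytes()

results = {}
for name, comp, decomp in [('zlib', zlib.compress, zlib.decompress),
                           ('lzma', lzma.compress, lzma.decompress)]:
    t0 = time.time(); c = comp(buf); t_comp = time.time() - t0
    t0 = time.time(); d = decomp(c); t_decomp = time.time() - t0
    assert d == buf  # round-trip check
    results[name] = (len(c), t_comp, t_decomp)
    print(f'{name}: {len(c)} bytes compressed, '
          f'{t_comp:.3f}s compress, {t_decomp:.3f}s decompress')
```

On payloads like this, lzma typically achieves a much smaller size than zlib at the cost of longer compression time, matching the pattern in the table above.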
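One optimization direction hinted at above: since the array payload is stored as a contiguous buffer, a decoder can hand the whole payload to np.frombuffer in one call instead of parsing it element by element. A minimal sketch (the payload framing here is hypothetical, not the actual BJData wire format):

```python
import numpy as np

# a contiguous little-endian float64 payload, as both .npy and raw
# BJData store the array body
payload = np.arange(12, dtype='<f8').tobytes()

# a single zero-copy frombuffer call reconstructs the array; no
# per-element parsing loop is needed
arr = np.frombuffer(payload, dtype='<f8').reshape(3, 4)
```

This is the kind of fast path that could narrow the gap between the raw .jdb and .npy loading times.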
Qianqian
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/