Hi all,

This is to announce [`npreadtext`](https://github.com/BIDS-numpy/npreadtext), a drop-in replacement for `numpy.loadtxt` written in C for improved performance. We are now at feature parity with `loadtxt`, and would greatly appreciate your feedback and testing. We hope eventually to include `npreadtext` in NumPy itself.

## Installation

`npreadtext` has been tested with NumPy v1.18 and higher and can be installed using:

```
python -m pip install numpy
python -m pip install git+git://github.com/BIDS-numpy/npreadtext
```

To enable the C-accelerated version of `np.loadtxt`, monkey-patch NumPy:

```python
>>> import numpy as np
>>> from npreadtext import monkeypatch_numpy
>>> monkeypatch_numpy()
```

This replaces `np.loadtxt` with `npreadtext._loadtxt`.

## Feedback

You may leave comments here or file issues on the [project issue tracker](https://github.com/BIDS-numpy/npreadtext/issues). Please also share text files that strain or break the reader.

## Benchmarks

Preliminary benchmarks show speedups ranging from about 1.4x on the smallest inputs to nearly 19x on larger ones:

```
python runtests.py --bench-compare monkeypatch-npreadtext bench_io

    npreadtext    np.loadtxt speedup  function
+  7.74±0.04ms     146±0.8ms   18.85  bench_io.LoadtxtCSVStructured.time_loadtxt_csv_struct_dtype
+   9.67±0.1ms     181±0.6ms   18.67  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 100000)
+     969±10μs    17.9±0.1ms   18.48  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 10000)
+      950±7μs   14.6±0.04ms   15.39  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 10000)
+  9.65±0.03ms     146±0.2ms   15.13  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 100000)
+  11.8±0.06ms     141±0.3ms   11.96  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 100000)
+   11.9±0.1ms     141±0.3ms   11.88  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 100000)
+   12.6±0.1ms     150±0.6ms   11.85  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100000)
+  1.18±0.01ms    13.9±0.1ms   11.74  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 10000)
+  1.19±0.01ms   13.9±0.09ms   11.68  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 10000)
+     1.27±0ms   14.7±0.06ms   11.64  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10000)
+  12.4±0.06ms     140±0.6ms   11.28  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(100000)
+  1.22±0.02ms   13.8±0.09ms   11.26  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(10000)
+   20.8±0.2μs     194±0.5μs    9.32  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 100)
+   20.4±0.2μs     162±0.3μs    7.97  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 100)
+     1.04±0ms   8.17±0.08ms    7.84  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3, 5, 7])
+      884±2μs   6.79±0.02ms    7.68  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3])
+  1.56±0.01ms   12.0±0.05ms    7.68  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10000)
+  16.1±0.05ms     122±0.3ms    7.56  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100000)
+  23.4±0.04μs     163±0.9μs    6.94  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100)
+  22.6±0.09μs     153±0.2μs    6.76  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 100)
+   22.9±0.5μs     154±0.7μs    6.72  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 100)
+   22.8±0.5μs     150±0.8μs    6.58  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(100)
+      809±8μs   5.10±0.02ms    6.30  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv(2)
+  7.31±0.01ms   42.0±0.08ms    5.75  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(20000)
+      748±2μs   4.11±0.04ms    5.50  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(2000)
+   26.0±0.2μs     131±0.3μs    5.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100)
+   87.3±0.4μs       436±1μs    5.00  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(200)
+  2.09±0.01ms   10.1±0.04ms    4.86  bench_io.LoadtxtReadUint64Integers.time_read_uint64(10000)
+     2.09±0ms   10.1±0.04ms    4.83  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(10000)
+    215±0.5μs      1.03±0ms    4.82  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(1000)
+    217±0.9μs      1.02±0ms    4.72  bench_io.LoadtxtReadUint64Integers.time_read_uint64(1000)
+    123±0.6μs       580±3μs    4.71  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(550)
+    124±0.8μs       573±4μs    4.63  bench_io.LoadtxtReadUint64Integers.time_read_uint64(550)
+  4.15±0.01ms   14.4±0.05ms    3.46  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10000)
+   58.6±0.1ms     195±0.8ms    3.33  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(10000)
+   41.8±0.1ms       139±1ms    3.33  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100000)
+  64.6±0.09ms       215±1ms    3.32  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(500)
+   64.9±0.2ms       215±2ms    3.30  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(0)
+   55.0±0.5μs     154±0.4μs    2.81  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100)
+   23.9±0.1μs      60.1±1μs    2.51  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(20)
+   12.1±0.2μs    29.4±0.2μs    2.44  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 10)
+  12.0±0.05μs    26.2±0.2μs    2.18  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 10)
+  12.5±0.08μs   26.1±0.09μs    2.08  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10)
+  12.3±0.04μs    24.9±0.4μs    2.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 10)
+   12.3±0.1μs    24.8±0.2μs    2.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 10)
+  12.2±0.04μs    24.5±0.1μs    2.01  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(10)
+   13.3±0.1μs    23.4±0.1μs    1.76  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10)
+   18.5±0.3μs    25.6±0.5μs    1.39  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10)
```

The repository includes [procedures for running benchmarks locally](https://github.com/BIDS-numpy/npreadtext#benchmarking).
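
For a quick sanity check outside the full benchmark suite, something like the following rough sketch may be enough. It builds an integer CSV in memory and reports the best wall-clock time of a loader over a few runs. The helper names `make_csv`, `time_loader`, and `time_numpy_loadtxt` are hypothetical and exist only for this illustration; they are not part of `npreadtext` or of NumPy's `bench_io` suite, and the numbers they produce will not match the table above.

```python
import io
import timeit


def make_csv(n_rows, n_cols=5):
    """Build an in-memory CSV of consecutive integers (illustrative data only)."""
    lines = (
        ",".join(str(r * n_cols + c) for c in range(n_cols))
        for r in range(n_rows)
    )
    return "\n".join(lines) + "\n"


def time_loader(loader, text, repeats=5):
    """Return the best time in seconds for loader(file_like) over `repeats` runs."""
    # A fresh StringIO is created on every run so each call reads from the start.
    timer = timeit.Timer(lambda: loader(io.StringIO(text)))
    return min(timer.repeat(repeat=repeats, number=1))


def time_numpy_loadtxt(n_rows=10_000):
    """Time whatever np.loadtxt currently points to (stock or monkey-patched)."""
    import numpy as np  # imported lazily so the helpers above stay stdlib-only

    return time_loader(lambda f: np.loadtxt(f, delimiter=","), make_csv(n_rows))
```

Calling `time_numpy_loadtxt()` once before and once after applying the monkey-patch described under Installation should give a rough before/after comparison on your own machine.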