I released a new tool for weighted random sampling of tabular data files: tsv-sample. It's one of several tools recently added to the TSV file toolkit I released last year. These tools are especially useful when data files are too large to comfortably read entirely into memory in R and similar applications.

I'll publish an announcement covering the broader set of tool updates in the next few weeks; I have some performance benchmarks to finish first. However, weighted reservoir sampling algorithms are interesting, and I thought there might be enough interest to warrant a separate announcement.
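
For anyone who hasn't run into the technique, here is a minimal Python sketch of one well-known weighted reservoir sampling algorithm (Efraimidis and Spirakis's A-Res). It is an illustration only and not necessarily what tsv-sample does internally; the function name and the (value, weight) input shape are assumptions made up for this example.

```python
import heapq
import random


def weighted_reservoir_sample(items, k, rng=random):
    """Pick up to k values in one pass, without replacement.

    items: iterable of (value, weight) pairs (hypothetical input shape).
    Heavier weights are more likely to be chosen; weights <= 0 are skipped.
    """
    if k <= 0:
        return []
    heap = []  # min-heap of (key, index, value); the smallest key is evicted first
    for index, (value, weight) in enumerate(items):
        if weight <= 0:
            continue  # a non-positive weight can never be selected
        # A-Res key: u ** (1 / w), with u drawn uniformly from [0, 1)
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, value)  # index breaks ties between equal keys
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            heapq.heapreplace(heap, entry)
    # Return the kept values, largest key first
    return [value for _, _, value in sorted(heap, reverse=True)]


if __name__ == "__main__":
    rows = [("a", 1.0), ("b", 2.0), ("c", 3.0), ("d", 10.0)]
    print(weighted_reservoir_sample(rows, 2))
```

Only k entries are held at any time, so the input can be streamed line by line from a file that does not fit in memory.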


Repo: https://github.com/eBay/tsv-utils-dlang
tsv-sample code: https://github.com/eBay/tsv-utils-dlang/blob/master/tsv-sample/src/tsv-sample.d

--Jon
