Re: Updates to the tsv-utils toolkit

Jon Degenhardt via Digitalmars-d-announce Sat, 04 Mar 2017 11:51:47 -0800

On Wednesday, 22 February 2017 at 18:12:50 UTC, Jon Degenhardt wrote:

It's not quite a year since the open-sourcing of eBay's tsv utilities. Since then there have been a number of additions and updates, and the tools now form a more complete package. The tools assist with manipulating tabular data files common in machine learning and data mining environments. They work alongside traditional Unix command line tools like 'cut' and 'sort'. They also fit well with data mining and stats packages like R and Pandas.
The tools include filtering, slicing, joins and other manipulation, sampling, and statistical calculations. If you find yourself working with large data files from a Unix shell, you may like these tools.
Speed matters when processing large data files, and these tools are fast. I've published new benchmarks comparing the tools to similar tools written in several native compiled programming languages. The tools are the fastest on five of the six benchmarks run, generally by significant margins. It's a good result for the D programming language. The benchmarks may be of interest regardless of your interest in the tools themselves.
Repository: https://github.com/eBay/tsv-utils-dlang
Performance benchmarks: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md
--Jon

One more update: Schveiguy helped identify the performance bottleneck in the csv2tsv tool, and now the tools are the fastest on all six benchmarks. The benchmarks have been updated (and reformatted a bit). Summary table here: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md#top-four-in-each-benchmark
