On Friday, 8 February 2013 at 06:22:18 UTC, Denis Shelomovskij wrote:
06.02.2013 19:40, bioinfornatics writes:
On Wednesday, 6 February 2013 at 13:20:58 UTC, bioinfornatics wrote:
I agree the spec format is really bad, but it is heavily used in biology, so I would like a fast parser so that I can develop some D applications instead of using C++.

Yes, let's also create 1 GiB XML files and ask for fast encoding/decoding!

The situation can be improved only if:
1. We will find and kill every text format creator;
2. We will create a really good binary format for each such task and support it in every application we create. After some time, text formats will simply die out through evolution, because everything will support better formats.

(the second proposal is a real recommendation)

There is a binary resource format for EMF models, which normally use XML files, and the timing improvements it gives are reported at this link. It might be worth looking at if you are thinking about writing your own binary format.
http://www.slideshare.net/kenn.hussey/performance-and-extensibility-with-emf
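If you do write your own binary format, one common building block is length-prefixed fields. A reader can then jump from field to field using the stored lengths instead of scanning for delimiters the way a text parser must. The sketch below shows a hypothetical layout for sequence records (name, sequence, quality). The field set, the little-endian uint32 length prefixes, and the names `encode_record` and `decode_records` are assumptions for illustration. This is not the EMF binary format or any bioinformatics standard.

```python
import struct
from typing import Iterator, Tuple

# Hypothetical record layout (not an existing standard): three fields
# (name, sequence, quality), each stored as a little-endian uint32 length
# followed by that many raw bytes.
_LEN = struct.Struct("<I")

Record = Tuple[bytes, bytes, bytes]


def encode_record(name: bytes, seq: bytes, qual: bytes) -> bytes:
    """Serialize one record as three length-prefixed byte strings."""
    out = bytearray()
    for field in (name, seq, qual):
        out += _LEN.pack(len(field))
        out += field
    return bytes(out)


def decode_records(buf: bytes) -> Iterator[Record]:
    """Yield records by skipping over fields using their stored lengths."""
    pos = 0
    while pos < len(buf):
        fields = []
        for _ in range(3):
            (n,) = _LEN.unpack_from(buf, pos)
            pos += _LEN.size
            if pos + n > len(buf):
                raise ValueError("truncated record")
            fields.append(buf[pos:pos + n])
            pos += n
        yield (fields[0], fields[1], fields[2])
```

For example, `list(decode_records(encode_record(b"r1", b"ACGT", b"IIII")))` gives back `[(b"r1", b"ACGT", b"IIII")]`. A real format would probably also want a header with a magic number and version so files can be identified and evolved.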

There is also a fast binary compression library named Blosc, which is used in some Python utilities. Its benchmarks, presented here, show that it can be faster than a plain memcpy when you have multiple cores.
http://blosc.pytables.org/trac

On the sequential accesses ... I found that Windows writes blocks of data all over the place. The best way I found to get it to write in more contiguous locations is to modify the file output routines to specify write-through. However, the more sequential layout did not improve read times on SSDs.

Most decent SSDs can now read large files at 300 MB/sec or more, and you can put a few of them in RAID 0 to read at 800 MB/sec.
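These rates set a floor on parse time, because a parser cannot finish before the drive has delivered the bytes. The following back-of-the-envelope sketch assumes a 1 GiB input and takes "MB/sec" to mean 10^6 bytes per second, which is how drive specifications are usually quoted. The helper name `seconds_to_read` is hypothetical.

```python
GIB = 1024 ** 3  # bytes in 1 GiB
MB = 10 ** 6     # assumption: "MB/sec" means decimal megabytes per second


def seconds_to_read(size_bytes: int, mb_per_s: float) -> float:
    """Lower bound on the time to stream size_bytes at a sustained rate."""
    return size_bytes / (mb_per_s * MB)


if __name__ == "__main__":
    for rate in (300, 800):
        print(f"{rate} MB/s: {seconds_to_read(GIB, rate):.2f} s per GiB")
```

This prints `300 MB/s: 3.58 s per GiB` and `800 MB/s: 1.34 s per GiB`. A parser that spends much longer than that per GiB is CPU-bound rather than limited by the disk.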
