Hi Harri, Rainer, thanks for sharing your thoughts!
On 05.02.15 at 15:19, Harri Pasanen wrote:
> On 05/02/2015 14:44, Till Oliver Knoll wrote:
>>
>> On 05.02.2015 at 14:25, Till Oliver Knoll
>> <[email protected] <mailto:[email protected]>> wrote:
>>
>>> ... I am really
>> just interested in whether concurrent read/write access should be
>> avoided in the first place these days (or not).
>> ...
> The usual answer is "it depends".
>
> It depends on how much data you are accessing at each write/read. It
> also depends on the underlying filesystem and size of files / how many
> files you are dealing with.

In my concrete use case I have "ordinary single-harddisk desktop systems" in mind, that is, no embedded ("limited") hardware, but also no dedicated file server with RAID, a highly optimised "super filesystem" and so on - just "plain vanilla" desktops.

Also, in my concrete use case I have "batch resizing of photos" in mind, where each file is around 5 (JPEG) to 25 ("raw") MByte in size. I don't know how fast the actual resizing will be - I have a combined CPU/GPU solution in mind, for the sake of getting a bit into OpenCL - but I imagine it won't empty the "Work Queue" faster than I can fill it by reading the original images from disk and enqueuing them into the Work Queue. Also, I plan to put a size limit on the Work Queue, so I imagine I won't be reading "full steam" all the time (but who knows - maybe I end up being able to scale an image faster than I can read and decode the JPEG data ;)). So whenever I am not reading I could use that time to empty the "Result Queue" and write the data to disk. And of course the assumption is that we read and write from/to the same harddisk ;)

I guess there are still a lot of "depends" in that use case above. I was hoping to get some "general advice/rule of thumb" on whether it is a good idea to have two distinct threads reading/writing "concurrently" from/to the harddisk, where the data is big (several MByte), but not as big as in "streaming a movie" (in the order of GB).

> It also depends on your disk array, if you have one or more disks and
> capacity of the disks, which affects the number of read/write heads the
> disk has. Also the NCQ implementation and cache RAM amount in a disk
> makes a difference.

I was actually hoping that nowadays modern (say, <= 3 years old) harddisks and operating systems (Windows, Linux, Mac) would handle the above case somehow for me, given that the size of each file is up to 25 MByte, and I could just "go ahead" and read/write. Maybe there is even a technique which optimises concurrent read/write operations (of course an OS/harddisk controller can only go so far in optimising concurrent access - I guess when I try to read e.g. the same file 10 times, or even different files, at different locations, then it's "game over").

> If you are on linux, you already get a lot of optimization out of the
> box, it is typically much better than any other OS. But even within
> linux the filesystem used makes a difference, for example some
> filesystems are good with lots of small files. Sometimes file deletion
> is the bottleneck.
>
> In the end in spinning drives the underlying physics of spinning media
> and moving read/write heads affect things.

In the end I think it is really the required physical movement of that head, rather than the file system (the file system might have an influence on how the data is "distributed" on the physical drive, but I guess that is negligible with regards to concurrent read/write operations, no?).
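Just so it is clear what kind of access pattern I am talking about, below is roughly the pipeline I have in mind - a plain C++11 sketch with std::thread and a home-grown bounded queue, no Qt or OpenCL yet; the queue limit of 8, the file names and the "resize" step are placeholders:

// Sketch: reader thread -> Work Queue -> resizer -> Result Queue -> writer thread.
// The BoundedQueue class, the queue limit of 8 and the file names are made up.
#include <condition_variable>
#include <fstream>
#include <iterator>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

struct Image { std::string name; std::vector<char> bytes; };

// Bounded queue: push() blocks while the queue is full, so at most
// 'maxSize' images (up to ~25 MByte each) are held in memory at once.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t maxSize) : m_maxSize(maxSize) {}

    void push(Image image) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notFull.wait(lock, [this] { return m_queue.size() < m_maxSize; });
        m_queue.push(std::move(image));
        m_notEmpty.notify_one();
    }

    Image pop() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notEmpty.wait(lock, [this] { return !m_queue.empty(); });
        Image image = std::move(m_queue.front());
        m_queue.pop();
        m_notFull.notify_one();
        return image;
    }

private:
    std::queue<Image> m_queue;
    std::size_t m_maxSize;
    std::mutex m_mutex;
    std::condition_variable m_notFull, m_notEmpty;
};

int main() {
    BoundedQueue workQueue(8), resultQueue(8);
    const std::vector<std::string> files = { "IMG_0001.jpg", "IMG_0002.jpg" }; // placeholders

    // Reader: fills the Work Queue, blocking whenever the queue is full.
    std::thread reader([&] {
        for (const auto &name : files) {
            Image image;
            image.name = name;
            std::ifstream in(name, std::ios::binary);
            image.bytes.assign(std::istreambuf_iterator<char>(in),
                               std::istreambuf_iterator<char>());
            workQueue.push(std::move(image));
        }
        workQueue.push(Image());        // empty name = "no more work"
    });

    // Resizer: this is where the actual CPU/OpenCL scaling would happen.
    std::thread resizer([&] {
        while (true) {
            Image image = workQueue.pop();
            if (image.name.empty())
                break;
            // ... decode, scale and re-encode image.bytes here ...
            resultQueue.push(std::move(image));
        }
        resultQueue.push(Image());      // propagate "done" to the writer
    });

    // Writer: empties the Result Queue onto the same disk the reader reads from.
    std::thread writer([&] {
        while (true) {
            Image image = resultQueue.pop();
            if (image.name.empty())
                break;
            std::ofstream out("small_" + image.name, std::ios::binary);
            out.write(image.bytes.data(), static_cast<std::streamsize>(image.bytes.size()));
        }
    });

    reader.join();
    resizer.join();
    writer.join();
}

The interesting part for this thread is that the reader and the writer above hit the same physical disk at the same time whenever the Work Queue is neither full nor empty - which is exactly the concurrent read/write access I am unsure about.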
> But if you want maximum IO performance, the rule of thumb is to group
> your reads and writes, and read/write as much data as possible at once.
> Even SSDs typically favor this. In highly parallel supercomputer
> settings different rules may apply.

That's what my gut feeling tells me as well. Also, Stack Overflow answers to questions like this one seem to confirm it:

http://stackoverflow.com/questions/5321768/how-many-threads-for-reading-and-writing-to-the-hard-disk

On the other hand, Rainer wrote:

On 05.02.15 at 15:24, Rainer Wiesenfarth wrote:
> From: Till Oliver Knoll
>> On 05.02.2015 at 14:25, Till Oliver Knoll wrote:
>>> ...
>> http://www.tomshardware.co.uk/forum/251768-32-impact-concurrent-speed
>> [...]
>
> Please note that this post is more than five years old. Things - namely
> I/O schedulers in operating systems and hard disk caching - have changed
> since then.

I was hoping so, too.

> I would _assume_ that any modern OS is capable of scheduling I/O for
> maximum performance. In addition, an own I/O scheduler would probably
> only work for bare metal access to the harddisk. Otherwise, the
> underlying file system and its potential fragmentation might void all
> your effort.
>
> Thus my approach would be to start any number of concurrent reads and
> writes that makes sense for the application side and start optimizing
> if (and only if!) throughput is too bad.

Other links that I found seem to support this - that the underlying "scheduler" figures out the best read/write strategy, and any attempt by the application to implement that by itself would be counter-productive (assuming "finite" read/write operations, that is, not endlessly reading several GB of data "non-stop"):

http://superuser.com/questions/365875/can-hard-disks-read-and-write-simultaneously-on-different-tracks-how

But maybe that answer only applied to the question that was actually asked, about "copy/pasting a file".

So I guess what this all boils down to is: "I have to try it for myself" :) I'll let you know how it goes (the biggest problem, however, is that the only computer in my household that still has a spinning harddisk is a 15-year-old laptop running Windows 2000 ;))

Thanks a lot,
  Oliver
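P.S.: For the "try it for myself" part I'll probably start with something as simple-minded as the sketch below: copy the same set of files once with "grouped" I/O (read everything, then write everything) and once with one reader and one writer thread running at the same time, and compare the wall-clock times. The file names are placeholders, and the OS file cache would of course have to be dropped between runs (or data sets much larger than RAM used) to get meaningful numbers:

// Naive timing sketch: "grouped" read-then-write vs. one concurrent reader and
// one writer thread on the same disk. Placeholder file names; drop the OS cache
// between runs (or use data much larger than RAM) for meaningful numbers.
#include <chrono>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <thread>
#include <vector>

static std::vector<char> readFile(const std::string &name) {
    std::ifstream in(name, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

static void writeFile(const std::string &name, const std::vector<char> &bytes) {
    std::ofstream out(name, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

int main() {
    const std::vector<std::string> inputs = { "IMG_0001.jpg", "IMG_0002.jpg" }; // placeholders
    using Clock = std::chrono::steady_clock;

    // Pre-load the data both variants will write, so we only measure disk traffic.
    std::vector<std::vector<char>> payload;
    for (const auto &name : inputs)
        payload.push_back(readFile(name));

    // Variant 1: grouped - first all reads, then all writes.
    const auto t0 = Clock::now();
    std::vector<std::vector<char>> images;
    for (const auto &name : inputs)
        images.push_back(readFile(name));
    for (std::size_t i = 0; i < inputs.size(); ++i)
        writeFile("grouped_" + inputs[i], images[i]);
    const std::chrono::duration<double> grouped = Clock::now() - t0;

    // Variant 2: concurrent - one thread reads while the other one writes.
    const auto t1 = Clock::now();
    std::thread reader([&] {
        std::size_t total = 0;
        for (const auto &name : inputs)
            total += readFile(name).size();     // just generate read traffic
        std::cout << "read back " << total << " bytes\n";
    });
    std::thread writer([&] {
        for (std::size_t i = 0; i < inputs.size(); ++i)
            writeFile("concurrent_" + inputs[i], payload[i]);
    });
    reader.join();
    writer.join();
    const std::chrono::duration<double> concurrent = Clock::now() - t1;

    std::cout << "grouped:    " << grouped.count() << " s\n"
              << "concurrent: " << concurrent.count() << " s\n";
}

If the "concurrent" variant doesn't turn out dramatically slower, I'll happily keep the reader and writer threads separate in the real application.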
