Thanks, all, for your plentiful replies to my question and the many practical considerations. Mostly it's simply nice to know what people are working on.

About the reasons for my inquiry: mostly, the whole of the modern-day PC as a Complex Instruction Set Computer has become so convoluted that dealing with its pipelines, instruction (pre-)fetch, memory access delays (which depend on address- and data-path access times, memory page reuse, and clock-frequency matching between the frequency-governed/turbo-boosted cores/threads, the memory-access units, and the DMA controllers), the cache filling, write-through and hierarchical access times, and the contention between threads/cores accessing memory and devices, is pretty hard, and looks almost non-deterministic to software engineers.

Add to that the hardware- and network-related multilevel buffering and the cost of kernel activities like memory segment and bank assignment, process/thread scheduling, check-pointing, and preserving virtual-multiprocessing state, and it's hard to know what efficient programming is that also gives reliable and repeatable interaction times. Your options are to trim the number of processes, run Linux with a real-time kernel (which of course still isn't a provably real-time finite state machine), use a machine with little variation in its load, stay so far under full CPU and memory load that they hardly rise in temperature the way they would if you used a significant portion of the actual processing power, or maybe resort to a simpler processor with simpler heat management and build your own OS and software from the ground up without any claim to general usefulness.
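
As an aside, if you want to see how repeatable the scheduling on a given box actually is, a minimal probe along the lines of what cyclictest measures can be written in plain POSIX C. This is just an illustrative sketch, not anything from my actual setup; the 1 ms period and the priority of 80 are arbitrary choices, and you need root (or CAP_SYS_NICE) for the SCHED_FIFO call to succeed:

/* jitter.c - rough scheduling-jitter probe (illustrative values only).
 * Build: gcc -O2 -o jitter jitter.c ; run with enough privilege that
 * SCHED_FIFO is allowed, otherwise it falls back to normal scheduling. */
#include <sched.h>
#include <stdio.h>
#include <time.h>

#define PERIOD_NS  1000000L   /* 1 ms nominal period, an arbitrary choice */
#define ITERATIONS 10000

int main(void)
{
    struct sched_param sp = { .sched_priority = 80 }; /* arbitrary RT priority */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        perror("sched_setscheduler (continuing without RT priority)");

    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    long worst = 0;
    for (int i = 0; i < ITERATIONS; i++) {
        /* advance the absolute deadline by one period */
        next.tv_nsec += PERIOD_NS;
        if (next.tv_nsec >= 1000000000L) { next.tv_sec++; next.tv_nsec -= 1000000000L; }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);

        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        long late = (now.tv_sec - next.tv_sec) * 1000000000L
                  + (now.tv_nsec - next.tv_nsec);   /* wake-up lateness in ns */
        if (late > worst) worst = late;
    }
    printf("worst-case wake-up lateness: %ld ns over %d periods\n", worst, ITERATIONS);
    return 0;
}

On a stock desktop kernel under load the worst case easily reaches milliseconds; on a trimmed-down, real-time-patched system it can stay in the tens of microseconds, which is exactly the difference I'm complaining about above.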

In practice, task switching (which can be tied to thread instructions) as well as memory management can get in the way of the fine-grained real-time responsiveness you may want, and the many pipelines in a modern PC (say an i7 machine), together with the many caches and the access granularity of main memory, can get badly in the way of even deciding which small computation should follow which, and then executing a small number of computations efficiently.

An FPGA like the cheap but powerful Zynq 7010 I use can, running at a third of a GHz, compute fast logical sequences very efficiently; it can, for instance, theoretically run certain filters at up to 10 giga-ops per second, which isn't necessarily easy on a PC, and it can still connect up the signal paths with almost no buffering in between and very little pipelining. Of course, if you want to make good use of your virtual CPU's ALU or even FPU, you need to run more samples through it than just one per clock cycle. For straightforward logic resulting from optimized silicon compilation of a C program with the latest Xilinx Vivado HLx, it is possible to run computations in one clock cycle of the 333 MHz FPGA fabric. That means at CD rate you should push 333,000 kHz / 44.1 kHz ≈ 7,551 samples through each computation unit to make full use of that hardware instance's abilities. I myself regularly use a "jackd" (Linux/ALSA) frame size of 8192, since this makes the system very stable when a lot of computation has to be done by various pipelines on various cores while running 192 kHz audio; that's an empirical observation, and not on an optimized machine (it still doubles as a web server and TV, and I prefer to run Firefox as well).
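
Just to make that arithmetic explicit, here is a small C sketch that reproduces the two figures from the quoted constants above (they are the numbers as stated, not measurements):

/* budget.c - back-of-envelope numbers for the paragraph above. */
#include <stdio.h>

int main(void)
{
    double fabric_hz   = 333e6;    /* Zynq 7010 fabric clock quoted above    */
    double cd_rate_hz  = 44100.0;  /* CD sample rate                         */
    double jack_rate   = 192000.0; /* sample rate of the jackd setup above   */
    double jack_frames = 8192.0;   /* jackd period (frame) size quoted above */

    /* Clock cycles available per audio sample when the fabric performs one
     * computation per cycle: 333e6 / 44.1e3, about 7551. */
    printf("fabric cycles per CD-rate sample: %.1f\n", fabric_hz / cd_rate_hz);

    /* One jackd period of 8192 frames at 192 kHz corresponds to roughly
     * 8192 / 192000, about 42.7 ms of audio per process() callback. */
    printf("jackd period length: %.1f ms\n", 1000.0 * jack_frames / jack_rate);
    return 0;
}

So the price of that very stable 8192-frame setting is a period of roughly 43 ms, which is fine for my mixed-use machine but obviously not what you'd pick for tight interactive response.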

Anyhow, it's an interesting subject, which I, as a very advanced musician, like to see become more accurate and responsive!

Theo V.

_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp