Hello,
Look again in the archives (not only at one post, but at my responses as
well). The short answer is: NO, for many reasons - all given in my responses
in that thread.
Best regards,
Tomasz Janeczko
amibroker.com
On 2010-08-22 22:16, TA wrote:
I checked the archives and the only thing I see is that dloyer123 wrote a
plugin using CUDA and saw a significant improvement in his backtesting. Have
you tested AB with CUDA? Any plans to implement such technologies in AB? TIA
*From:* [email protected] [mailto:[email protected]] *On Behalf
Of *Tomasz Janeczko
*Sent:* Sunday, August 22, 2010 1:03 PM
*To:* [email protected]
*Subject:* Re: [amibroker] OT: mutlicore cpu
Hello,
CUDA was discussed before, check the archives.
Best regards,
Tomasz Janeczko
amibroker.com
On 2010-08-22 21:52, TA wrote:
Thanks again. My last question: have you tested having AB hand some
calculations over to the GPU (e.g., using CUDA), since the GPU has its own FPU
and memory? Sorry for all these layman questions! I am really fascinated by
all this stuff.
TIA
*From:* [email protected] <mailto:[email protected]>
[mailto:[email protected]] *On Behalf Of *Tomasz Janeczko
*Sent:* Sunday, August 22, 2010 12:30 PM
*To:* [email protected] <mailto:[email protected]>
*Subject:* Re: [amibroker] OT: mutlicore cpu
Hello,
"why you bought an i7 (6 core) cpu?"
Several reasons - my previous computer was 4 years old and had 2 cores.
Buying anything less than an i7 would not give me any visible gain. In fact,
although the new machine is faster, in everyday tasks it makes little
difference (especially given that Windows and other software add bloat faster
than hardware evolves). As to technical reasons - among other things - to do
actual tests. 4 years ago I wrote portions of the AFL engine using OpenMP
(a parallel library) to test the actual, real-world performance of parallel
(multi-core) code vs single-core code on an AMD Athlon64x2 (2 core). Last
year I bought the i7 to re-run those tests on the latest hardware. The
conclusion is the same: fine-grain parallelism (the kind that OpenMP supports)
with a 3:1 memory-to-FPU ratio makes no sense performance-wise. You need many
more FPU/CPU calculations per single memory access to make it worthwhile.
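To illustrate the point above, here is a minimal roofline-style estimate in Python. It is only a sketch, not a measurement: the function name and the hardware figures (10 GFLOP/s per core, 20 GB/s of shared memory bandwidth, 4-byte values) are assumptions chosen for illustration, not numbers from the tests described in this post.

```python
def attainable_flops(cores, flops_per_byte, peak_per_core, mem_bandwidth):
    """Roofline-style bound: throughput is limited either by the combined
    compute of all cores or by the shared memory bandwidth, whichever is lower."""
    compute_bound = cores * peak_per_core
    memory_bound = mem_bandwidth * flops_per_byte
    return min(compute_bound, memory_bound)


def estimated_speedup(cores, flops_per_byte, peak_per_core, mem_bandwidth):
    """Estimated speedup of `cores` cores over a single core."""
    multi = attainable_flops(cores, flops_per_byte, peak_per_core, mem_bandwidth)
    single = attainable_flops(1, flops_per_byte, peak_per_core, mem_bandwidth)
    return multi / single


# Hypothetical hardware figures (assumptions, for illustration only).
PEAK_PER_CORE = 10e9   # 10 GFLOP/s per core
MEM_BANDWIDTH = 20e9   # 20 GB/s shared by all cores

# Array-style work: 1 FPU op per 3 memory accesses of 4 bytes each.
array_intensity = 1 / (3 * 4)   # FLOPs per byte
# Cache-friendly work: many FPU ops per byte moved (assumed value).
dense_intensity = 10.0

array_speedup = estimated_speedup(4, array_intensity, PEAK_PER_CORE, MEM_BANDWIDTH)
dense_speedup = estimated_speedup(4, dense_intensity, PEAK_PER_CORE, MEM_BANDWIDTH)

print(f"array-style work, 4 cores: {array_speedup:.1f}x")
print(f"math-heavy work, 4 cores:  {dense_speedup:.1f}x")
```

Under these assumed figures the array-style workload is already limited by memory bandwidth on one core, so extra cores add nothing in this model, while the math-heavy workload scales with the core count.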
With regard to buying new hardware: if you develop software, you need to
have several platforms to test on to ensure smooth operation on all popular
hardware. I am testing AmiBroker on everything, starting from an Intel Celeron
600MHz (10-year-old notebook), AMD Athlon XP (single core), AMD Athlon64x2
(dual core), Intel Core 2 Duo (2 core), and ending with an Intel i7 920
(6 core).
Best regards,
Tomasz Janeczko
amibroker.com
On 2010-08-22 20:43, TA wrote:
Thanks for the clarification. Would you mind sharing with us why you bought
an i7 (6 core) CPU? TIA
*From:* [email protected] <mailto:[email protected]>
[mailto:[email protected]] *On Behalf Of *Tomasz Janeczko
*Sent:* Sunday, August 22, 2010 11:24 AM
*To:* [email protected] <mailto:[email protected]>
*Subject:* Re: [amibroker] OT: mutlicore cpu
Hello,
Video creation software is completely different. It does a lot of math
*per pixel* (meaning that many FPU operations are needed for a single pixel).
For example, many algorithms use 8x8 pixel blocks, 64 * (4 bytes per
pixel) = 256 bytes, and apply a complex transform such as the cosine
transform. This means that lots of FPU instructions are executed on very small
blocks of memory that fit completely in the Level 1 CPU cache, and thus they
are not able to saturate memory bandwidth. They literally do dozens of FPU ops
per single RAM access. In fact, they barely need to touch RAM at all. It is
the complete opposite of how AFL works, where there are usually 3 times more
memory accesses (2 reads + 1 write) than FPU operations.
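The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustrative count, under the assumption of a naive separable 8x8 cosine transform (each 8-point 1-D transform done as a plain matrix-vector product) and of each block being read once and written once; real codecs use faster transforms and different access patterns, so treat the numbers as rough.

```python
def flops_per_access(flops, reads, writes):
    """FPU operations performed per memory access (read or write)."""
    return flops / (reads + writes)


# Element-wise array operation such as c = a + b:
# 2 reads + 1 write for every single FPU operation.
elementwise = flops_per_access(flops=1, reads=2, writes=1)

# 8x8 pixel block, 4 bytes per pixel.
N = 8
block_bytes = N * N * 4

# Naive 8-point 1-D transform: N*N multiplies + N*(N-1) additions.
flops_1d = N * N + N * (N - 1)
# Separable 2-D transform: N row transforms + N column transforms.
flops_block = 2 * N * flops_1d

# Assumption: the block is loaded from RAM once and stored once;
# all intermediate work stays in the L1 cache.
block_transform = flops_per_access(flops_block, reads=N * N, writes=N * N)

print(f"block size:             {block_bytes} bytes")
print(f"element-wise array op:  {elementwise:.2f} FPU ops per access")
print(f"8x8 block transform:    {block_transform:.1f} FPU ops per access")
```

With these assumptions the block transform performs about 15 FPU operations per RAM access, against one third of an operation per access for the element-wise case.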
And yes, I do know what processors are on the market. In fact, I do have an
i7 (6 core) (I am writing this post on it).
The only reason for multithreading in the case of AFL would be not speed
but asynchronous/parallel execution (the ability to run an AFL that takes a
long time in parallel with other AFLs, without blocking them).
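As a generic illustration of that idea (not AmiBroker code), the Python sketch below runs a long task and a short task on separate threads, so the short one finishes without waiting for the long one. The function names are hypothetical stand-ins for formulas, and the event is used only to keep the "long" task busy in a controlled way.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Event that keeps the long task "busy" until we release it.
release = threading.Event()


def long_formula():
    """Hypothetical stand-in for a formula that takes a long time."""
    release.wait(timeout=5)
    return "long done"


def short_formula():
    """Hypothetical stand-in for a quick formula."""
    return "short done"


with ThreadPoolExecutor(max_workers=2) as pool:
    slow = pool.submit(long_formula)
    fast = pool.submit(short_formula)

    # The short task completes even though the long one is still running.
    fast_result = fast.result(timeout=5)
    slow_was_running = not slow.done()

    # Let the long task finish and collect its result.
    release.set()
    slow_result = slow.result(timeout=5)

print(fast_result, "| long task still running at that point:", slow_was_running)
print(slow_result)
```

The benefit here is responsiveness rather than raw throughput: neither task gets faster, but the short one is no longer blocked behind the long one.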
Best regards,
Tomasz Janeczko
amibroker.com
On 2010-08-22 01:19, TA wrote:
TJ
On many occasions you have written that the reason you have not
implemented multicore usage in AB is that single-symbol data saturates the
on-die cache and memory bandwidth. The tests that I have seen for video
creation show that apps that take advantage of multiple cores finish their
tasks faster. Do you know why that is? Do you know why AMD & Intel are
creating six-core CPUs (soon 8-core CPUs)? TIA