Devon's update to the wiki page LargeDatasetsForParallelProgramTesting looks really interesting.

On Case 0 my first attempt was to get rid of the boxing. That would require
not using / to combine, but I thought it would speed things up even without
multiple cores. No luck yet. But it would be easy to break cfs into pieces,
run each piece against all of irp, and then put the pieces back together.
Sharing irp between cores as a mapped file saves memory and avoids passing
around and holding multiple copies of irp.

   f=: 4 : '+/&>(<"1 x)%&.>/<"1 */\"1 >:y'

   pvs=: cfs f irp

   cfs4=: 4 25 360$,cfs

   a0=:(0{cfs4) f irp
   a1=:(1{cfs4) f irp
   a2=:(2{cfs4) f irp
   a3=:(3{cfs4) f irp

   pvs4=:a0,a1,a2,a3

   pvs-:pvs4
1

a0 through a3 could be executed on four cores.
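The split-and-recombine scheme above can be sketched in Python, where multiprocessing makes the four-core dispatch explicit. The array shapes and the function f here are illustrative stand-ins, not the actual data or the J verb from the post:

```python
import numpy as np
from multiprocessing import Pool

# Deterministic stand-ins for the J nouns; shapes are assumptions only.
cfs = np.arange(100 * 360, dtype=float).reshape(100, 360)  # 100 rows of cash flows
irp = np.ones((50, 360))                                   # shared by every worker

def f(piece):
    # Placeholder for the present-value computation applied to one piece of cfs.
    return piece @ irp.T

if __name__ == '__main__':
    pieces = np.array_split(cfs, 4)       # like cfs4 =: 4 25 360 $ , cfs
    with Pool(4) as pool:
        parts = pool.map(f, pieces)       # a0 .. a3, each on its own core
    pvs4 = np.concatenate(parts)          # like pvs4 =: a0 , a1 , a2 , a3
    pvs = f(cfs)                          # single-core reference result
    print(np.allclose(pvs, pvs4))         # → True
```

On fork-based platforms the workers share irp's pages copy-on-write, which approximates the mapped-file sharing the post suggests; an explicit memory-mapped file (numpy.memmap) would make that sharing work on any platform.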

Case 1 confuses me. We are dealing with disk speeds, which are orders of
magnitude slower than memory. If the task is to write multiple files, where
are the possible collisions? The only contention I see is the disk arm moving
back and forth between the files. At any rate, it would be difficult to time,
as the job is not really done until all the data has been written to the
physical disk. Closing the files probably doesn't wait for all the data to be
written before returning; we would need the equivalent of a database commit
to get good timings. The only optimization I see is to begin another sort
while the results of the previous sort are being written. Two J sessions
would be needed, but one core could work quite well: the writes would take
little processing time, letting the core switch to the second J session,
which is doing the next sort.
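The "database commit" point can be made concrete in Python: closing a file only hands the data to the OS cache, while fsync blocks until the OS has pushed it toward the device, so a fair timing must include it. This is a sketch with dummy data, not the benchmark itself:

```python
import os
import tempfile
import time

# 8 MB of dummy output standing in for one sorted result file.
data = b'x' * (8 << 20)
path = os.path.join(tempfile.mkdtemp(), 'out.bin')

t0 = time.perf_counter()
with open(path, 'wb') as f:
    f.write(data)
    f.flush()              # push Python's buffer to the OS
    os.fsync(f.fileno())   # block until the OS has written it out, like a commit
t1 = time.perf_counter()
print(f'synced write: {t1 - t0:.3f} s')
```

Timing the same write without the fsync line mostly measures memory-to-cache copying, which is why numbers taken at close() can look implausibly fast.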

Case 2 has similar problems. Since operating systems do lazy writes, the
timing numbers could vary greatly if all of the files can be held in physical
memory, letting the system schedule writes to better optimize the disk. I
think the way to get the best performance is to avoid writing to more than
one file at a time, again converting the data for the next file to
tab-delimited form while the previous file is being written.
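That pipeline, converting the next file while the previous one is being written, can be sketched with a single background writer thread and a one-slot queue. The file names and row data are made up for illustration:

```python
import csv
import io
import os
import queue
import tempfile
import threading

# Hypothetical data: three small tables to be written as tab-delimited files.
tables = {f'file{i}.tsv': [[i, i + 1], [i + 2, i + 3]] for i in range(3)}
outdir = tempfile.mkdtemp()

def to_tsv(rows):
    # Convert one table to tab-delimited text (the CPU-side step).
    buf = io.StringIO()
    csv.writer(buf, delimiter='\t').writerows(rows)
    return buf.getvalue()

# One-slot queue: while the writer drains the previous file,
# the main thread is free to convert the next one.
q = queue.Queue(maxsize=1)

def writer():
    while True:
        item = q.get()
        if item is None:        # sentinel: no more files
            break
        name, text = item
        with open(os.path.join(outdir, name), 'w') as f:
            f.write(text)       # only ever one file being written at a time

t = threading.Thread(target=writer)
t.start()
for name, rows in tables.items():
    q.put((name, to_tsv(rows)))  # conversion overlaps the previous write
q.put(None)
t.join()
print(sorted(os.listdir(outdir)))  # → ['file0.tsv', 'file1.tsv', 'file2.tsv']
```

Because the writes spend most of their time waiting on the disk, one core suffices for both threads, which matches the post's two-J-sessions-on-one-core observation.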
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
