A while ago, Henry requested a comparison of interpreter speed. For this purpose I translated the Fannkuch task from the Benchmark game from Python to J as literally the same as possible (generators required a huge cludge of locale based OOP and goto. statements). Now that J has grown multi-threading capabilities, I completed the translation, as the python code did also do multi-threading (actually it seems to spin up multiple processes, rather than threads). The comparison looks like this (on my modest laptop): jpjacobs@icarus:~/Projects/J/BenchmarkGame$ time ~/j904/bin/jconsole fannkuch-redux_nothread.ijs 10 73196 Pfannkuchen(10) = 38
real 1m35,353s user 1m35,163s sys 0m0,173s jpjacobs@icarus:~/Projects/J/BenchmarkGame$ time ~/j904/bin/jconsole fannkuch-redux_notasks.ijs 10 73196 Pfannkuchen(10) = 38 real 1m30,151s user 1m30,008s sys 0m0,138s jpjacobs@icarus:~/Projects/J/BenchmarkGame$ time ~/j904/bin/jconsole fannkuch-redux.ijs 10 73196 Pfannkuchen(10) = 38 real 1m2,029s user 3m19,923s sys 0m24,551s jpjacobs@icarus:~/Projects/J/BenchmarkGame$ time python3.9 fannkuch-redux.py 10 73196 Pfannkuchen(10) = 38 real 0m6,753s user 0m26,127s sys 0m0,021s The "nothread" version does not use threads at all; the "notasks" version spins up the threads, but does not dispatch any tasks, in order to take a look at perhaps the j engine using threads without explicit user instruction. The version without underscores uses 4 tasks dispatched on 4 threads. Summarising, it seems from the real time (actual time used) that creating threads, but not using them slightly reduces the time taken to +-66%. Using the treads to execute tasks does reduce the time taken further, but by far not as low as 25%. This cannot be due to data communication overhead, since each task receives 3 numbers, and returns 2. I also noticed some inconsistent results in the parallel version, i.e. the program locking up, or yielding a checksum that is not correct. Python still beats J by an order of magnitude, and I suspect that it is due to not having iterators, but having to emulate them using locales and jumping around in the next__ functions with goto. . If anyone has a better way of writing this, I'm curious. I attached all 4 versions (3 J versions and 1 python version) of the task to this message (remove the .txt extension before use). Jan-Pieter
# The Computer Language Benchmarks Game # https://salsa.debian.org/benchmarksgame-team/benchmarksgame/ # # contributed by Joerg Baumann # many thanks to Oleg Mazurov for his helpful description from sys import argv from math import factorial from multiprocessing import cpu_count, Pool from itertools import islice, starmap def permutations(n, start, size): p = bytearray(range(n)) count = bytearray(n) remainder = start for v in range(n - 1, 0, -1): count[v], remainder = divmod(remainder, factorial(v)) for _ in range(count[v]): # never executed in single thread code, because at this point count[i] == \x00 p[:v], p[v] = p[1:v + 1], p[0] assert(count[1] == 0) assert(size < 2 or (size % 2 == 0)) if size < 2: yield p[:] else: rotation_swaps = [None] * n for i in range(1, n): r = list(range(n)) for v in range(1, i + 1): r[:v], r[v] = r[1:v + 1], r[0] swaps = [] for dst, src in enumerate(r): if dst != src: swaps.append((dst, src)) rotation_swaps[i] = tuple(swaps) while True: yield p[:] p[0], p[1] = p[1], p[0] yield p[:] i = 2 while count[i] >= i: count[i] = 0 i += 1 else: count[i] += 1 t = p[:] for dst, src in rotation_swaps[i]: p[dst] = t[src] def alternating_flips_generator(n, start, size): maximum_flips = 0 alternating_factor = 1 lv=0 for permutation in islice(permutations(n, start, size), size): first = permutation[0] if first: flips_count = 1 while True: permutation[:first + 1] = permutation[first::-1] first = permutation[0] if not first: break flips_count += 1 if maximum_flips < flips_count: maximum_flips = flips_count yield flips_count * alternating_factor else: yield 0 alternating_factor = -alternating_factor lv+=1 yield maximum_flips def task(n, start, size): alternating_flips = alternating_flips_generator(n, start, size) return sum(islice(alternating_flips, size)), next(alternating_flips) def fannkuch(n): if n < 0: for data in islice(permutations(-n, 0, factorial(-n)), factorial(-n)): print(''.join(map(lambda n: str(n + 1), data))) else: assert(n > 0) task_count = cpu_count() total = factorial(n) task_size = (total + task_count - 1) // task_count if task_size < 20000: task_size = total task_count = 1 assert(task_size % 2 == 0) task_args = [(n, i * task_size, task_size) for i in range(task_count)] if task_count > 1: with Pool() as pool: checksums, maximums = zip(*pool.starmap(task, task_args)) else: checksums, maximums = zip(*starmap(task, task_args)) checksum, maximum = sum(checksums), max(maximums) print("{0}\nPfannkuchen({1}) = {2}".format(checksum, n, maximum)) if __name__ == "__main__": fannkuch(int(argv[1]))
NB. description here: https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/fannkuchredux.html#fannkuchredux NB. input n, for permutation length NB. output checksum and 'fankuch(n) = result' NB. fkTacit=: {{ NB. cs=: 0 NB. checksum Not Yet Implemented NB. fap=: (i.@!@] (flipct@A.)"0 1 >:@i.) NB. for all permutations NB. NB. scalar attempt memoizing for avoiding repeated work NB. NB. works, but does not count iterations :/ NB. flipct=: [: 0&{:: [: flip^:(_) 0;] NB. flip=: (>:&.>@{. , ((|.@{. , }.)~ {.)&.>@{:)^:(1~:{.&>@{:) NB. cs;>./ fap y NB. }} NB. 'cs max'=: fkTacit ". n=: 2{:: ARGV Note 'Python conversions' - no bytearrays in J, so use char (if slow, try using ints) - how to implement generators? class with create, use while. #n=:next__c ) coclass'permutations' NB. verified, works (at least when starting from 0) divmod=: <.@% , |~ create=: {{ NB. takes y=n,start,stop 'nn start size'=: y p=: i. nn count=: 0#~ nn remainder=: start for_vv. i.@-&.<: nn do. 'd remainder'=: remainder divmod !vv count=: d vv}count NB. was oneliner in python NB. for loop inversing some items never seems to execute... in single thread because start is 0 for. i. vv{count do. p=: vv (({.}.),{.@], >:@[ }. ]) p end. end. assert. 0=1{count assert. (<&2 +. 0=2&|) size state=: 0 }} next=: {{ select. state NB. for emulating yields NB. case. 0 do. goto_zero. NB. implied case. 1 do. goto_one. case. 2 do. goto_two. end. if. size < 2 do. p return. end. NB. else unnecessary as the if returns. rotation_swaps=:nn#a: for_i. i.&.<: nn do. r =: i. nn for_vv. >:i. i do. r=: vv (({.}.) , {.@] , >:@[ }. ]) r end. swaps=:0 2$0 for_ds. r do. NB. enumerate is just using ds_index,ds in J if. ds~:ds_index do. swaps=:swaps,(ds_index,ds) end. end. rotation_swaps=: (<swaps) i} rotation_swaps end. NB. while loop from here label_while. state=:1 p return. NB. first yield after while True label_one. state=:2 p=: (0 1{p) (1 0)} p return. label_two. NB. second yield after while True. i=:2 NB. buh? while. i<:i{count do. count=: 0 i} count i=:>:i end. NB. while else?? count=: (>: i{count) i} count t=: p for_ds. >i{rotation_swaps do. NB. dst first, source last p=: (t{~{:ds) ({.ds)}p end. goto_while. }} destroy=:codestroy cocurrent'base' permutations_z_=: conew&'permutations' coclass'altflipsgen' destroy=:codestroy create=:{{ 'nn start size'=:y maximum_flips=:0 alternating_factor=:1 p=: permutations nn,start,size NB. for loop replaced by while because of contained yields, keep loopvar lv lv=: 0 }} next=:{{ while. lv<size do. NB. islice limits perm to size calls permutation=:next__p'' first=: {.permutation if. first do. flips_count=:1 while. 1 do. permutation=: (permutation |.@{.~ >:first) (i.>:first)} permutation first=: {.permutation if. first=0 do. break. end. NB. quirk of J not: -. 5 is "True" flips_count=:>:flips_count end. if. maximum_flips<flips_count do. maximum_flips=: flips_count end. lv=:>:lv ret=: flips_count * alternating_factor alternating_factor =: -alternating_factor ret return. else. lv=:>:lv alternating_factor =: -alternating_factor 0 return. end. end. destroy__p'' maximum_flips }} cocurrent'base' altflipsgen_z_=: conew&'altflipsgen' task=:{{ NB. y=n start size 'nn start size'=:y af=.altflipsgen nn,start,size NB. af returns as last iteration the max flips ret=. (next__af''),~ +/next__af^:(>:i.size)'' destroy__af'' ret }} fannkuch=:{{ NB. y is n NB. start 4 threads (== cpu count, ideally) (0 T. ''"_)^:] 4 nn=.y if. nn<0 do. p=: permutations (,0,!)@- nn for. i. !-nn do. NB. test for permutations data=: next__p'' echo ,":"0>: data end. destroy__p'' else. assert. nn>0 NB. no parallel impl ATM task_count=: 1 T. '' NB. would be handy if there were a cpucount option to T. total=: !nn task_size=: (total+task_count-1) <.@% task_count if. task_size < 20000 do. task_size=:total task_count=:1 end. assert. 0=2|task_size task_args =: nn ,. task_size,.~(task_size*i. task_count) if. task_count >1 do. NB. spin up parallel J tasks 'checksums maximums'=: |: > task t. ''"1 task_args else. 'checksums maximums'=: |: task "1 task_args end. 'checksum maximum'=: (+/checksums),>./maximums echo checksum echo 'Pfannkuchen(',(":nn),') = ',": maximum end. }} ([: exit@0: [: fannkuch@". 2&{::)^:('fannkuch-redux.ijs'-: 1&{::)^:(1<#) ARGV
NB. description here: https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/fannkuchredux.html#fannkuchredux NB. input n, for permutation length NB. output checksum and 'fankuch(n) = result' NB. fkTacit=: {{ NB. cs=: 0 NB. checksum Not Yet Implemented NB. fap=: (i.@!@] (flipct@A.)"0 1 >:@i.) NB. for all permutations NB. NB. scalar attempt memoizing for avoiding repeated work NB. NB. works, but does not count iterations :/ NB. flipct=: [: 0&{:: [: flip^:(_) 0;] NB. flip=: (>:&.>@{. , ((|.@{. , }.)~ {.)&.>@{:)^:(1~:{.&>@{:) NB. cs;>./ fap y NB. }} NB. 'cs max'=: fkTacit ". n=: 2{:: ARGV Note 'Python conversions' - no bytearrays in J, so use char (if slow, try using ints) - how to implement generators? class with create, use while. #n=:next__c ) coclass'permutations' NB. verified, works (at least when starting from 0) divmod=: <.@% , |~ create=: {{ NB. takes y=n,start,stop 'nn start size'=: y p=: i. nn count=: 0#~ nn remainder=: start for_vv. i.@-&.<: nn do. 'd remainder'=: remainder divmod !vv count=: d vv}count NB. was oneliner in python NB. for loop inversing some items never seems to execute... in single thread because start is 0 for. i. vv{count do. p=: vv (({.}.),{.@], >:@[ }. ]) p end. end. assert. 0=1{count assert. (<&2 +. 0=2&|) size state=: 0 }} next=: {{ select. state NB. for emulating yields NB. case. 0 do. goto_zero. NB. implied case. 1 do. goto_one. case. 2 do. goto_two. end. if. size < 2 do. p return. end. NB. else unnecessary as the if returns. rotation_swaps=:nn#a: for_i. i.&.<: nn do. r =: i. nn for_vv. >:i. i do. r=: vv (({.}.) , {.@] , >:@[ }. ]) r end. swaps=:0 2$0 for_ds. r do. NB. enumerate is just using ds_index,ds in J if. ds~:ds_index do. swaps=:swaps,(ds_index,ds) end. end. rotation_swaps=: (<swaps) i} rotation_swaps end. NB. while loop from here label_while. state=:1 p return. NB. first yield after while True label_one. state=:2 p=: (0 1{p) (1 0)} p return. label_two. NB. second yield after while True. i=:2 NB. buh? while. i<:i{count do. count=: 0 i} count i=:>:i end. NB. while else?? count=: (>: i{count) i} count t=: p for_ds. >i{rotation_swaps do. NB. dst first, source last p=: (t{~{:ds) ({.ds)}p end. goto_while. }} destroy=:codestroy cocurrent'base' permutations_z_=: conew&'permutations' coclass'altflipsgen' destroy=:codestroy create=:{{ 'nn start size'=:y maximum_flips=:0 alternating_factor=:1 p=: permutations nn,start,size NB. for loop replaced by while because of contained yields, keep loopvar lv lv=: 0 }} next=:{{ while. lv<size do. NB. islice limits perm to size calls permutation=:next__p'' first=: {.permutation if. first do. flips_count=:1 while. 1 do. permutation=: (permutation |.@{.~ >:first) (i.>:first)} permutation first=: {.permutation if. first=0 do. break. end. NB. quirk of J not: -. 5 is "True" flips_count=:>:flips_count end. if. maximum_flips<flips_count do. maximum_flips=: flips_count end. lv=:>:lv ret=: flips_count * alternating_factor alternating_factor =: -alternating_factor ret return. else. lv=:>:lv alternating_factor =: -alternating_factor 0 return. end. end. destroy__p'' maximum_flips }} cocurrent'base' altflipsgen_z_=: conew&'altflipsgen' task=:{{ NB. y=n start size 'nn start size'=:y af=.altflipsgen nn,start,size NB. af returns as last iteration the max flips ret=. (next__af''),~ +/next__af^:(>:i.size)'' destroy__af'' ret }} fannkuch=:{{ NB. y is n nn=.y if. nn<0 do. p=: permutations (,0,!)@- nn for. i. !-nn do. NB. test for permutations data=: next__p'' echo ,":"0>: data end. destroy__p'' else. assert. nn>0 NB. no parallel impl ATM task_count=: 1 total=: !nn task_size=: (total+task_count-1) <.@% task_count if. 1 do. NB. task_size < 20000 do. task_size=:total task_count=:1 end. assert. 0=2|task_size task_args =: nn ,. task_size,.~(task_size*i. task_count) if. task_count >1 do. NB. spin up parallel J tasks 'checksums maximums'=: |: > task t. ''"1 task_args else. 'checksums maximums'=: |: task "1 task_args end. 'checksum maximum'=: (+/checksums),>./maximums echo checksum echo 'Pfannkuchen(',(":nn),') = ',": maximum end. }} ([: exit@0: [: fannkuch@". 2&{::)^:('fannkuch-redux_nothread.ijs'-: 1&{::)^:(1<#) ARGV
NB. description here: https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/fannkuchredux.html#fannkuchredux NB. input n, for permutation length NB. output checksum and 'fankuch(n) = result' NB. fkTacit=: {{ NB. cs=: 0 NB. checksum Not Yet Implemented NB. fap=: (i.@!@] (flipct@A.)"0 1 >:@i.) NB. for all permutations NB. NB. scalar attempt memoizing for avoiding repeated work NB. NB. works, but does not count iterations :/ NB. flipct=: [: 0&{:: [: flip^:(_) 0;] NB. flip=: (>:&.>@{. , ((|.@{. , }.)~ {.)&.>@{:)^:(1~:{.&>@{:) NB. cs;>./ fap y NB. }} NB. 'cs max'=: fkTacit ". n=: 2{:: ARGV Note 'Python conversions' - no bytearrays in J, so use char (if slow, try using ints) - how to implement generators? class with create, use while. #n=:next__c ) coclass'permutations' NB. verified, works (at least when starting from 0) divmod=: <.@% , |~ create=: {{ NB. takes y=n,start,stop 'nn start size'=: y p=: i. nn count=: 0#~ nn remainder=: start for_vv. i.@-&.<: nn do. 'd remainder'=: remainder divmod !vv count=: d vv}count NB. was oneliner in python NB. for loop inversing some items never seems to execute... in single thread because start is 0 for. i. vv{count do. p=: vv (({.}.),{.@], >:@[ }. ]) p end. end. assert. 0=1{count assert. (<&2 +. 0=2&|) size state=: 0 }} next=: {{ select. state NB. for emulating yields NB. case. 0 do. goto_zero. NB. implied case. 1 do. goto_one. case. 2 do. goto_two. end. if. size < 2 do. p return. end. NB. else unnecessary as the if returns. rotation_swaps=:nn#a: for_i. i.&.<: nn do. r =: i. nn for_vv. >:i. i do. r=: vv (({.}.) , {.@] , >:@[ }. ]) r end. swaps=:0 2$0 for_ds. r do. NB. enumerate is just using ds_index,ds in J if. ds~:ds_index do. swaps=:swaps,(ds_index,ds) end. end. rotation_swaps=: (<swaps) i} rotation_swaps end. NB. while loop from here label_while. state=:1 p return. NB. first yield after while True label_one. state=:2 p=: (0 1{p) (1 0)} p return. label_two. NB. second yield after while True. i=:2 NB. buh? while. i<:i{count do. count=: 0 i} count i=:>:i end. NB. while else?? count=: (>: i{count) i} count t=: p for_ds. >i{rotation_swaps do. NB. dst first, source last p=: (t{~{:ds) ({.ds)}p end. goto_while. }} destroy=:codestroy cocurrent'base' permutations_z_=: conew&'permutations' coclass'altflipsgen' destroy=:codestroy create=:{{ 'nn start size'=:y maximum_flips=:0 alternating_factor=:1 p=: permutations nn,start,size NB. for loop replaced by while because of contained yields, keep loopvar lv lv=: 0 }} next=:{{ while. lv<size do. NB. islice limits perm to size calls permutation=:next__p'' first=: {.permutation if. first do. flips_count=:1 while. 1 do. permutation=: (permutation |.@{.~ >:first) (i.>:first)} permutation first=: {.permutation if. first=0 do. break. end. NB. quirk of J not: -. 5 is "True" flips_count=:>:flips_count end. if. maximum_flips<flips_count do. maximum_flips=: flips_count end. lv=:>:lv ret=: flips_count * alternating_factor alternating_factor =: -alternating_factor ret return. else. lv=:>:lv alternating_factor =: -alternating_factor 0 return. end. end. destroy__p'' maximum_flips }} cocurrent'base' altflipsgen_z_=: conew&'altflipsgen' task=:{{ NB. y=n start size 'nn start size'=:y af=.altflipsgen nn,start,size NB. af returns as last iteration the max flips ret=. (next__af''),~ +/next__af^:(>:i.size)'' destroy__af'' ret }} fannkuch=:{{ NB. y is n NB. start 4 threads (== cpu count, ideally) (0 T. ''"_)^:] 4 nn=.y if. nn<0 do. p=: permutations (,0,!)@- nn for. i. !-nn do. NB. test for permutations data=: next__p'' echo ,":"0>: data end. destroy__p'' else. assert. nn>0 NB. no parallel impl ATM task_count=: 1 T. '' NB. would be handy if there were a cpucount option to T. total=: !nn task_size=: (total+task_count-1) <.@% task_count if. 1 do. NB. task_size < 20000 do. task_size=:total task_count=:1 end. assert. 0=2|task_size task_args =: nn ,. task_size,.~(task_size*i. task_count) if. task_count >1 do. NB. spin up parallel J tasks 'checksums maximums'=: |: > task t. ''"1 task_args else. 'checksums maximums'=: |: task "1 task_args end. 'checksum maximum'=: (+/checksums),>./maximums echo checksum echo 'Pfannkuchen(',(":nn),') = ',": maximum end. }} ([: exit@0: [: fannkuch@". 2&{::)^:('fannkuch-redux_notasks.ijs'-: 1&{::)^:(1<#) ARGV
---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
