First, let's welcome a newcomer: tools/asm/fpvm accepts PFPU assembler in more or less FPVM style, with symbolic names added as a bonus. It calls tools/asm/pfpu repeatedly to execute the code one instruction at a time on the M1. This is slow but accurate. The one-instruction-at-a-time limit comes from the command line size limit in RTEMS. One could work around that, e.g., by making the pfpu command read from a file, but let's save the over-engineering for later.

Second, I put all this to use to find out more about the modulo bug that's been bothering me. It turns out that the result of 10 % 2 on the M1 is 2, not 0.

I've translated the modulo algorithm into "fpvm" assembler:

https://github.com/milkymist/milkymist/blob/master/tools/asm/mod.fpvm

Run with:

  cd milkymist/tools/asm
  ./fpvm -d -x mod.fpvm

This needs M1_* set up such that "pfpu" can telnet and run the pfpu commands.

The complete transcript of the session is below. There are three types of lines:

- # fmul opb, idiv -> bidiv
  # bidiv = 8 (0x41000000)

  Tracing of operation and result at the level of "fpvm".

- ## r2=2 r3=0x40800000 fmul r2,r3 -> r2
  ## 0x41000000 8 0x40800000 4

  Debug output for tracing at the level of "pfpu". Note that the result line starts with R2 (in internal and float format).

- result = 2 (0x40000000)

  This means that, at the end, the variable "result" has the final value 2, with the internal representation 0x40000000. The internal representation tells us that this is an exact two, not something rounded.

Third, the whole thing is still a bit fragile. If you get a "syntax error", the M1 probably crashed and "pfpu" got confused about the (lack of) output. It also seems necessary to first run some rendering job after booting before running pfpu commands. I could have sworn I once saw it work without rendering first, but maybe I'm just imagining things.

Fourth, how to fix this?
The problem is

  div = 5 (0x409ffffb)

which really is 4.9999976 and then becomes

  idiv = 4 (0x40800000)

- Werner

The full mod.fpvm session:

# opa = 10
# opb = 2
# onehalf = 0.5
# twohalf = 1.5
# quake opb -> y
## r2=2 quake r2 -> r2
## 0x3f3759df 0.716215
# y = 0.716215 (0x3f3759df)
# fmul y,y -> yy
## r2=0x3f3759df r3=0x3f3759df fmul r2,r3 -> r2
## 0x3f03519c 0.512964 0x3f3759df 0.716215
# yy = 0.512964 (0x3f03519c)
# fmul onehalf, opb -> hx
## r2=0.5 r3=2 fmul r2,r3 -> r2
## 0x3f800000 1 0x40000000 2
# hx = 1 (0x3f800000)
# fmul hx, yy -> hxyy
## r2=0x3f800000 r3=0x3f03519c fmul r2,r3 -> r2
## 0x3f03519c 0.512964 0x3f03519c 0.512964
# hxyy = 0.512964 (0x3f03519c)
# fsub twohalf, hxyy -> sub
## r2=1.5 r3=0x3f03519c fsub r2,r3 -> r2
## 0x3f7cae64 0.987036 0x3f03519c 0.512964
# sub = 0.987036 (0x3f7cae64)
# fmul sub, y -> y2
## r2=0x3f7cae64 r3=0x3f3759df fmul r2,r3 -> r2
## 0x3f34f95e 0.70693 0x3f3759df 0.716215
# y2 = 0.70693 (0x3f34f95e)
# fmul y2,y2 -> yy
## r2=0x3f34f95e r3=0x3f34f95e fmul r2,r3 -> r2
## 0x3effdf3e 0.49975 0x3f34f95e 0.70693
# yy = 0.49975 (0x3effdf3e)
# fmul onehalf, opb -> hx
## r2=0.5 r3=2 fmul r2,r3 -> r2
## 0x3f800000 1 0x40000000 2
# hx = 1 (0x3f800000)
# fmul hx, yy -> hxyy
## r2=0x3f800000 r3=0x3effdf3e fmul r2,r3 -> r2
## 0x3effdf3e 0.49975 0x3effdf3e 0.49975
# hxyy = 0.49975 (0x3effdf3e)
# fsub twohalf, hxyy -> sub
## r2=1.5 r3=0x3effdf3e fsub r2,r3 -> r2
## 0x3f800830 1.00025 0x3effdf3e 0.49975
# sub = 1.00025 (0x3f800830)
# fmul sub, y2 -> invsqrt
## r2=0x3f800830 r3=0x3f34f95e fmul r2,r3 -> r2
## 0x3f3504f1 0.707107 0x3f34f95e 0.70693
# invsqrt = 0.707107 (0x3f3504f1)
# fmul invsqrt, invsqrt ->invsqrt2
## r2=0x3f3504f1 r3=0x3f3504f1 fmul r2,r3 -> r2
## 0x3efffff9 0.5 0x3f3504f1 0.707107
# invsqrt2 = 0.5 (0x3efffff9)
# fmul invsqrt2, opa -> div
## r2=0x3efffff9 r3=10 fmul r2,r3 -> r2
## 0x409ffffb 5 0x41200000 10
# div = 5 (0x409ffffb)
# f2i div -> i
## r2=0x409ffffb f2i r2 -> r2
## 0x00000004 5.60519e-45
# i = 5.60519e-45 (0x00000004)
# i2f i -> idiv
## r2=0x00000004 i2f r2 -> r2
## 0x40800000 4
# idiv = 4 (0x40800000)
# fmul opb, idiv -> bidiv
## r2=2 r3=0x40800000 fmul r2,r3 -> r2
## 0x41000000 8 0x40800000 4
# bidiv = 8 (0x41000000)
# fsub opa, bidiv -> result
## r2=10 r3=0x41000000 fsub r2,r3 -> r2
## 0x40000000 2 0x41000000 8
# result = 2 (0x40000000)
bidiv = 8 (0x41000000)
div = 5 (0x409ffffb)
hx = 1 (0x3f800000)
hxyy = 0.49975 (0x3effdf3e)
i = 5.60519e-45 (0x00000004)
idiv = 4 (0x40800000)
invsqrt = 0.707107 (0x3f3504f1)
invsqrt2 = 0.5 (0x3efffff9)
onehalf = 0.5 (0.5)
opa = 10 (10)
opb = 2 (2)
result = 2 (0x40000000)
sub = 1.00025 (0x3f800830)
twohalf = 1.5 (1.5)
y = 0.716215 (0x3f3759df)
y2 = 0.70693 (0x3f34f95e)
yy = 0.49975 (0x3effdf3e)
(end)
_______________________________________________
http://lists.milkymist.org/listinfo.cgi/devel-milkymist.org
IRC: #milkymist@Freenode
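
For anyone who wants to check the numbers in the transcript without an M1 at hand, here is a minimal Python sketch. It is not part of tools/asm and does not emulate the PFPU. It only reinterprets the bit patterns from the session as IEEE 754 single-precision values (which matches the hex/float pairs printed above) and assumes that f2i truncates toward zero. That assumption fits the 4 seen in the trace but is not confirmed by anything else. The helper names bits_to_float and float_to_bits are made up for illustration.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def float_to_bits(value: float) -> int:
    """Return the 32-bit single-precision pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


opa, opb = 10.0, 2.0

# "div" exactly as reported in the transcript: printed as 5, but slightly below it.
div = bits_to_float(0x409FFFFB)

# ASSUMPTION: f2i truncates toward zero (consistent with the trace, not verified).
i = int(div)

# The "#"/"##" trace lines print every register as a float, so the integer 4
# shows up as the tiny denormal 5.60519e-45.
i_shown_as_float = bits_to_float(i)

idiv = float(i)              # i2f
result = opa - opb * idiv    # fmul opb, idiv -> bidiv; fsub opa, bidiv -> result

print(f"div    = {div:.7f} (0x{float_to_bits(div):08x})")
print(f"i      = {i} (shown as {i_shown_as_float:g})")
print(f"idiv   = {idiv:g} (0x{float_to_bits(idiv):08x})")
print(f"result = {result:g} (0x{float_to_bits(result):08x})")
```

The interesting line is the first one: with more digits than the "%g"-style output of the trace, div is visibly 4.9999976 rather than 5, so everything after f2i follows from that.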
