Just to close this off: I ran some more experiments, and it is clear that switching to "\"-based solving is the main performance enhancement (2-25 times faster for N and M in the 10:5000 range I tested). UniformScaling gives only a small additional improvement (a few percent) on top of that, and not always. That is expected, since the latter scales only with N and not with M.
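For anyone reading this later, the two changes look roughly like the sketch below. This is not the actual L0 EM code: the functions `step_original` and `step_solve_unisc` are hypothetical, and the dual-form ridge system `(X*X' + λ*I)` is only an assumed stand-in for the N-by-N system solved in each iteration. It is written against Julia 1.x syntax.

```julia
using LinearAlgebra, Random

# Hypothetical stand-in for the per-iteration linear system (assumption,
# not the real L0 EM update): beta = X' * (X*X' + λ*I)^(-1) * y.

# "original": dense N×N identity plus an explicit matrix inverse.
function step_original(X, y, λ)
    N = size(X, 1)
    A = X * X' + λ * Matrix{Float64}(I, N, N)
    return X' * (inv(A) * y)
end

# "unisc & solve": UniformScaling `I` avoids allocating the identity,
# and `\` solves the system via a factorization instead of forming inv(A).
function step_solve_unisc(X, y, λ)
    A = X * X' + λ * I
    return X' * (A \ y)
end

Random.seed!(1)
N, M, λ = 50, 200, 0.5
X = randn(N, M)
y = randn(N)

β_orig = step_original(X, y, λ)
β_fast = step_solve_unisc(X, y, λ)

# Both variants should agree up to floating-point error.
@assert isapprox(β_orig, β_fast; rtol = 1e-8)
```

The identity matrix here is N-by-N, which is in line with the UniformScaling gain depending only on N, while the solve itself is where most of the time goes.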
Even for N > M the "\"-based solving is faster, and relatively more so than when N << M. As usual, Julia's overall performance is impressive: the optimized L0 EM algorithm is practical (run time under 20 secs) even for M = 500,000 (if N < 200 and k < 8). The p-values below are from a Wilcoxon test, which I use so that as few repeated executions as possible are needed to establish a speed difference.

Cheers,
Robert

N = 500, M = 5000, k = 2

| Row # | Function        | Avg        | Reps | Relative | p       | Signif slower? |
|-------|-----------------|------------|------|----------|---------|----------------|
| 1     | "solve"         | "0.34sec"  | 6    | 1.0      | 1.0     | ""             |
| 2     | "unisc & solve" | "0.36sec"  | 6    | 1.07     | 0.026   | "*"            |
| 3     | "unisc"         | "0.69sec"  | 4    | 2.04     | 0.0095  | "**"           |
| 4     | "original"      | "0.75sec"  | 4    | 2.25     | 0.0095  | "**"           |

N = 5000, M = 500, k = 2

| Row # | Function        | Avg        | Reps | Relative | p       | Signif slower? |
|-------|-----------------|------------|------|----------|---------|----------------|
| 1     | "unisc & solve" | "8.93sec"  | 4    | 1.0      | 1.0     | ""             |
| 2     | "solve"         | "9.11sec"  | 4    | 1.02     | 0.029   | "*"            |
| 3     | "unisc"         | "3.71mins" | 4    | 25.0     | 0.029   | "*"            |
| 4     | "original"      | "3.71mins" | 4    | 25.0     | 0.029   | "*"            |

N = 5000, M = 5000, k = 2

| Row # | Function        | Avg        | Reps | Relative | p       | Signif slower? |
|-------|-----------------|------------|------|----------|---------|----------------|
| 1     | "solve"         | "38.13sec" | 15   | 1.0      | 1.0     | ""             |
| 2     | "unisc & solve" | "38.75sec" | 15   | 1.02     | 0.77    | ""             |
| 3     | "unisc"         | "5.40mins" | 4    | 8.5      | 0.00052 | "***"          |
| 4     | "original"      | "5.42mins" | 4    | 8.53     | 0.00052 | "***"          |

N = 200, M = 500000, k = 2

| Row # | Function        | Avg         | Reps | Relative | p       | Signif slower? |
|-------|-----------------|-------------|------|----------|---------|----------------|
| 1     | "unisc & solve" | "5.28 sec"  | 5    | 1.0      | 1.0     | ""             |
| 2     | "solve"         | "5.45 sec"  | 5    | 1.03     | 0.016   | "*"            |
| 3     | "original"      | "9.65 sec"  | 4    | 1.83     | 0.016   | "*"            |

N = 200, M = 500000, k = 8

| Row # | Function        | Avg         | Reps | Relative | p       | Signif slower? |
|-------|-----------------|-------------|------|----------|---------|----------------|
| 1     | "unisc & solve" | "20.04 sec" | 15   | 1.0      | 1.0     | ""             |
| 2     | "solve"         | "20.28 sec" | 15   | 1.01     | 0.93    | ""             |
| 3     | "original"      | "47.91 sec" | 4    | 2.39     | 0.00052 | "***"          |