Nothing here is glaringly wrong. I would definitely recommend using the built-in profiler to see where the time is going. There may be subtle type instabilities or some other non-obvious issue. You ought to be able to get within striking distance of similar C code, which should land in the expected range of 4-10x slower than hand-coded assembly.
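To make the suggestion concrete, here is a rough sketch of how I would go about it. The functions `xor_round!`, `bytes_per_call`, `run_many!` and `unstable_sum` are hypothetical stand-ins I made up for illustration, not code from aes.jl, and the `using` lines assume a recent Julia where the profiler and `@code_warntype` live in standard-library modules (in older releases they were available from Base directly).

```julia
using Profile            # sampling profiler (standard library in recent Julia)
using InteractiveUtils   # provides @code_warntype outside the REPL

# Hypothetical stand-in for one cipher step: XOR a 16-byte state with a key, in place.
function xor_round!(state::Vector{UInt8}, key::Vector{UInt8})
    @inbounds for i in eachindex(state, key)
        state[i] = xor(state[i], key[i])
    end
    return state
end

# Measure allocation inside a function so global-variable overhead is not counted.
function bytes_per_call(state, key)
    xor_round!(state, key)                    # warm up: compile before measuring
    return @allocated xor_round!(state, key)
end

# Loop inside a function so the profiler samples compiled code.
function run_many!(state, key, n)
    for _ in 1:n
        xor_round!(state, key)
    end
    return state
end

# Deliberately type-unstable: `acc` starts as an Int and becomes a Float64.
function unstable_sum(xs)
    acc = 0
    for x in xs
        acc += x / 2
    end
    return acc
end

state = rand(UInt8, 16)
key = rand(UInt8, 16)

println("bytes allocated per block: ", bytes_per_call(state, key))

Profile.clear()
@profile run_many!(state, key, 10_000_000)
Profile.print()          # tree of sample counts per source line

# Variables and return types inferred as a Union or Any are flagged in the output.
@code_warntype unstable_sum([1, 2, 3])
```

The idea is to look first at the lines with the most samples in the profile, then run `@code_warntype` on those functions; anything inferred as a `Union` or `Any` in a hot loop is a likely source of both the slowdown and the per-block allocation.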
On Fri, Sep 11, 2015 at 5:10 PM, Corey Moncure <[email protected]> wrote:
> https://github.com/cmoncure/crypto/blob/master/aes.jl
>
> In the process of learning Julia (and crypto) I implemented the Rijndael
> block cipher and inverse cipher. I tried to write idiomatic yet concise
> code, but the performance is not very desirable. On my machine (i5-2500k @
> 4.0 Ghz) the throughput is piddling, on the order of 10e6 bytes/sec, and
> memory allocation is at 3056 bytes / block, which I have not been able to
> cut down any further.
>
> Obviously I do not intend to compete with hand-tuned assembler routines
> that heavily exploit SIMD and pre-computed tables, but my completely
> unfounded gut feeling is that given the right input, Julia should be able
> to approach within a factor of 4-10 without such optimizations. Currently
> this routine is within a factor of 1000.
>
> Any Julia experts out there willing to take a peek at the code and offer
> some tips for idiomatic (i.e. within the framework of Julia syntax and
> style) optimizations?
>
> In the course of doing this I have run into several gripes with Julia,
> particularly some of the built-in functions which are often confusing or
> contradictory by virtue of the type declarations of certain methods (or
> lack of needed ones). For instance, Julia does not support negative
> indexing of arrays... so then why do so many functions on arrays take only
> signed integer types for dimensions? To the noobie it seems like an
> obvious choice to type data holding the calculation of matrix dimensions or
> indices as unsigned integers, given that the language does not support
> negative indexing. Yet this fails unexpectedly in many built-ins such as
> sub().
