On 11/29/13, 8:33 PM, Cedric Greevey wrote:
Have you checked for other sources of performance hits? Boxing, var lookups, and especially reflection.
As I said, I haven't done any optimization yet. :) I did check for reflection though and didn't see any.

I'd expect a reasonably optimized Clojure version to outperform a Python version by a very large factor -- 10x just for being JITted JVM bytecode instead of interpreted Python, times another however-many-cores-you-have for core.async keeping all your processors warm vs. Python and its GIL limiting the Python version to single-threaded performance.
This task does not benefit from the multiplexing that core.async provides, at least not in the case of a single simulation which has no clear logical partition that can be run in parallel. The primary benefit that core.async is providing in this case is to escape from call-back hell.
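For context on why the Python side is so cheap here: a simpy-style process simulation is just a serial loop that resumes generators off a priority queue, with no thread handoffs at all. A minimal sketch in Python (a toy scheduler, not simpy's actual API; the names `Env` and `car` are illustrative only):

```python
import heapq

def car(env):
    # Alternate driving and parking forever; each yield hands a delay
    # back to the scheduler, like a timeout event in simpy/core.async.
    while True:
        yield 2  # drive for 2 time units
        yield 5  # park for 5 time units

class Env:
    """Toy serial scheduler: a heap of (time, seq, generator) entries."""
    def __init__(self):
        self.now = 0
        self._seq = 0
        self._queue = []

    def process(self, gen):
        heapq.heappush(self._queue, (self.now, self._seq, gen))
        self._seq += 1

    def run(self, until):
        while self._queue and self._queue[0][0] < until:
            self.now, _, gen = heapq.heappop(self._queue)
            try:
                delay = next(gen)  # resume the process; it yields its next delay
            except StopIteration:
                continue  # process finished
            self._seq += 1
            heapq.heappush(self._queue, (self.now + delay, self._seq, gen))

env = Env()
env.process(car(env))
env.run(until=100)
print(env.now)  # → 98 (last event strictly before t=100)
```

Every step is one `next()` call plus a heap push/pop, all on one thread, which is why the generator approach has so little per-event overhead to beat.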

If your Clojure version is 2.5x *slower*, then it's probably capable of a *hundredfold* speedup somewhere, which suggests reflection (typically a 10x penalty if it happens heavily in inner loops) *and* another sizable performance degrader* are combining here. Unless, again, you're measuring mostly overhead rather than real workload on the Clojure side, but not on the Python side. Put a significant load into each goroutine in both versions and compare them again; see whether that helps the Clojure version much more than the Python one for some reason.

Yeah, I think a real-life simulation may have different results than this micro-benchmark.


* The other degrader would need to multiply with, not just add to, the reflection. That suggests either blocking (made worse by reflection in one thread/go holding up progress system-wide for 10x as long as it would without reflection) or else excess/discarded work (a 10x penalty for reflection, times 10x as many calls as needed to get the job done -- due to transaction retries, a poor algorithm, or something -- would get you a 100-fold slowdown; but retries of swap! or dosync shouldn't be a factor if you're eschewing those in favor of go blocks for coordination...)



On Fri, Nov 29, 2013 at 10:13 PM, Ben Mabey <b...@benmabey.com <mailto:b...@benmabey.com>> wrote:

    On Fri Nov 29 17:04:59 2013, kandre wrote:

        Here is the gist: https://gist.github.com/anonymous/7713596
        Please note that there's no ordering of time for this simple
        example and there's only one event (timeout). This is not what
        I intend to use, but it shows the problem.
        Simulating 10^5 steps this way takes ~1.5s

        Cheers
        Andreas

        On Saturday, 30 November 2013 09:31:08 UTC+10:30, kandre wrote:

            I think I can provide you with a little code snippet.
            I am talking about the very basic car example
            (driving->parking->driving). Running the sim using core.async
            takes about 1s for 10^5 steps, whereas the simpy version
            takes less than 1s for 10^6 iterations on my vm.
            Cheers
            Andreas

            On Saturday, 30 November 2013 09:22:22 UTC+10:30, Ben Mabey wrote:

                On Fri Nov 29 14:13:16 2013, kandre wrote:
                > Thanks for all the replies. I accidentally left out the
                close! when I contrived the example. I am using core.async
                for a discrete event simulation system. There are hundreds
                of go blocks, each doing little but putting a sequence of
                events onto a channel, and one go block taking these
                events and advancing the time, similar to
                simpy.readthedocs.org/ <http://simpy.readthedocs.org/>
                >
                > The basic one-car example under the previous link
                executes about 10 times faster than the same example
                using core.async.
                >

                Hi Andreas,
                I've been using core.async for DES as well since I think
                the process-based approach is useful.  I could try doing
                the same simulation you're attempting to see how my
                approach compares speed-wise.  Are you talking about the
                car wash or the gas station simulation?  Posting a gist
                of what you have will be helpful so I can use the same
                parameters.

                -Ben




        --
        --
        You received this message because you are subscribed to the Google
        Groups "Clojure" group.
        To post to this group, send email to clojure@googlegroups.com
        <mailto:clojure@googlegroups.com>
        Note that posts from new members are moderated - please be patient
        with your first post.
        To unsubscribe from this group, send email to
        clojure+unsubscr...@googlegroups.com
        <mailto:clojure%2bunsubscr...@googlegroups.com>
        For more options, visit this group at
        http://groups.google.com/group/clojure?hl=en
        ---
        You received this message because you are subscribed to the Google
        Groups "Clojure" group.
        To unsubscribe from this group and stop receiving emails from
        it, send
        an email to clojure+unsubscr...@googlegroups.com
        <mailto:clojure%2bunsubscr...@googlegroups.com>.
        For more options, visit https://groups.google.com/groups/opt_out.


    I've verified your results and compared them with an implementation
    using my library.  My version runs 1.25x faster than yours, and
    that is with an actual priority queue behind the scheduling for
    correct simulation/time semantics.  However, mine is still 2x
    slower than the simpy version.  Gist with benchmarks:

    https://gist.github.com/bmabey/7714431

    simpy is a mature library with lots of performance tweaking, and I
    have done no optimizations so far.  My library is a thin wrapper
    around core.async with a few hooks into the internals, so I would
    expect that most of the time is being spent in core.async (again, I
    have done zero profiling to actually verify this).  So it may be
    that core.async is slower than Python generators for this
    particular use case.  I should say that this use case is odd in
    that our task is a serial one, so we don't get any benefit from
    having a threadpool to multiplex across (in fact the context
    switching may be harmful).
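    The serial-handoff point is easy to see in miniature: resuming a
    generator is a plain function call, while a channel-style handoff
    between threads goes through locks and the scheduler. A rough,
    hedged sketch (stdlib `queue.Queue` standing in for a channel; a
    toy comparison, not a rigorous benchmark):

```python
import queue
import threading
import time

N = 100_000  # matches the 10^5 steps discussed in this thread

def gen_counter():
    i = 0
    while True:
        yield i
        i += 1

# Serial: resume a generator N times on one thread.
g = gen_counter()
t0 = time.perf_counter()
for _ in range(N):
    last_gen = next(g)
gen_time = time.perf_counter() - t0

# Threaded: push N values through a bounded queue (channel-like handoff).
q = queue.Queue(maxsize=1)

def producer():
    for i in range(N):
        q.put(i)

t0 = time.perf_counter()
threading.Thread(target=producer).start()
for _ in range(N):
    last_q = q.get()
queue_time = time.perf_counter() - t0

print(f"generator resume: {gen_time:.3f}s, queue handoff: {queue_time:.3f}s")
```

    On a typical CPython build the queue version is far slower per
    step, which is the same shape of overhead a serial core.async
    pipeline pays relative to plain generator resumption.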

    In my case the current slower speeds are vastly outweighed by the
    benefits:
    * can run multiple simulations in parallel for sensitivity analysis
    * I plan on eventually targeting ClojureScript for visualization
      (right now an event stream from the JVM is used)
    * ability to leverage CEP libraries for advanced stats
    * being integrated into my production systems via channels, which
      do all the real decision making in the sims.  This means I can
      do sensitivity analysis on different policies using actual
      production code.  A nice side benefit is that I get a free
      integration test. :)

    Having said all that, I am still exploring the use of core.async
    for DES and have not yet replaced my event-based simulator.  I
    most likely will replace at least the parts of my simulations that
    have a lot of nested call-backs that make things hard to reason
    about.


    -Ben




