Yes, there is a reason not to parallelize. It makes profiling to determine where 
time is spent far harder. It makes debugging far harder, as it introduces the 
possibility of interactions across threads. Coroutines have no concurrency, so 
there are no race conditions and no possible interactions. It's still ordinary 
code, not "multithreaded" code.

The coroutines library is the one I wrote that used to be part of daffodil-lib, 
but we removed it once we no longer needed it.
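
To make the "no concurrency" point concrete, here is a minimal sketch of a
strict hand-off between two threads. It is NOT the daffodil-lib Coroutine
class; the class name HandoffSketch, the method run, and the two queue names
are hypothetical, chosen only for illustration. It assumes two
java.util.concurrent.SynchronousQueue instances are enough to model
resume/yield: each side blocks until the other has handed control back, so the
producer's work and the consumer's work never overlap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.SynchronousQueue;

// Hypothetical sketch of a thread-based coroutine hand-off.
// Two threads exist, but only one side does work at any moment.
class HandoffSketch {

    // Pulls every event from a producer thread, one hand-off per event.
    static List<String> run(List<String> events) {
        // "resume" passes control to the producer, "yielded" passes it back.
        SynchronousQueue<Boolean> resume = new SynchronousQueue<>();
        SynchronousQueue<Optional<String>> yielded = new SynchronousQueue<>();

        Thread producer = new Thread(() -> {
            try {
                for (String e : events) {
                    resume.take();               // park until asked for an event
                    yielded.put(Optional.of(e)); // hand over exactly one event
                }
                resume.take();
                yielded.put(Optional.empty());   // empty Optional marks the end
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<String> seen = new ArrayList<>();
        try {
            while (true) {
                resume.put(Boolean.TRUE);              // give control to producer
                Optional<String> e = yielded.take();   // block until it yields
                if (!e.isPresent()) break;
                seen.add(e.get());                     // consumer-side work
            }
            producer.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b", "c")));
    }
}
```

The cost of this shape is one thread switch in each direction per event, which
is the overhead that batching would try to reduce.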


________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Wednesday, September 16, 2020 10:15 AM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal

By "coroutines library", you're talking about the one on your gist that
you wrote?

https://gist.github.com/mbeckerle/312474bac9bee9102438c160890b6539

It would be nice if batching were an option, at least so we could test
it and see if there is a difference.

Also, although we're not trying to go faster by overlapping, perhaps
this is something we might want to consider? Is there a reason to not
parallelize the SAX thread filling up the queue and the unparse thread
reading from that queue? I guess if one thread is much faster than the
other then there's really not much benefit and one thread might just
spin waiting for the other to read/write an event? Does your coroutine
library do something to prevent this from happening?
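
For reference, here is a rough sketch of what a batching test might look like.
Everything in it is an assumption for illustration: the class name
BatchingSketch, the run method, the use of Long values as stand-in events, and
the convention that an empty batch marks the end of the stream. It uses a
java.util.concurrent.ArrayBlockingQueue of capacity 1 that carries whole
batches, so the threads block in put/take rather than spin, and the number of
hand-offs drops as the batch size grows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: pass events between threads in batches.
class BatchingSketch {

    // Returns { sum of all events consumed, number of batches handed off }.
    static long[] run(long eventCount, int batchSize) {
        BlockingQueue<List<Long>> queue = new ArrayBlockingQueue<>(1);

        Thread producer = new Thread(() -> {
            try {
                List<Long> batch = new ArrayList<>(batchSize);
                for (long i = 1; i <= eventCount; i++) {
                    batch.add(i);
                    if (batch.size() == batchSize) {
                        queue.put(batch);   // blocks while the queue is full
                        batch = new ArrayList<>(batchSize);
                    }
                }
                if (!batch.isEmpty()) queue.put(batch); // final partial batch
                queue.put(new ArrayList<>());           // empty batch = end
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        long sum = 0;
        long handoffs = 0;
        try {
            while (true) {
                List<Long> batch = queue.take(); // blocks while the queue is empty
                if (batch.isEmpty()) break;
                handoffs++;
                for (long v : batch) sum += v;   // stand-in for consumer work
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { sum, handoffs };
    }

    public static void main(String[] args) {
        long[] r = run(10, 4);
        System.out.println("sum=" + r[0] + " handoffs=" + r[1]);
    }
}
```

Unlike a strict coroutine hand-off, this version does let the producer fill
the next batch while the consumer works on the current one, so it is a test of
the overlapping case rather than the no-parallelism case.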


On 9/16/20 10:00 AM, Beckerle, Mike wrote:
> The point of the coroutines library is that doing something "as simple as" 
> just an array blocking queue, etc. with threads is always problematic.
>
> Also, an important point. The objective here is "no parallelism". We're not 
> trying to go faster by overlapping things. We're just trying to change stacks 
> so we can run two different stack contexts.  Ideally this would all be a 
> single thread with stack switching. JVMs just don't have that.
>
> I think the coroutines library is pretty simple to use, and could be adapted 
> to batch up requests to reduce overhead if we want.
>
>
> ________________________________
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Wednesday, September 16, 2020 8:12 AM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
>
> As I recall, the libraries that use things like annotations end up
> changing the return types of all the callers, which ends up leaking
> into the API and changing it, so I don't think any of those solutions will
> work.
>
> I think we have to use Threads, where the main thread is the caller
> using the SAX API, and when unparse is called we spawn off the actual
> unparse in a new thread. And there's some data structure shared between
> these threads that contains event information.
>
> I think it really just comes down to which of the various
> implementations to use. I'm not too familiar with Mike's Coroutine
> class. Mike, can you maybe discuss what advantages this has over say
> just spawning a thread and sharing something like an ArrayBlockingQueue
> to pass event information between the threads? This seems like the
> simplest option, and allows tuning the size of the queue, which should
> allow batching of events and minimize context switching between threads.
>
> - Steve
>
> On 9/15/20 10:15 PM, Olabusayo Kilo wrote:
>> I don't think we came to a conclusion on which path we should take. If I
>> understand correctly, our options seem to be between the Thread-based
>> Coroutine library (#3; which has a bit of overhead) and the
>> Continuations library (#2; which is not yet supported for 2.13 and
>> requires the suspendable annotation). I wanted to check in to see if
>> there was a preferred one that I could focus my effort on?
>>
>> On 4/24/20 9:28 AM, Beckerle, Mike wrote:
>>> A further thought on this. The overhead difference between
>>> continuations and threads was 1 to 4 (roughly).
>>>
>>> If you add real workload to what happens on either side of that
>>> producer-consumer relationship, I bet this difference disappears into
>>> the noise, not because it becomes more efficient due to less
>>> contention, but because it's such a tiny fraction of the actual work
>>> being done.
>>>
>>> I have a copy of the Thread-based coroutines library in a separate
>>> sandbox, so if you want it I'll get it over to you so you
>>> don't have to dig for it.
>>> ________________________________
>>> From: Beckerle, Mike <mbecke...@tresys.com>
>>> Sent: Friday, April 24, 2020 8:53 AM
>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
>>>
>>> That's really informative and confirms intuition that using threads
>>> really hurts performance when all you need is a stack switch.
>>>
>>> In this case reducing contention should reduce total work, but that
>>> depends on how carefully the queue is implemented. If it is a single
>>> lock it may not matter.
>>>
>>> We actually don't care about going faster through parallelism because we
>>> should assume the machine is already saturated with work. We want to
>>> reduce the total amount of work done.
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Steve Lawrence <slawre...@apache.org>
>>> Sent: Friday, April 24, 2020 8:02:37 AM
>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
>>>
>>> I decided to look at performance of three potential options to see if
>>> that would rule anything out. I looked at 1) coroutines 2) continuations
>>> 3) threads with BlockingQueue. For each of these, I modified the gist to
>>> remove printlns and use a different producer consumer model (which is
>>> actually very straightforward if we come across other alternatives to
>>> test). So everything is the same except for how the SAX content handler
>>> interacts with the custom InfosetInputter. For the performance numbers
>>> below, I created enough "events" in a loop so that rate of events
>>> remained roughly the same as I increased the number of events.
>>>
>>> 1) coroutines
>>>
>>> It turns out the coroutines library has a limitation where the
>>> yieldval() call must be directly inside the coroutine{} block. This is
>>> basically a non-starter for us, since the entire unparse call needs to
>>> be a coroutine, and the yieldval call happens way down the stack. So not
>>> only does this not have any active development, it functionally won't
>>> even work for us.
>>>
>>> 2) continuations
>>>
>>> 16.50 million events per second
>>>
>>> 3) thread with BlockingQueue
>>>
>>> I think this is similar to the Coroutine library you wrote for Daffodil
>>> (though it looks like it's been removed, we can probably find it in the
>>> git history if we want). This runs the unparse method in a thread and
>>> has a blocking queue that the producer pushes to and the consumer takes
>>> from. I tested with different queue sizes to see how that affects
>>> performance:
>>>
>>>    size  rate
>>>       1  0.14 million events per second
>>>      10  1.36 million events per second
>>>     100  3.18 million events per second
>>>    1000  3.16 million events per second
>>>  100000  3.09 million events per second
>>>
>>> So this BlockingQueue approach is quite a bit slower, and definitely
>>> requires batching events to be somewhat performant. I guess this
>>> slowness makes sense as this approach creates a thread for the unparse,
>>> has different threads blocking on this queue, and also creates a bunch
>>> of event objects to put in the queue (the continuation approach just
>>> mutates state so no extra objects are needed). It is possible that this
>>> isn't an accurate test since the producer is going crazy fast since I'm
>>> just incrementing a Long in each loop iteration. In the real world, the
>>> producer is going to be parsing XML or something, so won't be as fast.
>>> Perhaps if the producer was actually slower there would be less thread
>>> contention and actually allow for more parallel work?
>>>
>>>
>>> On 4/23/20 5:41 PM, Beckerle, Mike wrote:
>>>> I am pretty worried about the @suspendable annotation. The way this
>>>> shift/reset stuff works is that it modifies the Scala compiler to do
>>>> something called continuation-passing style, aka CPS.
>>>>
>>>> I'd be ok if that was isolated to just a segment of the code. Maybe
>>>> there is some natural way to do that?
>>>>
>>>> But it seems to me that all code on the pathway from where a reset
>>>> block is entered to where a shift is called, all of it has to
>>>> propagate this @suspendable behavior and be compiled by way of this
>>>> CPS plug in. That looks ok for the tiny toy examples, but for a giant
>>>> code base like Daffodil runtime1 unparser, .... that seems fragile,
>>>> potentially has impact on debugging, memory allocation, and
>>>> performance of the code, and,... well given the lack of enthusiastic
>>>> support for shift/reset I think it is risky.
>>>>
>>>> The only other option I can think of is to spawn a separate thread,
>>>> allow true concurrency in a producer-consumer model.
>>>>
>>>> We already have a Coroutines library you may recall. We're not using
>>>> it in the code base now, and it's fairly high-overhead as it is a
>>>> depth 1 queue, so is constantly switching threads. It might have
>>>> better performance characteristics if the switching was reduced to
>>>> once every 100 events or similar. Streaming behavior does not have to
>>>> switch from events to pull at granularity 1 event per pull, it can be
>>>> much coarser than that to push overhead down.
>>>>
>>>> The limiting thing here really seems to be the JVM. Java virtual
>>>> machines simply don't support the concept of co-routines in any
>>>> sensible manner.
>>>>
>>>> There are also some coroutine-style libraries for Java that depend on
>>>> byte-code modification. I suspect those have a similar issue to the
>>>> CPS transformation, ie., all the code on the way to a suspension
>>>> requires the byte code modification, but I may be wrong.
>>>>
>>>> ________________________________
>>>> From: Steve Lawrence <slawre...@apache.org>
>>>> Sent: Thursday, April 23, 2020 11:21 AM
>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
>>>>
>>>> Thanks Mike! Continuations seems like a better alternative, at least
>>>> from a support point of view. Though, it's a little concerning that no
>>>> one is really stepping up to port it to 2.13, but I don't think we're in
>>>> any rush to get to 2.13. And I personally find the reset/shift concept a
>>>> bit harder to wrap my head around than the co-routine resume/yield, but
>>>> ultimately it's not too bad.
>>>>
>>>> To see how it would work with our DataProcessor/InfosetInputter, I
>>>> forked and updated your gist to include things like InfosetInputters,
>>>> DataProcessor, ContentHandler, etc. and added a bunch of println's and
>>>> comments to make sure things were behaving the way I thought they
>>>> should.
>>>>
>>>> https://gist.github.com/stevedlawrence/5e16081f4690448de6131af02daacea9
>>>>
>>>> I think it came out pretty straightforward. I also modified this so that
>>>> there isn't as much back and forth between hasNext/next like I have in
>>>> the current proposal. The only time we go back to the
>>>> ContentHandler/producer is when next() is called, and we only go back to
>>>> the InfosetInputter/consumer when a complete event is found, including
>>>> hasNext.
>>>>
>>>> I do have one concern with this approach. Scala required the
>>>> @suspendable annotation on the unparse() method of the DataProcessor and
>>>> on the next() method of the InfosetInputter for both the abstract class
>>>> and concrete SAX implementation. I'm not sure if that annotation causes
>>>> any problems when not used inside a reset block (i.e. old API style), or
>>>> if that annotation will end up cascading throughout the codebase. Seems
>>>> like there's a possibility for that to happen. Maybe I just need to
>>>> reorganize the code a bit, but it's not clear to me how.
>>>>
>>>>
>>>> On 4/22/20 7:18 PM, Beckerle, Mike wrote:
>>>>> Scala continuations is supported on 2.11 and 2.12, but 2.13 support is a
>>>>> work in progress. The main web page for it says it is looking for a
>>>>> lead developer and without that Typesafe/Lightbend is doing bare
>>>>> minimum maintenance.
>>>>>
>>>>> A producer/consumer idiom like what we need is easily expressed
>>>>> using this shift/reset thing.
>>>>>
>>>>> Here's a gist that does a control turnaround from a handler to a
>>>>> pull-oriented while loop. Took me a bit of research to get the
>>>>> build.sbt right so this would "just work"
>>>>>
>>>>> https://gist.github.com/mbeckerle/4c1d8f8c365958ef7d01bf770fa6317c
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Beckerle, Mike <mbecke...@tresys.com>
>>>>> Sent: Wednesday, April 22, 2020 5:01 PM
>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>> Subject: Re: Daffodil SAX API Proposal
>>>>>
>>>>> Another possibility is scala-async, which I think can do what we want.
>>>>> ________________________________
>>>>> From: Beckerle, Mike <mbecke...@tresys.com>
>>>>> Sent: Wednesday, April 22, 2020 4:34 PM
>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>> Subject: Re: Daffodil SAX API Proposal
>>>>>
>>>>> The alternative is probably scala.util.continuations aka "shift and
>>>>> reset".
>>>>>
>>>>> It's much harder to understand and use, but at least it's in the
>>>>> standard library so is supported. (I think.)
>>>>>
>>>>> ________________________________
>>>>> From: Steve Lawrence <slawre...@apache.org>
>>>>> Sent: Wednesday, April 22, 2020 3:40 PM
>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>> Subject: Re: Daffodil SAX API Proposal
>>>>>
>>>>> I responded.
>>>>>
>>>>> I checked the license to make sure it's compatible (BSD-3), but I
>>>>> didn't actually check what versions of Scala it works with.
>>>>>
>>>>> Looks like it is only published for 2.11, and the repo hasn't been
>>>>> updated for at least 3 years. There is a 2.12.x branch in their repo,
>>>>> but it too hasn't been updated in a long time. We might have to see how
>>>>> much effort it would take to update that library, or perhaps find
>>>>> another library.
>>>>>
>>>>>
>>>>> On 4/22/20 3:28 PM, Beckerle, Mike wrote:
>>>>>> I reviewed this and added a comment about the only significant
>>>>>> issue, which I think just boils down to trying to keep the
>>>>>> coroutining back and forth as simple as possible.
>>>>>>
>>>>>> Another thought: Is the scala coroutines library supported in 2.11
>>>>>> and 2.12 (and 2.13 for being future-safe?)
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> From: Steve Lawrence <slawre...@apache.org>
>>>>>> Sent: Wednesday, April 22, 2020 1:06 PM
>>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>>> Subject: Daffodil SAX API Proposal
>>>>>>
>>>>>> I've added a proposal to add SAX API support to Daffodil.
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+SAX+API
>>>>>>
>>>>>>
>>>>>> Many libraries and applications already support SAX, so this should
>>>>>> provide a means for more seamless integration into different
>>>>>> toolsuites, opening up the places where Daffodil could be easily
>>>>>> integrated.
>>>>>>
>>>>>> SAX is also generally viewed as having a lower memory overhead, though
>>>>>> this does not attempt to solve the memory issues related to Daffodil
>>>>>> and the internal infoset representation. This essentially just adds a
>>>>>> SAX-compatible API around our existing API. Other changes are needed to
>>>>>> reduce our memory overhead and truly support a streaming model.
>>>>>>
>>>>>> - Steve
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
