jaketf commented on pull request #11596:
URL: https://github.com/apache/beam/pull/11596#issuecomment-623766796


   @chamikaramj thanks for the suggestion. I will look into using the 
BoundedSource API.
   
   Unfortunately, regular DoFns don't cut it because a single element's outputs 
are committed atomically (see this 
[conversation](https://github.com/apache/beam/pull/11538#discussion_r416927740)).
   Basically we have one input element (HL7v2 store) exploding to many, many 
output elements (all the messages in that store) in a single ProcessElement 
call. I'm trying to explore strategies for splitting up this listing.
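
   To illustrate the splitting idea, here is a minimal sketch in plain Java 
(no Beam dependency). It assumes the listing can be filtered by a time range, 
which is an assumption and not something stated above. `ListingSplitter` and 
`split` are hypothetical names, not part of `HL7v2IO`. Each sub-range could 
then be listed as its own element instead of in one huge ProcessElement call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: breaks one large listing into independent sub-range queries. */
final class ListingSplitter {

  /**
   * Splits the half-open interval [startMillis, endMillis) into at most
   * {@code parts} contiguous, non-overlapping half-open sub-ranges.
   * Assumption: the store can be listed with a filter on such a range.
   */
  static List<long[]> split(long startMillis, long endMillis, int parts) {
    if (parts <= 0 || endMillis <= startMillis) {
      throw new IllegalArgumentException("need parts > 0 and a non-empty range");
    }
    List<long[]> ranges = new ArrayList<>();
    long span = endMillis - startMillis;
    // Ceiling division so the sub-ranges always cover the whole interval.
    long step = (span + parts - 1) / parts;
    for (long lo = startMillis; lo < endMillis; lo += step) {
      // Clamp the last sub-range to the end of the interval.
      ranges.add(new long[] {lo, Math.min(lo + step, endMillis)});
    }
    return ranges;
  }
}
```

   Because each sub-range is a separate element, its outputs are committed 
separately, so no single commit has to hold every message in the store.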
   
   I originally chose splittable DoFn over BoundedSource based off the 
sentiment of this statement:
   > **Coding against the Source API involves a lot of boilerplate and is 
error-prone**, and it does not compose well with the rest of the Beam model 
because a Source can appear only at the root of a pipeline. - 
https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
   
   The blog also mentions:
   - A Source cannot emit an additional output (for example, records that 
failed to parse).
       - Healthcare customers driving the requirements for this plugin want a 
dead-letter queue (DLQ) on all sinks and sources. To be consistent with the 
streaming API provided in `HL7v2IO.Read`, I wanted to provide a DLQ in 
`HL7v2IO.ListMessages`. However, I believe this is more of a nice-to-have for 
batch use cases.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

