[GitHub] [beam] jaketf commented on a change in pull request #11538: [BEAM-9831] Improve performance and UX for HL7v2IO

GitBox Tue, 28 Apr 2020 17:19:22 -0700


jaketf commented on a change in pull request #11538:
URL: https://github.com/apache/beam/pull/11538#discussion_r417000696




##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java
##########
@@ -437,6 +444,20 @@ private Message fetchMessage(HealthcareApiClient client, String msgId)
           .apply(Create.of(this.hl7v2Stores))
           .apply(ParDo.of(new ListHL7v2MessagesFn(this.filter)))
           .setCoder(new HL7v2MessageCoder())
+          // Listing takes a long time for each input element (HL7v2 store) because it has to
+          // paginate through results in a single thread / ProcessElement call in order to keep
+          // track of page token.
+          // Eagerly emit data on 1 second intervals so downstream processing can get started before
+          // all of the list results have been paginated through.

Review comment:
   @pabloem Does this mean that all of a single element's output must be buffered in memory, or will the runner be smart enough to spill to disk?
   
   Based on my initial investigation, I was not able to reproduce the behavior reported by the customer in a unit test.
   The investigation is summarized in this [gist](https://gist.github.com/jaketf/d3c2e70dde781bbb0ef1993446e34b71).
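To make the buffering question concrete, here is a minimal plain-Java analogy. It is not Beam's API and not necessarily how HL7v2IO implements this. A single loop follows page tokens, as the diff comment says it must, but hands accumulated results to a downstream consumer whenever a flush interval has elapsed, so at any moment it holds only the results fetched since the last flush. `PagedListingSketch`, `Page`, `PageFetcher`, and the injectable clock are hypothetical names chosen for illustration. In a real pipeline, whether output emitted in the middle of a `ProcessElement` call is committed early or held until the call returns is up to the runner, which is exactly what the question above asks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical illustration (not Beam): token-based pagination with time-interval flushing. */
public class PagedListingSketch {

  /** One page of results plus the token for the next page (null on the last page). */
  public static final class Page {
    final List<String> messageIds;
    final String nextPageToken;

    public Page(List<String> messageIds, String nextPageToken) {
      this.messageIds = messageIds;
      this.nextPageToken = nextPageToken;
    }
  }

  /** Hypothetical stand-in for a paginated list call; a null token requests the first page. */
  public interface PageFetcher {
    Page fetch(String pageToken);
  }

  /**
   * Follows page tokens sequentially (each request needs the previous response's token, so the
   * loop cannot be parallelized) and passes buffered results downstream each time at least
   * flushIntervalMillis has elapsed since the last flush, plus once more after the last page.
   */
  public static void listAndEmit(
      PageFetcher fetcher,
      Consumer<List<String>> downstream,
      LongSupplier clockMillis,
      long flushIntervalMillis) {
    List<String> buffer = new ArrayList<>();
    long lastFlush = clockMillis.getAsLong();
    String token = null;
    do {
      Page page = fetcher.fetch(token);
      buffer.addAll(page.messageIds);
      token = page.nextPageToken;

      long now = clockMillis.getAsLong();
      if (now - lastFlush >= flushIntervalMillis && !buffer.isEmpty()) {
        // Emit a copy so downstream work can start before pagination finishes.
        downstream.accept(new ArrayList<>(buffer));
        buffer.clear();
        lastFlush = now;
      }
    } while (token != null);

    // Emit whatever arrived after the last interval-based flush.
    if (!buffer.isEmpty()) {
      downstream.accept(buffer);
    }
  }
}
```

For example, with a fake clock that advances 600 ms per page fetch and a 1000 ms interval, the three pages `[a, b]`, `[c]`, `[d, e]` reach the consumer as two batches, `[a, b, c]` and `[d, e]`, rather than as one list after the final page.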





