[GitHub] [beam] jaketf commented on a change in pull request #11596: [BEAM-9856] Optimization/hl7v2 io list messages

GitBox Thu, 07 May 2020 15:21:08 -0700


jaketf commented on a change in pull request #11596:
URL: https://github.com/apache/beam/pull/11596#discussion_r421827964




##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java
##########
@@ -472,24 +548,120 @@ public void initClient() throws IOException {
       this.client = new HttpHealthcareApiClient();
     }
 
+    @GetInitialRestriction
+    public OrderedTimeRange getEarliestToLatestRestriction(@Element String hl7v2Store)
+        throws IOException {
+      from = this.client.getEarliestHL7v2SendTime(hl7v2Store, this.filter);
+      // filters are [from, to) to match logic of OffsetRangeTracker but need latest element to be
+      // included in results set to add an extra ms to the upper bound.
+      to = this.client.getLatestHL7v2SendTime(hl7v2Store, this.filter).plus(1);
+      return new OrderedTimeRange(from, to);
+    }
+
+    @NewTracker
+    public OrderedTimeRangeTracker newTracker(@Restriction OrderedTimeRange timeRange) {
+      return timeRange.newTracker();
+    }
+
+    @SplitRestriction
+    public void split(
+        @Restriction OrderedTimeRange timeRange, OutputReceiver<OrderedTimeRange> out) {
+      // TODO(jaketf) How to pick optimal values for desiredNumOffsetsPerSplit ?

Review comment:
   Yeah, I think the "spiky backfill" (many messages packed into a small sendTime window) is a corner case: a hot split that would just be slow, and users would have to accept that or take it up with their upstream system.

   messageType / sendFacility are probably the more popular logical filters, but splitting on them feels like a hack for a corner case, and it might hurt performance under the "typical" distribution of data in sendTime.
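
   To make the hot-split point concrete, here is a rough sketch of fixed-width splitting over a half-open [from, to) sendTime range. It is not the code in this PR: `SendTimeSplitSketch`, `TimeRange`, `split`, `countPerSplit` and `demoCounts` are all made-up names, and it uses `java.time` rather than the types in HL7v2IO. The data is invented too: one message per hour for a day, plus a backfill of 1000 messages inside one second.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the HL7v2IO implementation. */
final class SendTimeSplitSketch {

  /** Half-open time range [from, to). */
  static final class TimeRange {
    final Instant from;
    final Instant to;

    TimeRange(Instant from, Instant to) {
      this.from = from;
      this.to = to;
    }

    boolean contains(Instant t) {
      return !t.isBefore(from) && t.isBefore(to);
    }
  }

  /** Splits [from, to) into consecutive sub-ranges no wider than {@code width}. */
  static List<TimeRange> split(TimeRange range, Duration width) {
    if (width.isZero() || width.isNegative()) {
      throw new IllegalArgumentException("width must be positive");
    }
    List<TimeRange> out = new ArrayList<>();
    Instant start = range.from;
    while (start.isBefore(range.to)) {
      Instant end = start.plus(width);
      if (end.isAfter(range.to)) {
        end = range.to; // last split is clamped to the upper bound
      }
      out.add(new TimeRange(start, end));
      start = end;
    }
    return out;
  }

  /** Counts how many send times land in each split. */
  static int[] countPerSplit(List<TimeRange> splits, List<Instant> sendTimes) {
    int[] counts = new int[splits.size()];
    for (Instant t : sendTimes) {
      for (int i = 0; i < splits.size(); i++) {
        if (splits.get(i).contains(t)) {
          counts[i]++;
          break;
        }
      }
    }
    return counts;
  }

  /** Invented data: hourly messages for a day plus a 1000-message backfill. */
  static int[] demoCounts() {
    Instant base = Instant.parse("2020-05-01T00:00:00Z");
    List<Instant> sendTimes = new ArrayList<>();
    for (int h = 0; h < 24; h++) {
      sendTimes.add(base.plus(Duration.ofHours(h)));
    }
    Instant spike = base.plus(Duration.ofHours(12)).plus(Duration.ofMinutes(30));
    for (int i = 0; i < 1000; i++) {
      sendTimes.add(spike.plusMillis(i));
    }
    Instant latest = base.plus(Duration.ofHours(23));
    // Upper bound is latest + 1 ms so the latest message falls inside [from, to).
    TimeRange full = new TimeRange(base, latest.plusMillis(1));
    return countPerSplit(split(full, Duration.ofHours(6)), sendTimes);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(demoCounts())); // prints [6, 6, 1006, 6]
  }
}
```

   With 6-hour splits, three of them hold 6 messages each and the one covering the backfill holds 1006. Narrower splits would isolate the spike, but they would also multiply the number of near-empty splits for the typical distribution, which is the trade-off behind the TODO above.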



