jaketf commented on a change in pull request #11596:
URL: https://github.com/apache/beam/pull/11596#discussion_r421807059



##########
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java
##########
@@ -472,24 +548,120 @@ public void initClient() throws IOException {
       this.client = new HttpHealthcareApiClient();
     }
 
+    @GetInitialRestriction
+    public OrderedTimeRange getEarliestToLatestRestriction(@Element String 
hl7v2Store)
+        throws IOException {
+      from = this.client.getEarliestHL7v2SendTime(hl7v2Store, this.filter);
+      // filters are [from, to) to match logic of OffsetRangeTracker but need 
latest element to be
+      // included in results set to add an extra ms to the upper bound.
+      to = this.client.getLatestHL7v2SendTime(hl7v2Store, this.filter).plus(1);
+      return new OrderedTimeRange(from, to);
+    }
+
+    @NewTracker
+    public OrderedTimeRangeTracker newTracker(@Restriction OrderedTimeRange 
timeRange) {
+      return timeRange.newTracker();
+    }
+
+    @SplitRestriction
+    public void split(
+        @Restriction OrderedTimeRange timeRange, 
OutputReceiver<OrderedTimeRange> out) {
+      // TODO(jaketf) How to pick optimal values for desiredNumOffsetsPerSplit 
?

Review comment:
       Unfortunately, in this use case dynamic splitting would be crucial 
because we can't know the distribution of data in the restriction dimension 
(sendTime). 
   
   If you imagine a hospital might be much busier during daytime / weekdays 
than night times weekends (though never dormant due to ICU and emergency 
services). "Day time" might change base on hospital location, week days are 
subject to holidays, etc.
   
   Data distribution in sendTime may be subject to significant spikes if one of 
the upstream systems populating sendTime has to backfill after a maintenance 
period and doesn't responsibly set this field to event time but sets all of the 
sendTimes to a short range of  backfill 
   processing time (this is sub optimal behavior of that system but sometimes a 
reality).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to