jaketf edited a comment on issue #11339: [BEAM-9468] Fhir io
URL: https://github.com/apache/beam/pull/11339#issuecomment-614332334

@lastomato I added [GroupIntoBatches](https://beam.apache.org/releases/javadoc/2.19.0/org/apache/beam/sdk/transforms/GroupIntoBatches.html) in the FhirIO.Import path.

The logic is:
- buffer `HttpBody`s into an iterable until we have 1000 of them (this threshold was chosen arbitrarily)
- ImportFn updates the ndJson write channel with all 1000 resources
- FinishBundle will flush the batch: write to file on GCS and trigger import job

This is one way to mitigate the "import job per resource" concern, but I'm open to other suggestions for achieving this.

I need to verify whether this will miss the last batch if it isn't full. The language in the docs is
>Elements are buffered until there are batchSize elements buffered, at which point they are output to the output PCollection.

Which sounds like a batch that never reaches batchSize will not be output.
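To make the "last batch" question concrete, here is a minimal plain-Java sketch of size-based buffering with and without a final flush. It does not use the Beam API and makes no claim about how `GroupIntoBatches` actually treats a partial batch; the `BatchBuffer` class and the `batchSizes` helper are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative only: a size-based buffer, not Beam's GroupIntoBatches. */
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> buffer = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one element and emits a batch once batchSize is reached. */
    void add(T element) {
        buffer.add(element);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Emits whatever is buffered, even if the batch is not full. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(buffer);
        buffer = new ArrayList<>();
    }

    /** Hypothetical helper: sizes of the batches emitted for `total` elements. */
    static List<Integer> batchSizes(int total, int batchSize, boolean flushAtEnd) {
        List<Integer> sizes = new ArrayList<>();
        BatchBuffer<String> b =
            new BatchBuffer<>(batchSize, batch -> sizes.add(batch.size()));
        for (int i = 0; i < total; i++) {
            b.add("resource-" + i);
        }
        if (flushAtEnd) {
            b.flush(); // analogous to flushing the remainder at the end
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(2500, 1000, false)); // [1000, 1000]
        System.out.println(batchSizes(2500, 1000, true));  // [1000, 1000, 500]
    }
}
```

Without the final `flush()`, the trailing 500 resources never reach the sink, which is the behavior I want to rule out for the import path.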
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
