[
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=403498&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-403498
]
ASF GitHub Bot logged work on BEAM-9468:
----------------------------------------
Author: ASF GitHub Bot
Created on: 14/Mar/20 20:05
Start Date: 14/Mar/20 20:05
Worklog Time Spent: 10m
Work Description: jaketf commented on pull request #11107: [BEAM-9468]
[WIP] add HL7v2IO and FhirIO
URL: https://github.com/apache/beam/pull/11107#discussion_r392615417
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java
##########
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.healthcare;
+
+import com.google.api.services.healthcare.v1alpha2.model.HttpBody;
+import com.google.auto.value.AutoValue;
+import java.io.IOException;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+
+/**
+ * {@link FhirIO} provides an API for writing resources to <a
+ * href="https://cloud.google.com/healthcare/docs/concepts/fhir">Google Cloud Healthcare Fhir API.
+ * </a>
+ */
+public class FhirIO {
Review comment:
I do think it's feasible to use these APIs from Beam, but I'm not sure
they're the most appropriate for a Beam IO transform for a
transactional system. I admittedly do not understand all the use cases for
FhirIO, so please chime in if the following understanding is missing something.
# Writing and Import
## Feasibility
We can model this similarly to
[`BigQueryIO.Write::withCustomGcsTempLocation`](https://beam.apache.org/releases/javadoc/2.17.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withCustomGcsTempLocation-org.apache.beam.sdk.options.ValueProvider-).
This might have throughput benefits. However, IIUC this will not have
transactional guarantees.
## Concerns
From the
[docs](https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores/import):
>It is **primarily intended to load data into an empty FHIR store** that is
not being used by other clients
and
>The import process does not enforce referential integrity, regardless of
the disableReferentialIntegrity setting on the FHIR store. This allows the
import of resources with arbitrary interdependencies without considering
grouping or ordering, but if the input data contains invalid references or if
some resources fail to be imported, the FHIR store might be left in a state
that violates referential integrity.
IIUC, the import method should basically only be used on the output of an
export of the FHIR store. If you are doing any transformation, the result would
not have been validated against the transactional guarantees of the FHIR spec
and would be sort of blindly imported.
## Thoughts
I feel that because the FHIR store is transactional,
[executeBundle](https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores.fhir/executeBundle)
is the appropriate / safe method to write data into the FHIR store and ensure
that it is valid against the information already in the FHIR store. We can take
a cue from the precedent of other Beam IO transforms for transactional systems
(e.g.
[SpannerIO.Write](https://beam.apache.org/releases/javadoc/2.17.0/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.Write.html)),
which write a PCollection of mutations (our corollary is to execute a
PCollection of bundles). Note this is what would make sense for the
prototypical HL7v2 -> FHIR mapping pipeline, which updates a "live" FHIR
store that has other clients. If instead the use case is to re-import
everything from an earlier export of the FHIR store, you should just use
the import API directly; there's no need for Beam.
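To make the "PCollection of bundles" idea concrete, here is a minimal sketch of what each element could look like. Per the FHIR spec, a transaction is a `Bundle` resource with `type` set to `transaction` and one `entry` per resource, each carrying a `request` with an HTTP `method` and `url`. The class and method names below (`TransactionBundles`, `Entry`, `toTransactionBundle`) are hypothetical and not part of this PR, and the sketch assumes each resource is already valid JSON and each resource type is a plain FHIR type name.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper (not part of this PR): wraps resources in one FHIR transaction Bundle. */
final class TransactionBundles {
  private TransactionBundles() {}

  /** One resource to create: its FHIR type (e.g. "Patient") and its JSON body. */
  static final class Entry {
    final String resourceType;
    final String resourceJson;

    Entry(String resourceType, String resourceJson) {
      this.resourceType = resourceType;
      this.resourceJson = resourceJson;
    }
  }

  /**
   * Builds the JSON payload for a transaction Bundle that creates every entry via POST.
   * Assumes resourceJson is already well-formed JSON; no escaping or validation is done here.
   */
  static String toTransactionBundle(List<Entry> entries) {
    StringJoiner joined = new StringJoiner(",", "[", "]");
    for (Entry e : entries) {
      joined.add(
          "{\"resource\":" + e.resourceJson
              + ",\"request\":{\"method\":\"POST\",\"url\":\"" + e.resourceType + "\"}}");
    }
    return "{\"resourceType\":\"Bundle\",\"type\":\"transaction\",\"entry\":" + joined + "}";
  }
}
```

A write transform could then take a `PCollection<String>` of such payloads and call executeBundle once per element, so that each element succeeds or fails as a unit.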
# Reading and Export
## Feasibility
The export API starts a long-running operation (LRO) to export the full
contents of the FHIR store to GCS or BQ. It is doable to wait on this LRO in a
DoFn (it is sort of similar to
[`BigQueryIO.Read::fromQuery`](https://beam.apache.org/releases/javadoc/2.17.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Read.html#fromQuery-java.lang.String-),
which waits on a BQ query job).
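For reference, waiting on an LRO boils down to a polling loop with backoff. The sketch below is only an illustration under assumptions: `OperationPoller` and `awaitDone` are hypothetical names, and the `isDone` supplier stands in for whatever client call checks the operation's status (no real Healthcare API client is used here).

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of this PR): polls an operation until it reports done. */
final class OperationPoller {
  private static final long MAX_DELAY_MILLIS = 60_000L;

  private OperationPoller() {}

  /**
   * Calls isDone up to maxAttempts times, sleeping between attempts with a delay that
   * doubles each time (capped at MAX_DELAY_MILLIS). Returns true as soon as isDone
   * returns true, or false if attempts run out or the thread is interrupted.
   */
  static boolean awaitDone(BooleanSupplier isDone, int maxAttempts, Duration initialDelay) {
    long delayMillis = initialDelay.toMillis();
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (isDone.getAsBoolean()) {
        return true;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
          // Preserve the interrupt for the caller and stop waiting.
          Thread.currentThread().interrupt();
          return false;
        }
        delayMillis = Math.min(delayMillis * 2, MAX_DELAY_MILLIS);
      }
    }
    return false;
  }
}
```

Inside a DoFn this would block a worker thread for the lifetime of the export, which is part of why the loop may fit better in an external orchestrator.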
## Concerns
This seems to me like something that should be orchestrated outside of Beam:
once the export completes, start the Beam job on the output (using `TextIO` or
`BigQueryIO`).
What is the use case for a "read everything from FHIR"?
## Thoughts
My intuition says there would be more use cases for reading a subset of the
FHIR store in Beam pipelines, so the
[read](https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores.fhir/read)
and
[search](https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores.fhir/search)
methods would make more sense. This way the pipeline can process just a
certain resource (or the results of some search query).
I could also see a use case for a realtime "tail my FHIR store", which we
could build on the [FHIR resource pubsub
notifications](https://cloud.google.com/healthcare/docs/how-tos/pubsub#fhir_resources),
similar to how I implemented the realtime/unbounded HL7v2 store read.
Again, I'm not an expert on this healthcare problem space so please LMK if
I'm not understanding the FHIR spec we're programming against properly.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 403498)
Time Spent: 1h (was: 50m)
> Add Google Cloud Healthcare API IO Connectors
> ---------------------------------------------
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Jacob Ferriero
> Priority: Minor
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM
--
This message was sent by Atlassian Jira
(v8.3.4#803005)