SchemaTransformProvider | Java class naming convention

Damon Douglas via dev Tue, 15 Nov 2022 11:50:30 -0800

Hello Everyone,

Do we like the following Java class naming convention for
SchemaTransformProviders [1]?  The proposal is:


<IOName>(Read|Write)SchemaTransformProvider


*For those new to Beam, even if this is your first day, consider yourself
a welcome contributor to this conversation.  Below are
definitions/references and a suggested learning guide to help you understand
this email.*

Explanation

The <IOName> identifies the Beam I/O [2], and Read or Write identifies a
read or write PTransform, respectively.

For example, a SchemaTransformProvider [1] for BigQueryIO.Write[7] would be
named:

BigQueryWriteSchemaTransformProvider


And a SchemaTransformProvider for PubsubIO.Read[8] would be named:

PubsubReadSchemaTransformProvider
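
To make the proposed pattern concrete, here is a minimal Java sketch that
builds and checks class names against it. The class
SchemaTransformProviderNaming, its methods, and the regular expression are
hypothetical helpers written for this email; they are not part of Beam, and
the regex assumes that <IOName> is an UpperCamelCase identifier.

```java
import java.util.regex.Pattern;

public class SchemaTransformProviderNaming {

    // Assumption: <IOName> is UpperCamelCase, followed by Read or Write,
    // followed by the fixed suffix SchemaTransformProvider.
    private static final Pattern CONVENTION =
        Pattern.compile("^[A-Z][A-Za-z0-9]*(Read|Write)SchemaTransformProvider$");

    /** The two directions the proposal distinguishes. */
    public enum Direction {
        READ("Read"),
        WRITE("Write");

        private final String label;

        Direction(String label) {
            this.label = label;
        }
    }

    /** Builds a class name of the form <IOName>(Read|Write)SchemaTransformProvider. */
    public static String providerName(String ioName, Direction direction) {
        return ioName + direction.label + "SchemaTransformProvider";
    }

    /** Returns true if the given simple class name matches the proposed pattern. */
    public static boolean followsConvention(String className) {
        return CONVENTION.matcher(className).matches();
    }

    public static void main(String[] args) {
        System.out.println(providerName("BigQuery", Direction.WRITE));
        System.out.println(providerName("Pubsub", Direction.READ));
        // A name without Read or Write does not follow the proposal.
        System.out.println(followsConvention("BigQuerySchemaTransformProvider"));
    }
}
```

A check like followsConvention could run in a unit test over provider class
names, but that is only one possible way to apply the convention.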


Definitions/References

[1] *SchemaTransformProvider*: A way for us to instantiate Beam IO
transforms using a language-agnostic configuration.
SchemaTransformProvider builds a SchemaTransform[3] from a Beam Row[4] that
functions as the configuration of that SchemaTransform.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html

[2] *Beam I/O*: PTransform for reading from or writing to sources and sinks.
https://beam.apache.org/documentation/programming-guide/#pipeline-io

[3] *SchemaTransform*: An interface containing a buildTransform method that
returns a PCollectionRowTuple[6] to PCollectionRowTuple PTransform.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransform.html

[4] *Row*: A Beam Row is a generic element of data whose properties are
defined by a Schema[5].
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/Row.html

[5] *Schema*: A description of expected field names and their data types.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/Schema.html

[6] *PCollectionRowTuple*: A grouping of PCollections of Beam Rows[4], each
tagged by a String name, into a single PInput or POutput.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PCollectionRowTuple.html

[7] *BigQueryIO.Write*: A PTransform for writing Beam elements to a
BigQuery table.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html

[8] *PubsubIO.Read*: A PTransform for reading from Pub/Sub and emitting
message payloads into a PCollection.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html
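
The relationship between definitions [1], [3], [4] and [6] can be sketched
in plain Java as below. Every type here is a simplified stand-in declared
inside the sketch, not the real Beam API: the actual interfaces have more
methods and operate on PCollections rather than in-memory lists. The provider
InMemoryWriteSchemaTransformProvider and the configuration field "outputTag"
are made up for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class SchemaTransformSketch {

    /** Stand-in for a Beam Row [4]: field names mapped to values. */
    public static final class Row {
        public final Map<String, Object> fields;

        public Row(Map<String, Object> fields) {
            this.fields = fields;
        }
    }

    /** Stand-in for PCollectionRowTuple [6]: String tags mapped to lists of rows. */
    public static final class RowTuple {
        public final Map<String, List<Row>> tagged;

        public RowTuple(Map<String, List<Row>> tagged) {
            this.tagged = tagged;
        }
    }

    /** Stand-in for SchemaTransform [3]: yields a RowTuple-to-RowTuple transform. */
    public interface SchemaTransform {
        UnaryOperator<RowTuple> buildTransform();
    }

    /** Stand-in for SchemaTransformProvider [1]: builds a transform from a configuration Row. */
    public interface SchemaTransformProvider {
        SchemaTransform from(Row configuration);
    }

    /** Hypothetical provider, named per the proposed convention with IOName = InMemory. */
    public static final class InMemoryWriteSchemaTransformProvider
            implements SchemaTransformProvider {
        @Override
        public SchemaTransform from(Row configuration) {
            // "outputTag" is a made-up configuration field for this sketch.
            String outputTag = (String) configuration.fields.get("outputTag");
            // The transform re-tags the rows found under "input" with the configured tag.
            return () -> input -> new RowTuple(
                Collections.singletonMap(outputTag, input.tagged.get("input")));
        }
    }

    public static void main(String[] args) {
        Row config = new Row(Collections.<String, Object>singletonMap("outputTag", "written"));
        RowTuple input = new RowTuple(
            Collections.singletonMap("input", Collections.singletonList(config)));
        RowTuple output = new InMemoryWriteSchemaTransformProvider()
            .from(config)
            .buildTransform()
            .apply(input);
        System.out.println(output.tagged.keySet());
    }
}
```

The point of the sketch is only the shape of the call chain: a configuration
Row goes into the provider, a SchemaTransform comes out, and its
buildTransform method yields the tuple-to-tuple transform.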

Suggested Learning/Reading to understand this email

1. https://beam.apache.org/documentation/programming-guide/#overview
2. https://beam.apache.org/documentation/programming-guide/#transforms (Up
to 4.1)
3. https://beam.apache.org/documentation/programming-guide/#pipeline-io
4. https://beam.apache.org/documentation/programming-guide/#schemas
