Re: SchemaTransformProvider | Java class naming convention

Reuven Lax via dev Tue, 15 Nov 2022 13:38:44 -0800

Out of curiosity, several IOs (including PubSub) already do support
schemas. Are you planning on modifying those?


On Tue, Nov 15, 2022 at 11:50 AM Damon Douglas via dev <[email protected]>
wrote:

> Hello Everyone,
>
> Do we like the following Java class naming convention for
> SchemaTransformProviders [1]?  The proposal is:
>
> <IOName>(Read|Write)SchemaTransformProvider
>
>
> *For those new to Beam, even if this is your first day, consider
> yourself a welcome contributor to this conversation.  Below are
> definitions/references and a suggested learning guide to understand this
> email.*
>
> Explanation
>
> The <IOName> identifies the Beam I/O [2] and Read or Write identifies a
> read or write PTransform, respectively.
>
> For example, a SchemaTransformProvider [1] for BigQueryIO.Write[7] would
> look like:
>
> BigQueryWriteSchemaTransformProvider
>
>
> And a SchemaTransformProvider for PubsubIO.Read[8] would look like:
>
> PubsubReadSchemaTransformProvider
>
>
> Definitions/References
>
> [1] *SchemaTransformProvider*: A way for us to instantiate Beam IO
> transforms using a language agnostic configuration.
> SchemaTransformProvider builds a SchemaTransform[3] from a Beam Row[4] that
> functions as the configuration of that SchemaTransform.
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html
>
> [2] *Beam I/O*: PTransform for reading from or writing to sources and
> sinks.
> https://beam.apache.org/documentation/programming-guide/#pipeline-io
>
> [3] *SchemaTransform*: An interface containing a buildTransform method
> that returns a PCollectionRowTuple[6] to PCollectionRowTuple PTransform.
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransform.html
>
> [4] *Row*: A Beam Row is a generic element of data whose properties are
> defined by a Schema[5].
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/Row.html
>
> [5] *Schema*: A description of expected field names and their data types.
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/Schema.html
>
> [6] *PCollectionRowTuple*: A grouping of PCollections of Beam Rows[4] into
> a single PInput or POutput, with each PCollection tagged by a String name.
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PCollectionRowTuple.html
>
> [7] *BigQueryIO.Write*: A PTransform for writing Beam elements to a
> BigQuery table.
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html
>
> [8] *PubsubIO.Read*: A PTransform for reading from Pub/Sub and emitting
> message payloads into a PCollection.
>
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html
>
> Suggested Learning/Reading to understand this email
>
> 1. https://beam.apache.org/documentation/programming-guide/#overview
> 2. https://beam.apache.org/documentation/programming-guide/#transforms
> (Up to 4.1)
> 3. https://beam.apache.org/documentation/programming-guide/#pipeline-io
> 4. https://beam.apache.org/documentation/programming-guide/#schemas
>
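
For readers skimming the thread, the naming convention proposed in the quoted message, <IOName>(Read|Write)SchemaTransformProvider, can be sketched as a small stand-alone Java helper. This is only an illustration: SchemaTransformProviderNames, Direction, providerClassName and followsConvention are hypothetical names that do not exist in Beam, and the requirement that <IOName> be UpperCamelCase is an assumption the email does not state.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, NOT part of Apache Beam: illustrates the proposed naming convention. */
final class SchemaTransformProviderNames {

  /** Whether the provider wraps a read or a write PTransform. */
  enum Direction {
    READ("Read"),
    WRITE("Write");

    private final String label;

    Direction(String label) {
      this.label = label;
    }
  }

  private static final String SUFFIX = "SchemaTransformProvider";

  // <IOName>(Read|Write)SchemaTransformProvider.
  // Assumption: IOName is a non-empty UpperCamelCase identifier.
  private static final Pattern CONVENTION =
      Pattern.compile("[A-Z][A-Za-z0-9]*(Read|Write)" + SUFFIX);

  private SchemaTransformProviderNames() {}

  /** Builds the class name the convention would give for an I/O and a direction. */
  static String providerClassName(String ioName, Direction direction) {
    if (ioName == null || ioName.isEmpty()) {
      throw new IllegalArgumentException("ioName must be non-empty");
    }
    return ioName + direction.label + SUFFIX;
  }

  /** Returns true if the given simple class name matches the proposed pattern. */
  static boolean followsConvention(String className) {
    return CONVENTION.matcher(className).matches();
  }

  public static void main(String[] args) {
    // prints "BigQueryWriteSchemaTransformProvider"
    System.out.println(providerClassName("BigQuery", Direction.WRITE));
    // prints "PubsubReadSchemaTransformProvider"
    System.out.println(providerClassName("Pubsub", Direction.READ));
    // prints "true"
    System.out.println(followsConvention("PubsubReadSchemaTransformProvider"));
    // prints "false": the Read/Write segment is missing
    System.out.println(followsConvention("PubsubSchemaTransformProvider"));
  }
}
```

A check like followsConvention could in principle run in a unit test over provider class names, but whether the project wants to enforce the convention mechanically is not something this thread settles.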
