TheNeuralBit commented on a change in pull request #16958:
URL: https://github.com/apache/beam/pull/16958#discussion_r818202098



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionRowTuple.java
##########
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.values;
+
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+import org.checkerframework.checker.nullness.qual.Nullable;
+
+/**
+ * A {@link PCollectionRowTuple} is an immutable tuple of {@link PCollection PCollection<Row>s},
+ * "keyed" by a string tag. A {@link PCollectionRowTuple} can be used as the input or output of a
+ * {@link PTransform} taking or producing multiple {@code PCollection<Row>} inputs or outputs.
+ *
+ * <p>A {@link PCollectionRowTuple} can be created and accessed like follows:
+ *
+ * <pre>{@code
+ * PCollection<Row> pc1 = ...;
+ * PCollection<Row> pc2 = ...;
+ *
+ * // Create tags for each of the PCollections to put in the PCollectionTuple:
+ * String tag1 = "pc1";
+ * String tag2 = "pc2";
+ * String tag3 = "pc3";
+ *
+ * // Create a PCollectionTuple with three PCollections:
+ * PCollectionTuple pcs = PCollectionTuple.of(tag1, pc1).and(tag2, pc2).and(tag3, pc3);
+ *
+ * // Create an empty PCollectionTuple:
+ * Pipeline p = ...;
+ * PCollectionTuple pcs2 = PCollectionTuple.empty(p);

Review comment:
       nit: looks like you need to `s/PCollectionTuple/PCollectionRowTuple/`
   
   It's too bad there's so much duplication from `PCollectionTuple`, but I can't think of a way to structure this that avoids it. Maybe @kennknowles has an idea (but I know he dislikes inheritance, so maybe he prefers it this way :P)
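   
   For reference, a hedged sketch of what the javadoc example could look like after the rename (illustrative only, assuming the `of`/`and`/`empty` factory methods mirror `PCollectionTuple`):
   
   ```java
   // Sketch only: the javadoc example with PCollectionTuple renamed to PCollectionRowTuple.
   PCollection<Row> pc1 = ...;
   PCollection<Row> pc2 = ...;
   
   // Tags ("keys") for each PCollection<Row> in the tuple:
   String tag1 = "pc1";
   String tag2 = "pc2";
   
   // Create a PCollectionRowTuple with two PCollections:
   PCollectionRowTuple pcs = PCollectionRowTuple.of(tag1, pc1).and(tag2, pc2);
   
   // Create an empty PCollectionRowTuple:
   Pipeline p = ...;
   PCollectionRowTuple pcs2 = PCollectionRowTuple.empty(p);
   ```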
   

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.java
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.transforms;
+
+import java.util.List;
+import java.util.Optional;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.annotations.Internal;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * Provider to create {@link SchemaTransform} instances.
+ *
+ * <p><b>Internal only:</b> This interface is actively being worked on and it will likely change as
+ * we provide implementations for more standard Beam transforms. We provide no backwards
+ * compatibility guarantees and it should not be implemented outside of the Beam repository.
+ */
+@Internal
+@Experimental(Kind.SCHEMAS)
+public interface SchemaTransformProvider {
+  /** Returns an id that uniquely represents this transform. */
+  String identifier();
+
+  /**
+   * Returns the expected schema of the configuration object. Note this is distinct from the schema
+   * of the transform itself.
+   *
+   * <p>Configurations should be forwards compatible. Fields added after this SchemaTransform is
+   * first released should always be nullable.
+   */
+  Schema configurationSchema();
+
+  /**
+   * Produce a SchemaTransform from transform-specific configuration object. Can throw a {@link
+   * InvalidConfigurationException} or a {@link InvalidSchemaException}.
+   */
+  SchemaTransform from(Row configuration);

Review comment:
       Do you think we could make this strongly typed, instead of using generic Rows? That was one shortcoming in the original SchemaIO interfaces imo. It would be nice to discourage the use of generic rows.
   
   ```suggestion
     SchemaTransform from(ConfigT configuration);
   ```
   
   This could accept a generic class `ConfigT`, and we could require that it be a type with an inferred schema.
   
   We'd have to re-think how `configurationSchema` works. Perhaps we'd instead require a method like `Class<ConfigT> getConfigClass()`, and we can look up that type in the Schema registry, when we're registering the SchemaTransformProvider implementations.
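   
   A rough sketch of how that strongly-typed shape could look (the interface name `TypedSchemaTransformProvider` and the `getConfigClass()` method are illustrative assumptions, not something in this PR):
   
   ```java
   // Hypothetical sketch: a strongly-typed variant of SchemaTransformProvider.
   // ConfigT is assumed to be a type the SchemaRegistry can infer a schema for
   // (e.g. an @AutoValue or bean-style class with a registered schema provider).
   @Internal
   @Experimental(Kind.SCHEMAS)
   public interface TypedSchemaTransformProvider<ConfigT> {
   
     /** Returns an id that uniquely represents this transform. */
     String identifier();
   
     /**
      * The configuration type. A registrar could look this class up in the
      * SchemaRegistry instead of asking the provider for a Schema directly.
      */
     Class<ConfigT> getConfigClass();
   
     /** Produce a SchemaTransform from a strongly-typed configuration object. */
     SchemaTransform from(ConfigT configuration);
   }
   ```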
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

