[ 
https://issues.apache.org/jira/browse/BEAM-3437?focusedWorklogId=87168&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87168
 ]

ASF GitHub Bot logged work on BEAM-3437:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Apr/18 17:15
            Start Date: 03/Apr/18 17:15
    Worklog Time Spent: 10m 
      Work Description: reuvenlax commented on a change in pull request #4964: 
[BEAM-3437] Introduce Schema class, and use it in BeamSQL
URL: https://github.com/apache/beam/pull/4964#discussion_r178898184
 
 

 ##########
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
 ##########
 @@ -0,0 +1,378 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.collect.BiMap;
+import com.google.common.collect.HashBiMap;
+import com.google.common.collect.ImmutableList;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collector;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.RowCoder;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * {@link Schema} describes the fields in {@link Row}.
+ *
+ */
+@Experimental
+@AutoValue
+public abstract class Schema implements Serializable {
+  // A mapping between field names and indices.
+  private BiMap<String, Integer> fieldIndices = HashBiMap.create();
+  public abstract List<Field> getFields();
+
+  @AutoValue.Builder
+  abstract static class Builder {
+    abstract Builder setFields(List<Field> fields);
+    abstract Schema build();
+  }
+
+  public static Schema of(List<Field> fields) {
+    return Schema.fromFields(fields);
+  }
+
+  public static Schema of(Field ... fields) {
+    return Schema.of(Arrays.asList(fields));
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (!(o instanceof Schema)) {
+      return false;
+    }
+    Schema other = (Schema) o;
+    return Objects.equals(fieldIndices, other.fieldIndices)
+        && Objects.equals(getFields(), other.getFields());
+  }
+
+  @Override
+  public int hashCode() {
+    return Objects.hash(fieldIndices, getFields());
+  }
+
+  /**
+   * An enumerated list of supported types.
+   */
+  public enum TypeName {
+    BYTE,    // One-byte signed integer.
+    INT16,   // two-byte signed integer.
+    INT32,   // four-byte signed integer.
+    INT64,   // eight-byte signed integer.
+    DECIMAL,  // Decimal integer
+    FLOAT,
+    DOUBLE,
+    STRING,  // String.
+    DATETIME, // Date and time.
+    BOOLEAN,  // Boolean.
+    ARRAY,
+    ROW;    // The field is itself a nested row.
+
+    public static final List<TypeName> NUMERIC_TYPES = ImmutableList.of(
+        BYTE, INT16, INT32, INT64, DECIMAL, FLOAT, DOUBLE);
+    public static final List<TypeName> STRING_TYPES = ImmutableList.of(STRING);
+    public static final List<TypeName> DATE_TYPES = ImmutableList.of(DATETIME);
+    public static final List<TypeName> CONTAINER_TYPES = ImmutableList.of(ARRAY);
+    public static final List<TypeName> COMPOSITE_TYPES = ImmutableList.of(ROW);
+
+    public boolean isNumericType() {
+      return NUMERIC_TYPES.contains(this);
+    }
+    public boolean isStringType() {
+      return STRING_TYPES.contains(this);
+    }
+    public boolean isDateType() {
+      return DATE_TYPES.contains(this);
+    }
+    public boolean isContainerType() {
+      return CONTAINER_TYPES.contains(this);
+    }
+    public boolean isCompositeType() {
+      return COMPOSITE_TYPES.contains(this);
+    }
+
+    /** Returns a {@link FieldTypeDescriptor} representing this primitive type. */
+    public FieldTypeDescriptor typeDescriptor() {
+      return FieldTypeDescriptor.of(this);
+    }
+  }
+
+  /**
+   * A descriptor of a single field type. This is a recursive descriptor, as nested types are
+   * allowed.
+   */
+  @AutoValue
+  public abstract static class FieldTypeDescriptor implements Serializable {
+    // Returns the type of this field.
+    public abstract TypeName getType();
+    // For container types (e.g. ARRAY), returns the type of the contained element.
+    @Nullable public abstract FieldTypeDescriptor getComponentType();
+    // For ROW types, returns the schema for the row.
+    @Nullable public abstract Schema getRowSchema();
+    /**
+     * Returns optional extra metadata.
+     */
+    @Nullable public abstract byte[] getMetadata();
+    abstract FieldTypeDescriptor.Builder toBuilder();
+    @AutoValue.Builder
+    abstract static class Builder {
+      abstract Builder setType(TypeName typeName);
+      abstract Builder setComponentType(@Nullable FieldTypeDescriptor componentType);
+      abstract Builder setRowSchema(@Nullable Schema rowSchema);
+      abstract Builder setMetadata(@Nullable byte[] metadata);
+      abstract FieldTypeDescriptor build();
+    }
+
+    /**
+     * Create a {@link FieldTypeDescriptor} for the given type.
+     */
+    public static FieldTypeDescriptor of(TypeName typeName) {
+      return new AutoValue_Schema_FieldTypeDescriptor.Builder().setType(typeName).build();
+    }
+
+    /**
+     * For container types, adds the type of the component element.
+     */
+    public FieldTypeDescriptor withComponentType(@Nullable FieldTypeDescriptor componentType) {
+      if (componentType != null) {
+        checkArgument(getType().isContainerType());
+      }
+      return toBuilder().setComponentType(componentType).build();
+    }
+
+    /**
+     * For ROW types, sets the schema of the row.
+     */
+    public FieldTypeDescriptor withRowSchema(@Nullable Schema rowSchema) {
+      if (rowSchema != null) {
+        checkArgument(getType().isCompositeType());
+      }
+      return toBuilder().setRowSchema(rowSchema).build();
+    }
+
+    /**
+     * Returns a copy of the descriptor with metadata set.
+     */
+    public FieldTypeDescriptor withMetadata(@Nullable byte[] metadata) {
+      return toBuilder().setMetadata(metadata).build();
+    }
+
+    @Override
+    public boolean equals(Object o) {
+      if (!(o instanceof FieldTypeDescriptor)) {
+        return false;
+      }
+      FieldTypeDescriptor other = (FieldTypeDescriptor) o;
+      return Objects.equals(getType(), other.getType())
+          && Objects.equals(getComponentType(), other.getComponentType())
+          && Objects.equals(getRowSchema(), other.getRowSchema())
+          && Arrays.equals(getMetadata(), other.getMetadata());
+
+    }
+
+    @Override
+    public int hashCode() {
+      return Arrays.deepHashCode(
+          new Object[] {getType(), getComponentType(), getRowSchema(), getMetadata()});
+    }
+  }
+
+
+  /**
+   * Field of a row. Contains the {@link FieldTypeDescriptor} along with associated metadata.
+   *
+   */
+  @AutoValue
+  public abstract static class Field implements Serializable {
+    /**
+     * Returns the field name.
+     */
+    public abstract String getName();
+
+    /**
+     * Returns the field's description.
+     */
+    public abstract String getDescription();
+
+    /**
+     * Returns the field's {@link FieldTypeDescriptor}.
+     */
+    public abstract FieldTypeDescriptor getTypeDescriptor();
+
+    /**
+     * Returns whether the field supports null values.
+     */
+    public abstract Boolean getNullable();
 
 Review comment:
   Yeah, because I plan on adding BigQuery bindings right after this, and that 
requires Nullable. I tried isNullable initially BTW, and for some reason could 
not get AutoValue to recognize it.
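
Editorial note: the comment does not say why AutoValue rejected isNullable. One plausible explanation, offered as an assumption and not confirmed anywhere in this thread, is that AutoValue follows the JavaBeans convention, under which the "is" prefix counts as a getter prefix only for methods returning primitive boolean, while "get" is accepted for any return type. The property in the diff returns the boxed Boolean, so isNullable would not qualify under that rule. The sketch below re-implements the naming rule with plain reflection to illustrate the idea; it does not call AutoValue, and the class GetterPrefixSketch, its nested interfaces, and its helper methods are all hypothetical names invented for this example.

```java
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical illustration of a JavaBeans-style getter-prefix rule.
// This is NOT AutoValue code; it only mimics the convention described above.
class GetterPrefixSketch {
  // Three hypothetical ways to declare the "nullable" property.
  interface BoxedIs { Boolean isNullable(); }       // boxed Boolean with "is"
  interface PrimitiveIs { boolean isNullable(); }   // primitive boolean with "is"
  interface BoxedGet { Boolean getNullable(); }     // boxed Boolean with "get"

  // Returns the property name if the method name matches the convention:
  // "get" + Name for any return type, "is" + Name only for primitive boolean.
  static Optional<String> propertyName(Method m) {
    String name = m.getName();
    if (name.startsWith("get") && name.length() > 3) {
      return Optional.of(decapitalize(name.substring(3)));
    }
    if (name.startsWith("is") && name.length() > 2
        && m.getReturnType() == boolean.class) {
      return Optional.of(decapitalize(name.substring(2)));
    }
    return Optional.empty();
  }

  // Convenience wrapper that looks up a no-argument method by name.
  static Optional<String> propertyNameOf(Class<?> type, String methodName) {
    try {
      return propertyName(type.getDeclaredMethod(methodName));
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("No such method: " + methodName, e);
    }
  }

  static String decapitalize(String s) {
    return Character.toLowerCase(s.charAt(0)) + s.substring(1);
  }

  public static void main(String[] args) {
    // Boxed Boolean with "is": not treated as a prefixed getter.
    System.out.println(propertyNameOf(BoxedIs.class, "isNullable"));
    // Primitive boolean with "is": recognized as property "nullable".
    System.out.println(propertyNameOf(PrimitiveIs.class, "isNullable"));
    // Any type with "get": recognized as property "nullable".
    System.out.println(propertyNameOf(BoxedGet.class, "getNullable"));
  }
}
```

If that assumption holds, either switching the return type to primitive boolean or keeping the boxed type with the get prefix (as the diff does with getNullable) would satisfy the naming rule.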

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 87168)
    Time Spent: 5h 50m  (was: 5h 40m)

> Support schema in PCollections
> ------------------------------
>
>                 Key: BEAM-3437
>                 URL: https://issues.apache.org/jira/browse/BEAM-3437
>             Project: Beam
>          Issue Type: Wish
>          Components: beam-model
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
>          Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> As discussed with some people in the team, it would be great to add schema 
> support in {{PCollections}}. It will allow us:
> 1. To expect some data type in {{PTransforms}}
> 2. To improve some runners with additional features (I'm thinking about the Spark 
> runner with data frames, for instance).
> A technical draft document has been created: 
> https://docs.google.com/document/d/1tnG2DPHZYbsomvihIpXruUmQ12pHGK0QIvXS1FOTgRc/edit?disco=AAAABhykQIs&ts=5a203b46&usp=comment_email_document
> I also started a PoC on a branch; I will update this Jira with a "discussion" 
> PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
