shardulm94 commented on a change in pull request #2496:
URL: https://github.com/apache/iceberg/pull/2496#discussion_r616290705
##########
File path: site/docs/spec.md
##########
@@ -929,6 +933,13 @@ This serialization scheme is for storing single values as
individual binary valu
| **`list`** | Not supported
|
| **`map`** | Not supported
|
+## Appendix E: Default value Semantics
+
+Iceberg supports default-value semantics for fields of container
+types (i.e., struct, list and map). Specifically, a field of
+type NestedField can have a default value that will be returned
Review comment:
`NestedField` is a Java specific class, can we describe this generally
within the spec instead?
##########
File path: site/docs/spec.md
##########
@@ -124,6 +126,8 @@ A **`list`** is a collection of values with some element
type. The element field
A **`map`** is a collection of key-value pairs with a key type and a value
type. Both the key field and value field each have an integer id that is unique
in the table schema. Map keys are required and map values can be either
optional or required. Both map keys and map values may be any type, including
nested types.
+For semantics of default value support for fields of Nested Types, see
Appendix E.
Review comment:
I feel that spec behavior of default values is worth explained directly
in the spec rather than as part of an appendix.
##########
File path:
api/src/test/java/org/apache/iceberg/types/TestNestedFieldDefaultValues.java
##########
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.types;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+
+public class TestNestedFieldDefaultValues {
+
+ @Test
+ public void testConstructors() {
+ int id = 1;
+ String fieldName = "fieldName";
+ Type fieldType = Types.IntegerType.get();
+ String doc = "field doc";
+ Integer defaultValue = 100;
+
+ // optional constructors
+ Assert.assertFalse(Types.NestedField.optional(id, fieldName,
fieldType).hasDefaultValue());
Review comment:
Nit: Statically importing `optional` and `required` will hugely improve
readability.
##########
File path: api/src/main/java/org/apache/iceberg/types/Types.java
##########
@@ -415,42 +415,74 @@ public int hashCode() {
public static class NestedField implements Serializable {
public static NestedField optional(int id, String name, Type type) {
- return new NestedField(true, id, name, type, null);
+ return new NestedField(true, id, name, type, null, null);
}
public static NestedField optional(int id, String name, Type type, String
doc) {
- return new NestedField(true, id, name, type, doc);
+ return new NestedField(true, id, name, type, null, doc);
+ }
+
+ public static NestedField optional(int id, String name, Type type, Object
defaultValue, String doc) {
+ return new NestedField(true, id, name, type, defaultValue, doc);
+ }
+
+ public static NestedField optionalWithDefault(int id, String name, Type
type, Object defaultValue) {
+ return new NestedField(true, id, name, type, defaultValue, null);
}
public static NestedField required(int id, String name, Type type) {
- return new NestedField(false, id, name, type, null);
+ return new NestedField(false, id, name, type, null, null);
}
public static NestedField required(int id, String name, Type type, String
doc) {
- return new NestedField(false, id, name, type, doc);
+ return new NestedField(false, id, name, type, null, doc);
+ }
+
+ public static NestedField required(int id, String name, Type type, Object
defaultValue, String doc) {
+ validateDefaultValueForRequiredField(defaultValue);
+ return new NestedField(false, id, name, type, defaultValue, doc);
+ }
+
+ public static NestedField requiredWithDefault(int id, String name, Type
type, Object defaultValue) {
+ validateDefaultValueForRequiredField(defaultValue);
+ return new NestedField(false, id, name, type, defaultValue, null);
}
public static NestedField of(int id, boolean isOptional, String name, Type
type) {
- return new NestedField(isOptional, id, name, type, null);
+ return new NestedField(isOptional, id, name, type, null, null);
}
public static NestedField of(int id, boolean isOptional, String name, Type
type, String doc) {
- return new NestedField(isOptional, id, name, type, doc);
+ return new NestedField(isOptional, id, name, type, null, doc);
+ }
+
+ public static NestedField of(int id, boolean isOptional, String name, Type
type, Object defaultValue, String doc) {
+ return new NestedField(isOptional, id, name, type, defaultValue, doc);
+ }
+
+ private static void validateDefaultValueForRequiredField(Object
defaultValue) {
+ Preconditions.checkNotNull(defaultValue, "defaultValue cannot be null
for a required field");
}
private final boolean isOptional;
private final int id;
private final String name;
private final Type type;
+ private final Object defaultValue;
Review comment:
Should `defaultValue` field be considered in `equals` and `hashCode`
methods? Given that `doc` is considered, I think we should.
##########
File path:
api/src/test/java/org/apache/iceberg/types/TestNestedFieldDefaultValues.java
##########
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.types;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+
+public class TestNestedFieldDefaultValues {
+
+ @Test
+ public void testConstructors() {
+ int id = 1;
+ String fieldName = "fieldName";
+ Type fieldType = Types.IntegerType.get();
+ String doc = "field doc";
+ Integer defaultValue = 100;
+
+ // optional constructors
+ Assert.assertFalse(Types.NestedField.optional(id, fieldName,
fieldType).hasDefaultValue());
+ Assert.assertFalse(Types.NestedField.optional(id, fieldName, fieldType,
doc).hasDefaultValue());
+ Types.NestedField nestedFieldWithDefault = Types.NestedField.optional(id,
fieldName, fieldType, defaultValue, doc);
+ Assert.assertTrue(nestedFieldWithDefault.hasDefaultValue());
+ Assert.assertEquals(defaultValue,
nestedFieldWithDefault.getDefaultValue());
+ nestedFieldWithDefault = Types.NestedField.optionalWithDefault(id,
fieldName, fieldType, defaultValue);
+ Assert.assertTrue(nestedFieldWithDefault.hasDefaultValue());
+ Assert.assertEquals(defaultValue,
nestedFieldWithDefault.getDefaultValue());
+
+ // required constructors
+ Assert.assertFalse(Types.NestedField.required(id, fieldName,
fieldType).hasDefaultValue());
+ Assert.assertFalse(Types.NestedField.required(id, fieldName, fieldType,
doc).hasDefaultValue());
+ nestedFieldWithDefault = Types.NestedField.required(id, fieldName,
fieldType, defaultValue, doc);
+ Assert.assertTrue(nestedFieldWithDefault.hasDefaultValue());
+ Assert.assertEquals(defaultValue,
nestedFieldWithDefault.getDefaultValue());
+ nestedFieldWithDefault = Types.NestedField.requiredWithDefault(id,
fieldName, fieldType, defaultValue);
+ Assert.assertTrue(nestedFieldWithDefault.hasDefaultValue());
+ Assert.assertEquals(defaultValue,
nestedFieldWithDefault.getDefaultValue());
+
+ // of constructors
+ Assert.assertFalse(Types.NestedField.of(id, true, fieldName,
fieldType).hasDefaultValue());
+ Assert.assertFalse(Types.NestedField.of(id, true, fieldName, fieldType,
doc).hasDefaultValue());
+ nestedFieldWithDefault = Types.NestedField.of(id, true, fieldName,
fieldType, defaultValue, doc);
+ Assert.assertTrue(nestedFieldWithDefault.hasDefaultValue());
+ Assert.assertEquals(defaultValue,
nestedFieldWithDefault.getDefaultValue());
+
+ // illegal case (required with null defaultValue
+ try {
Review comment:
Can we pull these illegal cases into their separate test methods? We can
use JUnit's
[`ExpectedException`](https://junit.org/junit4/javadoc/4.12/org/junit/rules/ExpectedException.html)
to ensure that the exception class and the message are what we expect.
##########
File path: api/src/main/java/org/apache/iceberg/types/Types.java
##########
@@ -415,42 +415,74 @@ public int hashCode() {
public static class NestedField implements Serializable {
public static NestedField optional(int id, String name, Type type) {
- return new NestedField(true, id, name, type, null);
+ return new NestedField(true, id, name, type, null, null);
}
public static NestedField optional(int id, String name, Type type, String
doc) {
- return new NestedField(true, id, name, type, doc);
+ return new NestedField(true, id, name, type, null, doc);
+ }
+
+ public static NestedField optional(int id, String name, Type type, Object
defaultValue, String doc) {
+ return new NestedField(true, id, name, type, defaultValue, doc);
+ }
+
+ public static NestedField optionalWithDefault(int id, String name, Type
type, Object defaultValue) {
+ return new NestedField(true, id, name, type, defaultValue, null);
}
public static NestedField required(int id, String name, Type type) {
- return new NestedField(false, id, name, type, null);
+ return new NestedField(false, id, name, type, null, null);
}
public static NestedField required(int id, String name, Type type, String
doc) {
- return new NestedField(false, id, name, type, doc);
+ return new NestedField(false, id, name, type, null, doc);
+ }
+
+ public static NestedField required(int id, String name, Type type, Object
defaultValue, String doc) {
+ validateDefaultValueForRequiredField(defaultValue);
+ return new NestedField(false, id, name, type, defaultValue, doc);
+ }
+
+ public static NestedField requiredWithDefault(int id, String name, Type
type, Object defaultValue) {
+ validateDefaultValueForRequiredField(defaultValue);
+ return new NestedField(false, id, name, type, defaultValue, null);
}
public static NestedField of(int id, boolean isOptional, String name, Type
type) {
- return new NestedField(isOptional, id, name, type, null);
+ return new NestedField(isOptional, id, name, type, null, null);
}
public static NestedField of(int id, boolean isOptional, String name, Type
type, String doc) {
- return new NestedField(isOptional, id, name, type, doc);
+ return new NestedField(isOptional, id, name, type, null, doc);
+ }
+
+ public static NestedField of(int id, boolean isOptional, String name, Type
type, Object defaultValue, String doc) {
+ return new NestedField(isOptional, id, name, type, defaultValue, doc);
+ }
+
+ private static void validateDefaultValueForRequiredField(Object
defaultValue) {
+ Preconditions.checkNotNull(defaultValue, "defaultValue cannot be null
for a required field");
}
private final boolean isOptional;
private final int id;
private final String name;
private final Type type;
+ private final Object defaultValue;
private final String doc;
- private NestedField(boolean isOptional, int id, String name, Type type,
String doc) {
+ private NestedField(boolean isOptional, int id, String name, Type type,
Object defaultValue, String doc) {
Preconditions.checkNotNull(name, "Name cannot be null");
Preconditions.checkNotNull(type, "Type cannot be null");
+ if (defaultValue != null) {
+ Preconditions.checkArgument(defaultValue.getClass() ==
type.typeId().javaClass(),
Review comment:
I don't think equality checks on `type.typeId().javaClass()` would be
correct. E.g. for strings `type.typeId().javaClass()` return the interface
`CharSequence`. Fixed and binary return abstract class `ByteBuffer`. I think we
should use
[`Class#isInstanceOf(obj)`](https://docs.oracle.com/javase/8/docs/api/java/lang/Class.html#isInstance-java.lang.Object-).
##########
File path: api/src/main/java/org/apache/iceberg/types/Types.java
##########
@@ -415,42 +415,74 @@ public int hashCode() {
public static class NestedField implements Serializable {
public static NestedField optional(int id, String name, Type type) {
- return new NestedField(true, id, name, type, null);
+ return new NestedField(true, id, name, type, null, null);
}
public static NestedField optional(int id, String name, Type type, String
doc) {
- return new NestedField(true, id, name, type, doc);
+ return new NestedField(true, id, name, type, null, doc);
+ }
+
+ public static NestedField optional(int id, String name, Type type, Object
defaultValue, String doc) {
+ return new NestedField(true, id, name, type, defaultValue, doc);
+ }
+
+ public static NestedField optionalWithDefault(int id, String name, Type
type, Object defaultValue) {
+ return new NestedField(true, id, name, type, defaultValue, null);
}
public static NestedField required(int id, String name, Type type) {
- return new NestedField(false, id, name, type, null);
+ return new NestedField(false, id, name, type, null, null);
}
public static NestedField required(int id, String name, Type type, String
doc) {
- return new NestedField(false, id, name, type, doc);
+ return new NestedField(false, id, name, type, null, doc);
+ }
+
+ public static NestedField required(int id, String name, Type type, Object
defaultValue, String doc) {
+ validateDefaultValueForRequiredField(defaultValue);
+ return new NestedField(false, id, name, type, defaultValue, doc);
+ }
+
+ public static NestedField requiredWithDefault(int id, String name, Type
type, Object defaultValue) {
+ validateDefaultValueForRequiredField(defaultValue);
+ return new NestedField(false, id, name, type, defaultValue, null);
}
public static NestedField of(int id, boolean isOptional, String name, Type
type) {
- return new NestedField(isOptional, id, name, type, null);
+ return new NestedField(isOptional, id, name, type, null, null);
}
public static NestedField of(int id, boolean isOptional, String name, Type
type, String doc) {
- return new NestedField(isOptional, id, name, type, doc);
+ return new NestedField(isOptional, id, name, type, null, doc);
+ }
+
+ public static NestedField of(int id, boolean isOptional, String name, Type
type, Object defaultValue, String doc) {
+ return new NestedField(isOptional, id, name, type, defaultValue, doc);
+ }
+
+ private static void validateDefaultValueForRequiredField(Object
defaultValue) {
+ Preconditions.checkNotNull(defaultValue, "defaultValue cannot be null
for a required field");
}
private final boolean isOptional;
private final int id;
private final String name;
private final Type type;
+ private final Object defaultValue;
private final String doc;
- private NestedField(boolean isOptional, int id, String name, Type type,
String doc) {
+ private NestedField(boolean isOptional, int id, String name, Type type,
Object defaultValue, String doc) {
Preconditions.checkNotNull(name, "Name cannot be null");
Preconditions.checkNotNull(type, "Type cannot be null");
+ if (defaultValue != null) {
+ Preconditions.checkArgument(defaultValue.getClass() ==
type.typeId().javaClass(),
Review comment:
Do we support default values for fields with container types like
`struct`, `list`, `map`? Today `Type#javaClass` will return `Void` for such
types.
https://github.com/apache/iceberg/blob/6c6096d44cd1315c40e1e718c3186ba8927c3219/api/src/main/java/org/apache/iceberg/types/Type.java#L43-L45
We may need to think about the implications of changing those to a non-void
class. The class `Comparators` has a private method
[`internalClass`](https://github.com/apache/iceberg/blob/6c6096d44cd1315c40e1e718c3186ba8927c3219/api/src/main/java/org/apache/iceberg/types/Comparators.java#L88)
handled the java classes for container types. Perhaps we could also do
something similar.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]