rdblue commented on a change in pull request #4120:
URL: https://github.com/apache/iceberg/pull/4120#discussion_r806219718
##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java
##########
@@ -357,6 +361,26 @@ static Schema copyRecord(Schema record, List<Schema.Field> newFields, String new
return copy;
}
+ static Schema copyArray(Schema array, Schema elementSchema) {
+ Preconditions.checkArgument(array.getType() == org.apache.avro.Schema.Type.ARRAY,
+ "Cannot invoke copyArray on non array schema");
+ Schema copy = Schema.createArray(elementSchema);
+ for (Map.Entry<String, Object> prop : array.getObjectProps().entrySet()) {
+ copy.addProp(prop.getKey(), prop.getValue());
+ }
+ return copy;
+ }
+
+ static Schema copyMap(Schema map, Schema valueSchema) {
+ Preconditions.checkArgument(map.getType() == org.apache.avro.Schema.Type.MAP,
+ "Cannot invoke copyMap on non map schema");
Review comment:
I think this should include the problem schema for debugging.
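A minimal sketch of what including the schema in the message could look like. To stay self-contained, it uses a hypothetical `checkArgument` stand-in and a hypothetical `FakeSchema` placeholder instead of the real `Preconditions` and Avro `Schema` classes; the exact message wording is also an assumption, not the text the PR ended up with.

```java
public class CopyMapCheck {
  // Hypothetical stand-in for a Preconditions-style check; not the real library class.
  static void checkArgument(boolean condition, String template, Object... args) {
    if (!condition) {
      throw new IllegalArgumentException(String.format(template, args));
    }
  }

  // Hypothetical placeholder for a schema: a type name plus its printed form.
  static final class FakeSchema {
    final String type;
    final String json;

    FakeSchema(String type, String json) {
      this.type = type;
      this.json = json;
    }

    @Override
    public String toString() {
      return json;
    }
  }

  // Fails with a message that carries the offending schema, so the log shows what was passed in.
  static void requireMap(FakeSchema schema) {
    checkArgument("MAP".equals(schema.type),
        "Cannot invoke copyMap on non map schema: %s", schema);
  }

  public static void main(String[] args) {
    try {
      requireMap(new FakeSchema("ARRAY", "{\"type\":\"array\",\"items\":\"int\"}"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With the schema appended, a failure points directly at the input that triggered it instead of only naming the method.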
##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java
##########
@@ -90,6 +90,10 @@ static boolean hasIds(Schema schema) {
return AvroCustomOrderSchemaVisitor.visit(schema, new HasIds());
}
+ static boolean missingIds(Schema schema) {
Review comment:
This needs Javadoc.
##########
File path: core/src/main/java/org/apache/iceberg/avro/BuildAvroProjection.java
##########
@@ -49,6 +49,11 @@
this.current = expectedSchema.asStruct();
}
+ BuildAvroProjection(org.apache.iceberg.types.Type expectedType, Map<String, String> renames) {
Review comment:
Is this for testing? What guarantees that the expected type is correct?
##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java
##########
@@ -357,6 +361,26 @@ static Schema copyRecord(Schema record, List<Schema.Field> newFields, String new
return copy;
}
+ static Schema copyArray(Schema array, Schema elementSchema) {
Review comment:
I don't think that `copy` is quite the correct verb here because this is
not actually copying the array. It is replacing the element. What about
`replaceElement` and `replaceValue`?
##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java
##########
@@ -90,6 +90,23 @@ static boolean hasIds(Schema schema) {
return AvroCustomOrderSchemaVisitor.visit(schema, new HasIds());
}
+ /**
+ * @param schema an Avro Schema
+ * @return true/false based on whether any of the nodes in the provided schema is missing an
+ * ID property recognizable by Iceberg core API. To have an ID recognizable by Iceberg core API:
+ * <ul>
+ * <li>a field node under struct (record) schema should have {@link FIELD_ID_PROP} property
+ * <li>an element node under list (array) schema should have {@link ELEMENT_ID_PROP} property
+ * <li>a pair of key and value node under map schema should have {@link KEY_ID_PROP} and
+ * {@link VALUE_ID_PROP} respectively
+ * <li>a primitive node is not assigned any ID related properties
+ * </ul>
+ * @implNote see {@link MissingIds} for more details
Review comment:
This is an internal class, so it shouldn't be referenced in Javadoc.
Anything relevant should be noted here, since this is the public-facing
part.
##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java
##########
@@ -90,6 +90,23 @@ static boolean hasIds(Schema schema) {
return AvroCustomOrderSchemaVisitor.visit(schema, new HasIds());
}
+ /**
+ * @param schema an Avro Schema
Review comment:
Can you add a description like the other methods have?
##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java
##########
@@ -90,6 +90,23 @@ static boolean hasIds(Schema schema) {
return AvroCustomOrderSchemaVisitor.visit(schema, new HasIds());
}
+ /**
+ * @param schema an Avro Schema
+ * @return true/false based on whether any of the nodes in the provided schema is missing an
+ * ID property recognizable by Iceberg core API. To have an ID recognizable by Iceberg core API:
+ * <ul>
+ * <li>a field node under struct (record) schema should have {@link FIELD_ID_PROP} property
+ * <li>an element node under list (array) schema should have {@link ELEMENT_ID_PROP} property
+ * <li>a pair of key and value node under map schema should have {@link KEY_ID_PROP} and
+ * {@link VALUE_ID_PROP} respectively
+ * <li>a primitive node is not assigned any ID related properties
+ * </ul>
Review comment:
This long content is better suited for the main description, rather than
in the return. `@return` should be a short summary, like "@return true if the
schema has at least one field ID property, false otherwise"
The details about what "field ID property" could be should be above.
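As a rough illustration of that layout, here is a sketch with the description first, a `<p>` paragraph carrying the details, and a one-line `@return`. The class `JavadocLayoutSketch`, the list-of-maps representation of fields, and the wording are all hypothetical; only the Javadoc structure is the point.

```java
import java.util.List;
import java.util.Map;

public class JavadocLayoutSketch {
  /**
   * Check if any field in the given list is missing an ID property.
   * <p>
   * In this sketch a field counts as having an ID when its property map contains the
   * key "field-id". The real method would describe the record, array, and map cases here.
   *
   * @param fields one property map per field (hypothetical representation)
   * @return true if at least one field has no ID property, false otherwise
   */
  static boolean missingIds(List<Map<String, Object>> fields) {
    // A single field without the key is enough to report missing IDs.
    return fields.stream().anyMatch(props -> !props.containsKey("field-id"));
  }

  public static void main(String[] args) {
    Map<String, Object> withId = Map.of("field-id", 1);
    Map<String, Object> withoutId = Map.of();
    System.out.println(missingIds(List.of(withId, withoutId)));
  }
}
```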
##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java
##########
@@ -90,6 +90,25 @@ static boolean hasIds(Schema schema) {
return AvroCustomOrderSchemaVisitor.visit(schema, new HasIds());
}
+ /**
+ * Check if any of the nodes in a given avro schema is missing an ID recognizable by Iceberg core API
Review comment:
This is the Iceberg core API, so it doesn't make sense to refer to it as
something else. You can just say "Check if the schema is missing any ID
properties." And below, you can clarify which IDs are considered.
##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java
##########
@@ -90,6 +90,25 @@ static boolean hasIds(Schema schema) {
return AvroCustomOrderSchemaVisitor.visit(schema, new HasIds());
}
+ /**
+ * Check if any of the nodes in a given avro schema is missing an ID recognizable by Iceberg core API
+ *
Review comment:
To separate paragraphs in Javadoc, add `<p>` to this line. No need for
closing tags.
##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java
##########
@@ -90,6 +90,25 @@ static boolean hasIds(Schema schema) {
return AvroCustomOrderSchemaVisitor.visit(schema, new HasIds());
}
+ /**
+ * Check if any of the nodes in a given avro schema is missing an ID recognizable by Iceberg core API
+ *
Review comment:
To separate paragraphs in Javadoc, add `<p>` to this line. No need for a
closing paragraph tag.
##########
File path: core/src/main/java/org/apache/iceberg/avro/MissingIds.java
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import java.util.function.Supplier;
+import org.apache.avro.Schema;
+
+/**
+ * Returns true once the first node is found with ID property missing. Reverse of {@link HasIds}
+ * <p>
+ * Note: To use {@link AvroSchemaUtil#toIceberg(Schema)} on an avro schema, the avro schema need to be either
+ * have IDs on every node or not have IDs at all. Invoke {@link AvroSchemaUtil#hasIds(Schema)} only proves
+ * that the schema has at least one ID, and not sufficient condition for invoking
+ * {@link AvroSchemaUtil#toIceberg(Schema)} on the schema.
+ */
+public class MissingIds extends AvroCustomOrderSchemaVisitor<Boolean, Boolean> {
Review comment:
This can be package-private, right?
##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java
##########
@@ -90,6 +90,25 @@ static boolean hasIds(Schema schema) {
return AvroCustomOrderSchemaVisitor.visit(schema, new HasIds());
}
+ /**
+ * Check if any of the nodes in a given avro schema is missing an ID
+ * <p>
+ * To have an ID recognizable by Iceberg core API:
Review comment:
This still references the "Iceberg core API".
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]