lukecwik commented on code in PR #24271:
URL: https://github.com/apache/beam/pull/24271#discussion_r1026956269


##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/JsonUtils.java:
##########
@@ -73,6 +78,77 @@ public String apply(Row input) {
     };
   }
 
+  public static Schema beamSchemaFromJsonSchema(String jsonSchemaStr) {
+    org.everit.json.schema.ObjectSchema jsonSchema = jsonSchemaFromString(jsonSchemaStr);
+    return beamSchemaFromJsonSchema(jsonSchema);
+  }
+
+  private static Schema beamSchemaFromJsonSchema(org.everit.json.schema.ObjectSchema jsonSchema) {
+    Schema.Builder beamSchemaBuilder = Schema.builder();
+    for (String propertyName : jsonSchema.getPropertySchemas().keySet()) {
+      org.everit.json.schema.Schema propertySchema =
+          jsonSchema.getPropertySchemas().get(propertyName);
+      if (propertySchema == null) {
+        throw new IllegalArgumentException("Unable to parse schema " + jsonSchema.toString());
+      }
+      if (propertySchema.getClass().equals(org.everit.json.schema.ObjectSchema.class)) {
+        beamSchemaBuilder =
+            beamSchemaBuilder.addField(

Review Comment:
   This seems redundant with the `} else {` clause



##########
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy:
##########
@@ -678,6 +680,8 @@ class BeamModulePlugin implements Plugin<Project> {
         joda_time                                   : "joda-time:joda-time:2.10.10",
         jsonassert                                  : "org.skyscreamer:jsonassert:1.5.0",
         jsr305                                      : "com.google.code.findbugs:jsr305:$jsr305_version",
+        json_org                                    : "org.json:json:${json_org_version}",

Review Comment:
   I believe this version could be managed by the GCP libraries-bom.



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/JsonUtils.java:
##########
@@ -73,6 +78,77 @@ public String apply(Row input) {
     };
   }
 
+  public static Schema beamSchemaFromJsonSchema(String jsonSchemaStr) {
+    org.everit.json.schema.ObjectSchema jsonSchema = jsonSchemaFromString(jsonSchemaStr);
+    return beamSchemaFromJsonSchema(jsonSchema);
+  }
+
+  private static Schema beamSchemaFromJsonSchema(org.everit.json.schema.ObjectSchema jsonSchema) {
+    Schema.Builder beamSchemaBuilder = Schema.builder();
+    for (String propertyName : jsonSchema.getPropertySchemas().keySet()) {
+      org.everit.json.schema.Schema propertySchema =
+          jsonSchema.getPropertySchemas().get(propertyName);
+      if (propertySchema == null) {
+        throw new IllegalArgumentException("Unable to parse schema " + jsonSchema.toString());
+      }
+      if (propertySchema.getClass().equals(org.everit.json.schema.ObjectSchema.class)) {

Review Comment:
   `if (propertySchema instanceof ObjectSchema)`?
   
   Is there a reason you want exact class equality here?
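   
   To illustrate the difference, here is a minimal sketch using hypothetical stand-in classes (`BaseSchema`, `ObjectLikeSchema`, `ExtendedObjectSchema` are made up for this example and are not the everit types). An exact class comparison rejects subclasses, while `instanceof` accepts them:
   
   ```java
   public class ClassCheckSketch {
     // Hypothetical stand-ins for illustration only; not the everit classes.
     static class BaseSchema {}
     static class ObjectLikeSchema extends BaseSchema {}
     static class ExtendedObjectSchema extends ObjectLikeSchema {}

     // Mirrors the style used in the PR: true only for the exact class.
     static boolean exactClassMatch(BaseSchema schema) {
       return schema.getClass().equals(ObjectLikeSchema.class);
     }

     // The suggested alternative: true for the class and any subclass.
     static boolean instanceMatch(BaseSchema schema) {
       return schema instanceof ObjectLikeSchema;
     }

     public static void main(String[] args) {
       BaseSchema plain = new ObjectLikeSchema();
       BaseSchema sub = new ExtendedObjectSchema();
       System.out.println(exactClassMatch(plain)); // true
       System.out.println(exactClassMatch(sub));   // false
       System.out.println(instanceMatch(plain));   // true
       System.out.println(instanceMatch(sub));     // true
     }
   }
   ```
   
   If subclasses of the schema types should never be treated as the parent type, the exact comparison may be intentional; otherwise `instanceof` reads more simply.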



##########
sdks/java/core/build.gradle:
##########
@@ -91,6 +91,8 @@ dependencies {
   shadow library.java.avro
   shadow library.java.snappy_java
   shadow library.java.joda_time
+  shadow library.java.json_org
+  shadow library.java.everit_json_schema

Review Comment:
   Should we make this feature require the user to provide the library?
   
   We need to analyze how large the dependency tree is and how stable it is.



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/JsonUtils.java:
##########
@@ -73,6 +78,77 @@ public String apply(Row input) {
     };
   }
 
+  public static Schema beamSchemaFromJsonSchema(String jsonSchemaStr) {
+    org.everit.json.schema.ObjectSchema jsonSchema = jsonSchemaFromString(jsonSchemaStr);
+    return beamSchemaFromJsonSchema(jsonSchema);
+  }
+
+  private static Schema beamSchemaFromJsonSchema(org.everit.json.schema.ObjectSchema jsonSchema) {
+    Schema.Builder beamSchemaBuilder = Schema.builder();
+    for (String propertyName : jsonSchema.getPropertySchemas().keySet()) {
+      org.everit.json.schema.Schema propertySchema =
+          jsonSchema.getPropertySchemas().get(propertyName);
+      if (propertySchema == null) {
+        throw new IllegalArgumentException("Unable to parse schema " + jsonSchema.toString());
+      }
+      if (propertySchema.getClass().equals(org.everit.json.schema.ObjectSchema.class)) {
+        beamSchemaBuilder =
+            beamSchemaBuilder.addField(
+                Schema.Field.of(propertyName, beamTypeFromJsonSchemaType(propertySchema)));
+      } else if (propertySchema.getClass().equals(org.everit.json.schema.ArraySchema.class)) {
+        beamSchemaBuilder =
+            beamSchemaBuilder.addField(
+                Schema.Field.of(
+                    propertyName,
+                    Schema.FieldType.array(
+                        beamTypeFromJsonSchemaType(
+                            ((ArraySchema) propertySchema).getAllItemSchema()))));

Review Comment:
   What about arrays that are tuples? See
   [getItemSchemas](https://www.javadoc.io/static/org.everit.json/org.everit.json.schema/1.1.1/org/everit/json/schema/ArraySchema.html#getItemSchemas--)
   and
   https://json-schema.org/understanding-json-schema/reference/array.html#tuple-validation
   
   Or arrays with additional items?
   https://json-schema.org/understanding-json-schema/reference/array.html#additional-items
   
   Or additional validation like min/max/unique?



##########
sdks/java/core/src/test/resources/schemas/json/nested_arrays_objects_json_schema.json:
##########
@@ -0,0 +1,33 @@
+{
+  "$id": "https://example.com/arrays.schema.json",
+  "description": "A representation of a person, company, organization, or place",

Review Comment:
   This description seems unrelated to what the schema actually contains.



##########
sdks/java/core/src/test/resources/schemas/json/nested_arrays_objects_json_schema.json:
##########
@@ -0,0 +1,33 @@
+{
+  "$id": "https://example.com/arrays.schema.json",
+  "description": "A representation of a person, company, organization, or place",
+  "type": "object",
+  "properties": {
+    "fruits": {
+      "type": "array",
+      "items": {
+        "type": "string"
+      }

Review Comment:
   Please expand on this to cover nestings like:
   object -> object -> array
   array -> array -> object
   
   What about refs with nested refs?



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/JsonUtils.java:
##########
@@ -73,6 +78,77 @@ public String apply(Row input) {
     };
   }
 
+  public static Schema beamSchemaFromJsonSchema(String jsonSchemaStr) {
+    org.everit.json.schema.ObjectSchema jsonSchema = jsonSchemaFromString(jsonSchemaStr);
+    return beamSchemaFromJsonSchema(jsonSchema);
+  }
+
+  private static Schema beamSchemaFromJsonSchema(org.everit.json.schema.ObjectSchema jsonSchema) {
+    Schema.Builder beamSchemaBuilder = Schema.builder();
+    for (String propertyName : jsonSchema.getPropertySchemas().keySet()) {
+      org.everit.json.schema.Schema propertySchema =
+          jsonSchema.getPropertySchemas().get(propertyName);
+      if (propertySchema == null) {
+        throw new IllegalArgumentException("Unable to parse schema " + jsonSchema.toString());
+      }
+      if (propertySchema.getClass().equals(org.everit.json.schema.ObjectSchema.class)) {
+        beamSchemaBuilder =
+            beamSchemaBuilder.addField(
+                Schema.Field.of(propertyName, beamTypeFromJsonSchemaType(propertySchema)));
+      } else if (propertySchema.getClass().equals(org.everit.json.schema.ArraySchema.class)) {
+        beamSchemaBuilder =
+            beamSchemaBuilder.addField(
+                Schema.Field.of(
+                    propertyName,
+                    Schema.FieldType.array(
+                        beamTypeFromJsonSchemaType(

Review Comment:
   What if this is an array of arrays?
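   
   As quoted in this PR, `beamTypeFromJsonSchemaType` has no `ArraySchema` branch, so an array whose items are themselves arrays would appear to fall through to the final `else` and throw. One possible shape for a fix is to handle arrays inside the recursive type mapping itself. The sketch below uses hypothetical stand-in types (`Node`, `StringNode`, `ArrayNode`) and returns a plain string instead of a Beam `FieldType`; it is only meant to show the recursion, not the real everit or Beam APIs:
   
   ```java
   import java.util.Objects;

   public class NestedArraySketch {
     // Hypothetical stand-ins for JSON schema node types; not the everit API.
     interface Node {}

     static final class StringNode implements Node {}

     static final class ArrayNode implements Node {
       final Node items;

       ArrayNode(Node items) {
         this.items = Objects.requireNonNull(items);
       }
     }

     // Maps a node to a textual type description, recursing through any
     // depth of array nesting instead of special-casing arrays in the caller.
     static String describe(Node node) {
       if (node instanceof ArrayNode) {
         return "ARRAY<" + describe(((ArrayNode) node).items) + ">";
       } else if (node instanceof StringNode) {
         return "STRING";
       }
       throw new IllegalArgumentException("Unsupported schema type: " + node.getClass());
     }

     public static void main(String[] args) {
       Node nested = new ArrayNode(new ArrayNode(new StringNode()));
       System.out.println(describe(nested)); // ARRAY<ARRAY<STRING>>
     }
   }
   ```
   
   With the array case living in the type-mapping function, the separate `ArraySchema` branch in the field loop would likely become unnecessary as well.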



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/JsonUtils.java:
##########
@@ -73,6 +78,77 @@ public String apply(Row input) {
     };
   }
 
+  public static Schema beamSchemaFromJsonSchema(String jsonSchemaStr) {
+    org.everit.json.schema.ObjectSchema jsonSchema = jsonSchemaFromString(jsonSchemaStr);
+    return beamSchemaFromJsonSchema(jsonSchema);
+  }
+
+  private static Schema beamSchemaFromJsonSchema(org.everit.json.schema.ObjectSchema jsonSchema) {
+    Schema.Builder beamSchemaBuilder = Schema.builder();
+    for (String propertyName : jsonSchema.getPropertySchemas().keySet()) {
+      org.everit.json.schema.Schema propertySchema =
+          jsonSchema.getPropertySchemas().get(propertyName);
+      if (propertySchema == null) {
+        throw new IllegalArgumentException("Unable to parse schema " + jsonSchema.toString());
+      }
+      if (propertySchema.getClass().equals(org.everit.json.schema.ObjectSchema.class)) {
+        beamSchemaBuilder =
+            beamSchemaBuilder.addField(
+                Schema.Field.of(propertyName, beamTypeFromJsonSchemaType(propertySchema)));
+      } else if (propertySchema.getClass().equals(org.everit.json.schema.ArraySchema.class)) {
+        beamSchemaBuilder =
+            beamSchemaBuilder.addField(
+                Schema.Field.of(
+                    propertyName,
+                    Schema.FieldType.array(
+                        beamTypeFromJsonSchemaType(
+                            ((ArraySchema) propertySchema).getAllItemSchema()))));
+      } else {
+        try {
+          beamSchemaBuilder =
+              beamSchemaBuilder.addField(
+                  Schema.Field.of(propertyName, beamTypeFromJsonSchemaType(propertySchema)));
+        } catch (IllegalArgumentException e) {
+          throw new IllegalArgumentException("Unsupported field type in field " + propertyName, e);
+        }
+      }
+    }
+    return beamSchemaBuilder.build();
+  }
+
+  private static Schema.FieldType beamTypeFromJsonSchemaType(
+      org.everit.json.schema.Schema propertySchema) {
+    if (propertySchema.getClass().equals(org.everit.json.schema.ObjectSchema.class)) {
+      return Schema.FieldType.row(beamSchemaFromJsonSchema((ObjectSchema) propertySchema));
+    } else if (propertySchema.getClass().equals(org.everit.json.schema.BooleanSchema.class)) {
+      return Schema.FieldType.BOOLEAN;
+    } else if (propertySchema.getClass().equals(org.everit.json.schema.NumberSchema.class)) {
+      return ((NumberSchema) propertySchema).requiresInteger()
+          ? Schema.FieldType.INT64
+          : Schema.FieldType.DOUBLE;
+    }
+    if (propertySchema.getClass().equals(org.everit.json.schema.StringSchema.class)) {
+      return Schema.FieldType.STRING;
+    } else if (propertySchema.getClass().equals(org.everit.json.schema.ReferenceSchema.class)) {
+      org.everit.json.schema.Schema sch = ((ReferenceSchema) propertySchema).getReferredSchema();
+      return beamTypeFromJsonSchemaType(sch);
+    } else {
+      throw new IllegalArgumentException(
+          "Unsupported schema type: " + propertySchema.getClass().toString());

Review Comment:
   ```suggestion
             "Unsupported schema type: " + propertySchema.getClass());
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.