zhoufek commented on code in PR #17477:
URL: https://github.com/apache/beam/pull/17477#discussion_r861812937
##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/StaticSchemaInference.java:
##########
@@ -88,21 +89,48 @@ enum MethodType {
*/
public static Schema schemaFromClass(
Class<?> clazz, FieldValueTypeSupplier fieldValueTypeSupplier) {
+ return schemaFromClass(clazz, fieldValueTypeSupplier, new HashMap<Class,
Schema>());
+ }
+
+ private static Schema schemaFromClass(
+ Class<?> clazz,
+ FieldValueTypeSupplier fieldValueTypeSupplier,
+ HashMap<Class, Schema> alreadyVisitedSchemas) {
+ if (alreadyVisitedSchemas.containsKey(clazz)) {
+ Schema existingSchema = alreadyVisitedSchemas.get(clazz);
+ if (existingSchema == null) {
+ throw new IllegalArgumentException(
+ "Cannot infer schema with a circular reference. Class: " +
clazz.getTypeName());
+ }
+ return existingSchema;
+ }
+ alreadyVisitedSchemas.put(clazz, null);
Schema.Builder builder = Schema.builder();
for (FieldValueTypeInformation type : fieldValueTypeSupplier.get(clazz)) {
- Schema.FieldType fieldType = fieldFromType(type.getType(),
fieldValueTypeSupplier);
+ Schema.FieldType fieldType =
+ fieldFromType(type.getType(), fieldValueTypeSupplier,
alreadyVisitedSchemas);
if (type.isNullable()) {
builder.addNullableField(type.getName(), fieldType);
} else {
builder.addField(type.getName(), fieldType);
}
}
- return builder.build();
+ alreadyVisitedSchemas.remove(clazz);
Review Comment:
Sorry, missed this. Calling `remove` is unnecessary. `put` and `replace`
will both replace an existing value with the new one.
##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/StaticSchemaInference.java:
##########
@@ -88,21 +89,48 @@ enum MethodType {
*/
public static Schema schemaFromClass(
Class<?> clazz, FieldValueTypeSupplier fieldValueTypeSupplier) {
+ return schemaFromClass(clazz, fieldValueTypeSupplier, new HashMap<Class,
Schema>());
+ }
+
+ private static Schema schemaFromClass(
+ Class<?> clazz,
+ FieldValueTypeSupplier fieldValueTypeSupplier,
+ HashMap<Class, Schema> alreadyVisitedSchemas) {
Review Comment:
Sorry, missed this previously: Inputs should accept the most generic type
possible to do what it needs to do. In this case `Map` is preferable, since the
input doesn't need to be a `HashMap`. Main concern, since this is an output
parameter, is if an `ImmutableMap` is passed, but that'll be caught in tests.
With that said, in line with my comment below about using `synchronized` to
make the method thread-safe, you could make the `Map` a static field and use
`synchronized` methods to keep it thread-safe. That way the computations are
preserved across multiple calls. However, I don't know the general preference
in Beam, so it might be good to get another opinion.
##########
sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoSchemaTranslator.java:
##########
@@ -156,6 +165,16 @@ static Schema getSchema(Class<? extends Message> clazz) {
}
static Schema getSchema(Descriptors.Descriptor descriptor) {
Review Comment:
Couple stylistic pointers on this:
1. The first line should start with a verb that fits into the sentence:
"This method X..." In this case, "This method returns..." This should just
cover the basic behavior of the class.
2. The comment on not being thread safe should be in its own paragraph.
There would be a blank link, and this new paragraph starts with `<p>`. IntelliJ
will probably add the `</p>`. Remove this if it does.
More importantly, the comment on it not being thread-safe should just be
something like, "This method is not thread safe." Javadoc comments should focus
on what behavior the method exhibits. Not being thread-safe is an example,
since it impacts how someone may use this method. However, the details of *why*
that behavior exists should not be included. As far as a user is concerned, the
reason it isn't thread-safe doesn't matter. The end behavior is the same.
> I get why it may not be multi-thread safe, but why are you saying it is ok
to leave it as is? I will say that this change doesn't seem to fail any
existing tests
Changing a public thread-safe method to not being thread-safe could
negatively impact users who've been using it from a multi-threaded context and
under the assumption that it is thread safe. Since this isn't public, there's
less risk.
With that said, I think all methods in Beam need to be thread-safe, but I
might be misremembering. That's easy enough to fix, though, by marking the
method as `synchronized`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]