bsannicolas commented on code in PR #29849:
URL: https://github.com/apache/beam/pull/29849#discussion_r1441054926


##########
sdks/python/apache_beam/typehints/schemas.py:
##########
@@ -529,7 +541,7 @@ def typing_from_runner_api(
         return Any
       else:
         return LogicalType.from_runner_api(
-            fieldtype_proto.logical_type).language_type()
+            fieldtype_proto.logical_type)._language_type()

Review Comment:
   The problem I ran into is that type information is recovered via the
LogicalTypeRegistry, which is just a map from language type to LogicalType.
This breaks down because multiple logical types may share the same language
type:
   
   ```python
   # TODO(yathu,BEAM-10722): Investigate and resolve conflicts in logical type
   # registration when more than one logical types sharing the same language type
   LogicalType.register_logical_type(DecimalLogicalType)
   ```
   
   Of course, this also comes up with logical types that take arguments, since
in most cases (if not all) they share the same language type even when their
arguments differ. A value of Enum(1, A) could belong to the enum {1: A, 2: B}
or to the enum {1: A, 2: C}.
   
   Other than making the language type a wrapper that also carries the argument
(e.g. Enum(1, A, {1: A, 2: B})), I'm not sure how else to recover it. A wrapper
that represents a single enum value, and holds no information about the other
possible values, gives you nothing to recover them from.
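
A hedged sketch of the self-describing wrapper idea, Enum(1, A, {1: A, 2: B}). `SelfDescribingEnumValue` and its methods are hypothetical names used for illustration only. It shows both the benefit (the full definition can be rebuilt from any single value) and the cost (every value carries the whole definition).

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class SelfDescribingEnumValue:
    """Hypothetical wrapper: the selected case plus the full enum definition."""
    ordinal: int
    name: str
    # Stored as a sorted tuple of (ordinal, name) pairs so the frozen
    # dataclass stays hashable.
    definition: Tuple[Tuple[int, str], ...]

    @classmethod
    def of(cls, ordinal: int, definition: Mapping[int, str]) -> "SelfDescribingEnumValue":
        if ordinal not in definition:
            raise ValueError(f"unknown ordinal {ordinal}")
        return cls(ordinal, definition[ordinal], tuple(sorted(definition.items())))

    def enum_definition(self) -> dict:
        # The logical type argument can be rebuilt from any single value.
        return dict(self.definition)


v = SelfDescribingEnumValue.of(1, {1: "A", 2: "B"})
print(v.enum_definition())  # {1: 'A', 2: 'B'}
```

Because the definition is part of each value, Enum(1, A) from {1: A, 2: B} and Enum(1, A) from {1: A, 2: C} no longer compare equal, at the price of repeating the definition in every element.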
   
   As far as I can tell, the Java SDK handles this with a richer schema model.
That model mirrors the protobuf model and carries logical type metadata.
Instead of marking a field with the language type (e.g. an integer or an enum
value wrapper), as the Python SDK does, the Java SDK marks the field with the
logical type implementation, even though users still manipulate the value as
the language type. The Python SDK uses concrete Python types (int, string, map,
MyClass) as its schema model, whereas the Java SDK uses a custom FieldType model
that defines the conversions from protobuf to Java classes and adds another
layer of abstraction for handling logical types. The Python SDK tries to recover
the protobuf logical type definition from the Python language type, but this is
lossy unless the types are defined in the clunky way I've done for the Enum
type, which essentially includes the type definition with every value.
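
A rough sketch of the alternative described above: the schema records the logical type (identifier plus argument) on the field, so values can stay plain. `LogicalTypeRef`, `FieldType`, `schema_field_for_enum`, and the URN `example:enum:v1` are all hypothetical. This is loosely inspired by the description of the Java model in this comment, not a transcription of the Java SDK's API.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class LogicalTypeRef:
    """Hypothetical: identifies a logical type by URN plus its argument."""
    urn: str
    argument: Any  # kept hashable (here a tuple) so the frozen dataclass hashes


@dataclass(frozen=True)
class FieldType:
    """Hypothetical schema-level field type."""
    atomic: Optional[str] = None              # e.g. "INT64", "STRING"
    logical: Optional[LogicalTypeRef] = None  # set when the field is a logical type


def schema_field_for_enum(cases: Tuple[str, ...]) -> FieldType:
    # The enum definition lives in the schema once, not in each value.
    return FieldType(logical=LogicalTypeRef("example:enum:v1", cases))


f_b = schema_field_for_enum(("A", "B"))
f_c = schema_field_for_enum(("A", "C"))
# Values of both fields could still be plain ints or strings at runtime,
# but the schema keeps the two logical types distinct.
print(f_b == f_c)  # False
```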
   
   Let me know if I'm missing something here. Is the logical type metadata 
stored somewhere else?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
