claudevdm commented on issue #35738:
URL: https://github.com/apache/beam/issues/35738#issuecomment-3238089987

   Discussed with @shunping. I think the issue comes from implicit behavior in
Beam <= 2.64.0, where dill is the default pickling library, when the pipeline
also sets save_main_session=True.
   
   I think the issue is the following:
   
   1. JdbcDateType is registered in jdbc.py in the *submission* environment 
https://github.com/apache/beam/blob/372b25b75718376040a2a4eadfbf719022979940/sdks/python/apache_beam/io/jdbc.py#L405
   2. Registering a logical type adds it to the
LogicalType._known_logical_types class-level variable in the *submission*
environment
   3. In the *execution* environment (on the worker), when schemas.py is
imported, LogicalType._known_logical_types starts out empty again, and only the
logical types registered in schemas.py itself are added to it
   4. Nowhere do we actually register the JdbcDateType in the *execution* 
environment
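
To make steps 1-4 concrete, here is a minimal, self-contained simulation. It does not use Beam. `LogicalType`, `register_logical_type`, `known_types`, and the `example:*` URNs are hypothetical stand-ins, not a copy of the real code in apache_beam/typehints/schemas.py or jdbc.py. Each "environment" is modeled as a brand-new Python interpreter launched with subprocess, so class-level state cannot leak between them.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for schemas.py: LogicalType keeps a class-level registry,
# and the module registers its own built-in types when it is executed.
SCHEMAS_SRC = """
class LogicalType:
    _known_logical_types = {}

    @classmethod
    def register_logical_type(cls, urn):
        cls._known_logical_types[urn] = True

LogicalType.register_logical_type("example:builtin_type")
"""

# Hypothetical stand-in for jdbc.py: registers one more type as an import side effect.
JDBC_SRC = 'LogicalType.register_logical_type("example:jdbc_date")'


def known_types(*module_sources):
    """Execute the given sources in a brand-new interpreter and return the registered URNs."""
    code = "\n".join(module_sources) + (
        "\nimport json\nprint(json.dumps(sorted(LogicalType._known_logical_types)))"
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


# Submission environment: both "schemas.py" and "jdbc.py" have been executed.
print("submission:", known_types(SCHEMAS_SRC, JDBC_SRC))
# Execution environment: a fresh process executes only "schemas.py".
print("worker:", known_types(SCHEMAS_SRC))
```

The submission process lists both URNs. The worker process lists only `example:builtin_type`, which is the gap described in step 4.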
   
   With pickle_library=dill and save_main_session=True, the worker reloads the
pickled state of the __main__ session. If jdbc.py was imported in the main
session, reloading that state presumably imports it again on the worker, which
registers JdbcDateType in the execution environment as a side effect. This is
kind of a hack, and I think registering logical types outside of schemas.py is
not really supported?
   
   To fix this, I think we need to register these types in schemas.py, or
somewhere else that is guaranteed to be executed in the execution environment.
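
As a rough sketch of that fix, this example uses hypothetical names and models the worker as a fresh interpreter. It is not Beam's real code. The registration moves into the module that owns the registry, so any process that imports that module sees the JDBC type, whether or not save_main_session or dill is involved.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for schemas.py after the proposed fix: the JDBC date type
# is registered by the same module that owns the registry, so any process that
# executes this module (including a worker) sees it.
FIXED_SCHEMAS_SRC = """
class LogicalType:
    _known_logical_types = {}

    @classmethod
    def register_logical_type(cls, urn):
        cls._known_logical_types[urn] = True

LogicalType.register_logical_type("example:builtin_type")
LogicalType.register_logical_type("example:jdbc_date")  # moved here from "jdbc.py"
"""


def known_types_in_fresh_process(module_source):
    """Run module_source in a new interpreter (like a worker) and return the registered URNs."""
    code = module_source + (
        "\nimport json\nprint(json.dumps(sorted(LogicalType._known_logical_types)))"
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


worker_types = known_types_in_fresh_process(FIXED_SCHEMAS_SRC)
print("worker:", worker_types)
```

The trade-off is that the registry module has to know about (or import) every logical type it needs to decode. The registration then no longer depends on what happened to be imported in `__main__` at submission time.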
   
   @Abacn @shunping  WDYT?
   
   

