claudevdm commented on issue #35738: URL: https://github.com/apache/beam/issues/35738#issuecomment-3238089987
Discussed with @shunping and I think the issue has to do with implicit behavior in Beam <=2.64.0, where dill is the default pickling library and save_main_session=True.

I think the issue is:

1. JdbcDateType is registered in jdbc.py in the *submission* environment: https://github.com/apache/beam/blob/372b25b75718376040a2a4eadfbf719022979940/sdks/python/apache_beam/io/jdbc.py#L405
2. Registering a logical type adds it to the LogicalType._known_logical_types class-level variable in the *submission* environment.
3. In the *execution* environment (on the worker), when schemas.py is imported, LogicalType._known_logical_types is reset, and only the logical types from schemas.py are registered.
4. Nowhere do we actually register the JdbcDateType in the *execution* environment.

pickle_library=dill + save_main_session reloads the state of the __main__ session. So if jdbc.py was imported in the main session, JdbcDateType will be registered in the execution environment. This is kind of a hack, and I think registering logical types outside of schemas.py is not really supported?

To fix this I think we need to register these types in schemas.py, or somewhere else that is guaranteed to be executed in the execution environment.

@Abacn @shunping WDYT?
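
To make the sequence above concrete, here is a toy sketch of the mechanism as I understand it. It does not use Beam at all: ToyLogicalTypeRegistry, import_schemas, import_jdbc, and the "toy:..." URNs are hypothetical stand-ins for the per-process registry and for the import side effects of schemas.py and jdbc.py, so treat it as an illustration of the hypothesis, not as Beam's actual code.

```python
class ToyLogicalTypeRegistry:
    """Hypothetical stand-in for a per-process, class-level type registry."""

    def __init__(self):
        self.by_urn = {}

    def register(self, urn, logical_type):
        self.by_urn[urn] = logical_type

    def lookup(self, urn):
        # Returns None when the URN was never registered in this process.
        return self.by_urn.get(urn)


def import_schemas():
    """Models importing schemas.py: a fresh registry with built-ins only."""
    registry = ToyLogicalTypeRegistry()
    registry.register("toy:builtin:v1", "BuiltinType")
    return registry


def import_jdbc(registry):
    """Models importing jdbc.py: registers the extra type as a side effect."""
    registry.register("toy:jdbc_date:v1", "JdbcDateType")


# Submission environment: both schemas.py and jdbc.py are imported.
submission = import_schemas()
import_jdbc(submission)

# Execution environment with no restored main session: only schemas.py runs.
worker_plain = import_schemas()

# Execution environment where restoring __main__ pulls in jdbc.py again.
worker_restored = import_schemas()
import_jdbc(worker_restored)

print(submission.lookup("toy:jdbc_date:v1"))       # JdbcDateType
print(worker_plain.lookup("toy:jdbc_date:v1"))     # None
print(worker_restored.lookup("toy:jdbc_date:v1"))  # JdbcDateType
```

The middle case is the failure mode: the worker only sees whatever the schemas.py import registers, unless something else happens to re-run the jdbc.py registration on the worker.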
