jubebo opened a new pull request, #27317: URL: https://github.com/apache/beam/pull/27317
This is what I have done:

- Added a recursive call to `generate_user_type_from_bq_schema`.
- This change implies that the `field_names_and_types` object passed to `named_fields_to_schema` may already contain a named tuple object instead of plain Python types. Therefore, I:
  - adjusted the input to `named_fields_to_schema` so that it does not parse sequence elements that have already been processed;
  - added a new parameter to `named_tuple_from_schema` that overwrites certain positions in the returned named tuple with the already processed elements.

The current implementation results in errors upon execution, as discussed [here](https://github.com/apache/beam/issues/27166#issuecomment-1600533589). Under the same link, I have also formulated three overarching points about this implementation that I would like to discuss:

1. Is the PTransform implementation already aware of "nested" row entries? As far as I understand, the named tuple object returned by `generate_user_type_from_bq_schema` will be mapped onto the existing PCollection within the transform's `expand` method.
2. Is it desired that nested schemas get registered multiple times in this setup? See https://github.com/apache/beam/blob/5e942ae3790bc95148413c43ab7e43a01a2d82ae/sdks/python/apache_beam/typehints/schemas.py#L533-L542
3. Can we drop the calls to `named_fields_to_schema` and `named_tuple_from_schema` in `generate_user_type_from_bq_schema` altogether and create the (nested) named tuple object manually? If so, we would of course lose the type-checking functionality built into these two methods.

Looking forward to the discussion :)

------------------------

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- [x] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue.
  If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make the review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).

To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
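For discussion, the recursive idea in this PR, and the manual construction raised in the third discussion point, can be sketched in standalone Python. This is a minimal sketch, not Beam's implementation: it assumes a simplified dict form of a BigQuery schema (`name`, `type`, `mode`, `fields` keys), and the helper `bq_fields_to_named_tuple` and its type map are hypothetical. It deliberately skips Beam's schema registration and the type checking performed by `named_fields_to_schema` / `named_tuple_from_schema`.

```python
from typing import Any, Dict, List, NamedTuple, Optional, Sequence, get_args

# Hypothetical mapping from BigQuery scalar type names to Python types
# (an assumption for this sketch, not Beam's own table).
_BQ_TO_PYTHON = {
    "STRING": str,
    "INTEGER": int,
    "INT64": int,
    "FLOAT": float,
    "FLOAT64": float,
    "BOOLEAN": bool,
    "BOOL": bool,
    "BYTES": bytes,
}


def bq_fields_to_named_tuple(name: str, fields: List[Dict[str, Any]]) -> type:
    """Hypothetical helper: builds a (possibly nested) NamedTuple type.

    `fields` is a list of BigQuery-style field dicts. Field names must be
    valid Python identifiers that do not start with an underscore, because
    NamedTuple (via collections.namedtuple) rejects anything else.
    """
    annotations = []
    for field in fields:
        bq_type = field["type"].upper()
        if bq_type in ("RECORD", "STRUCT"):
            # Recursive step: a nested RECORD becomes its own NamedTuple type.
            python_type = bq_fields_to_named_tuple(
                f"{name}_{field['name']}", field["fields"])
        else:
            python_type = _BQ_TO_PYTHON[bq_type]
        # BigQuery's default mode is NULLABLE.
        mode = field.get("mode", "NULLABLE").upper()
        if mode == "REPEATED":
            python_type = Sequence[python_type]
        elif mode == "NULLABLE":
            python_type = Optional[python_type]
        annotations.append((field["name"], python_type))
    # Functional NamedTuple syntax: NamedTuple(typename, [(field, type), ...]).
    return NamedTuple(name, annotations)


# Usage with a small, made-up schema containing one nested RECORD.
schema = [
    {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "tags", "type": "STRING", "mode": "REPEATED"},
    {"name": "address", "type": "RECORD", "mode": "NULLABLE", "fields": [
        {"name": "city", "type": "STRING", "mode": "NULLABLE"},
    ]},
]

Row = bq_fields_to_named_tuple("Row", schema)
# The nested type sits inside Optional[...]; get_args unwraps it.
Address = get_args(Row.__annotations__["address"])[0]
row = Row(id=1, tags=["a", "b"], address=Address(city="Berlin"))
print(row.address.city)  # Berlin
```

Each nested RECORD becomes its own NamedTuple type, so a nested row is simply a field whose annotation is another NamedTuple, wrapped in `Optional` or `Sequence` depending on the mode. Whether Beam would then need to register every nested type as a separate schema is exactly the second question above.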
