ahmedabu98 opened a new issue, #25749:
URL: https://github.com/apache/beam/issues/25749
### What needs to happen?
TL;DR: There are two main issues here. First, there is no way to dynamically
create beam.Row() elements for use in a cross-language context. Second,
beam.Row() has no type inference and its fields default to `Any`, which Java
does not understand.
`StorageWriteToBigQuery` is a wrapper for a `SchemaAwareExternalTransform`
that takes beam.Row() elements. The following creates beam.Row() elements from
dicts:
```
output = (p
| beam.Create([{"num": 1}, {"num": 2}])
| beam.Map(lambda el: beam.Row(**el))
| beam.io.StorageWriteToBigQuery(table=table))
```
But this fails with `java.lang.IllegalArgumentException: Unknown Coder URN
beam:coder:pickled_python:v1`. This is probably because of the `**` operator.
Is there another way to create beam.Row() elements dynamically?
I tried setting the args directly:
```
output = (p
| beam.Create([{"num": 1}, {"num": 2}])
| beam.Map(lambda el: beam.Row(num=el['num']))  # <----
| beam.io.StorageWriteToBigQuery(table=table))
```
But it looks like there is no type inference, and the field type defaults to `Any` instead:
```
java.lang.IllegalArgumentException: Failed to decode Schema due to an error
decoding Field proto:
name: "num"
type {
nullable: true
logical_type {
urn: "beam:logical:pythonsdk_any:v1"
}
}
```
The following works fine (explicitly setting `int`):
```
output = (p
| beam.Create([{"num": 1}, {"num": 2}])
| beam.Map(lambda el: beam.Row(num=int(el['num'])))  # <----
| beam.io.StorageWriteToBigQuery(table=table))
```
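For the dynamic case, one possible direction is to declare the field types up front and build a typed row class at runtime. The sketch below is stdlib-only and is not taken from Beam: `FIELD_TYPES`, `make_row_type`, and `to_row` are hypothetical names, and whether Beam derives a Java-compatible schema from such a class is an assumption that has not been verified against this pipeline.

```python
import typing

# Hypothetical field-name -> type mapping, known only at pipeline construction time.
FIELD_TYPES = {"num": int}


def make_row_type(name, field_types):
    """Build a NamedTuple class whose fields carry explicit type annotations."""
    return typing.NamedTuple(name, list(field_types.items()))


def to_row(row_type, el):
    """Cast each dict value to its declared type, mirroring the int(...) workaround."""
    hints = typing.get_type_hints(row_type)
    return row_type(**{field: hints[field](el[field]) for field in row_type._fields})


DynamicRow = make_row_type("DynamicRow", FIELD_TYPES)
rows = [to_row(DynamicRow, el) for el in [{"num": 1}, {"num": "2"}]]
```

In a pipeline this would presumably replace the `beam.Row(...)` lambda with something like `beam.Map(lambda el: to_row(DynamicRow, el)).with_output_types(DynamicRow)`, but that wiring is untested here.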
### Issue Priority
Priority: 2 (default / most normal work should be filed as P2)
### Issue Components
- [X] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner