TheNeuralBit commented on issue #23467:
URL: https://github.com/apache/beam/issues/23467#issuecomment-1269044871
Thanks Cham, I'm surprised we didn't already have this tracked somewhere :)
We also need this for `df.predict` (or `predict(df, ..)`).
There are a couple of challenges here that have been rattling around in my
head:
- For many ModelHandlers the input/output type is basically a bag of numbers,
  e.g. a Tensor with dimensions (X,Y). It's ambiguous how these should be mapped
  to Beam schemas.
  - It could be a schema with a single field of type `List[List[int64]]`
  - Or perhaps one dimension corresponds to the schema fields (e.g. X fields
    of type `List[int64]`)
  - This is particularly problematic for the `df.predict` case, since the
    pandas type system doesn't support complex types.
- On the output side, we likely can't get a detailed, parameterized type to
  map back to Beam schemas. That is, we may know that the model produces a
  Tensor, but we don't know the dimensions, which one is the batch dimension,
  etc.
  - In some cases we may be able to use the `proxy` trick from the DataFrame
    API: pass through an instance with a 0-length batch dimension and see what
    we get out. But I don't know if this will work universally.
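
To make the two candidate mappings and the proxy idea a bit more concrete, here's a rough sketch using plain NumPy and NamedTuples (no Beam imports). `SingleField`, `PerRowFields`, and `fake_model` are made-up names for illustration, not Beam APIs. The sketch assumes axis 0 is the batch dimension and that the model accepts an empty batch, which is exactly the part that may not hold universally.

```python
from typing import List, NamedTuple

import numpy as np

# Hypothetical model input: a batch of 2 elements, each an (X, Y) = (3, 4) tensor.
batch = np.arange(24, dtype=np.int64).reshape(2, 3, 4)


class SingleField(NamedTuple):
    # Option 1: the whole (X, Y) tensor goes into one nested-list field.
    values: List[List[int]]


class PerRowFields(NamedTuple):
    # Option 2: the X dimension becomes the schema fields (X == 3 here).
    x0: List[int]
    x1: List[int]
    x2: List[int]


# The same data, expressed under each interpretation.
option1 = [SingleField(values=t.tolist()) for t in batch]
option2 = [PerRowFields(*t.tolist()) for t in batch]


def fake_model(inputs: np.ndarray) -> np.ndarray:
    """Stand-in for a model: sums over the last axis, keeps the batch axis."""
    return inputs.sum(axis=-1)


# The proxy trick: push a 0-length batch through and inspect what comes out.
proxy_out = fake_model(np.empty((0, 3, 4), dtype=np.int64))
element_shape = proxy_out.shape[1:]  # assumes axis 0 is the batch dimension
element_dtype = proxy_out.dtype
```

Here the proxy tells us each output element has shape `element_shape` and dtype `element_dtype` without running any real data, but only because `fake_model` happens to tolerate an empty batch and preserve axis 0.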