[
https://issues.apache.org/jira/browse/BEAM-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445449#comment-17445449
]
Brian Hulette commented on BEAM-13150:
--------------------------------------
I absolutely agree that it would be great to have a good path for this.
Thinking about it more, I'm not sure it would be a good idea to make this
change in Apache Beam, as it would introduce a circular dependency between TFX
and Beam. I think we should either:
- Try to implement this completely in TFX (e.g. can there be a PTransform that
produces a schema'd PCollection by reading the TF schema and generating an
appropriate type?), or
- Add some generalizable support for defining schemas from arbitrary types in
Beam (BEAM-8732), and then leverage that from TFX.
That being said, at this early stage it makes a lot of sense to hack on this in
Beam, so we understand the problem and what general infrastructure we need.
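
To make the first option more concrete, here is a minimal sketch of the "generating an appropriate type" step in plain Python. Everything in it is an assumption for illustration: the FEATURES dict is a hypothetical stand-in for a parsed TF schema, and row_type_from_features is a made-up helper, not an existing TFX or Beam API. It relies only on the fact that Beam's Python SDK can infer a schema from a NamedTuple type used as an output type hint.

```python
from typing import NamedTuple, Sequence

# Hypothetical stand-in for a parsed TF schema:
# feature name -> (kind, is_single_valued).
# A real implementation would read the TF schema itself.
FEATURES = {
    "user_id": ("INT", True),
    "score": ("FLOAT", True),
    "tags": ("BYTES", False),
}

# Assumed mapping from feature kinds to Python types.
_KIND_TO_TYPE = {"INT": int, "FLOAT": float, "BYTES": bytes}


def row_type_from_features(name, features):
    """Build a NamedTuple type with one field per feature.

    Multi-valued features become Sequence[...] fields.
    """
    fields = []
    for feature_name, (kind, single_valued) in features.items():
        base = _KIND_TO_TYPE[kind]
        field_type = base if single_valued else Sequence[base]
        fields.append((feature_name, field_type))
    return NamedTuple(name, fields)


ExampleRow = row_type_from_features("ExampleRow", FEATURES)

# In a pipeline, a PTransform living in TFX could then parse each
# tf.train.Example into an ExampleRow and declare ExampleRow as its
# output type, so that Beam treats the result as a PCollection with
# a schema. That part is not shown here.
row = ExampleRow(user_id=1, score=0.5, tags=[b"a", b"b"])
```

Because the type is generated at pipeline construction time from the schema, this approach would keep all of the TF-specific logic on the TFX side.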
> Integrate TFRecord/tf.train.Example with Beam Schemas and the DataFrame API
> ---------------------------------------------------------------------------
>
> Key: BEAM-13150
> URL: https://issues.apache.org/jira/browse/BEAM-13150
> Project: Beam
> Issue Type: Improvement
> Components: dsl-dataframe, sdk-py-core
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: P2
>
> See discussion in BEAM-12955