Apache Beam YAML makes heavy use of schemas, both to provide high-level, semantically meaningful transforms and to make it easier to mix and match transforms across language boundaries. This works well where we are able to infer the schemas, but requires painful manual declarations where we are not (PubSub inputs being a prime example). There are also cases where we do not care about the full structure of the input data (e.g. we are augmenting or filtering based on a few fields), or do not care about it at all (e.g. the downstream can consume dynamically-schema'd data, like BigQuery write).
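To make the pain point concrete, here is a rough sketch of what the manual declaration looks like today for a PubSub source. The topic, table, and field names are made up for illustration, and the exact config keys should be checked against the current Beam YAML docs. Note that the filter only looks at `score`, yet every field still has to be spelled out up front.

```yaml
# Sketch only: topic, table, and field names are hypothetical.
pipeline:
  type: chain
  transforms:
    - type: ReadFromPubSub
      config:
        topic: projects/my-project/topics/my-topic
        format: JSON
        # The schema cannot be inferred here, so it is declared by hand.
        schema:
          type: object
          properties:
            user_id: {type: string}
            score: {type: integer}
            comment: {type: string}
    - type: Filter
      config:
        language: python
        # Only one field is actually inspected.
        keep: "score > 0"
    - type: WriteToBigQuery
      config:
        table: my-project:my_dataset.my_table
options:
  streaming: true
```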
These use cases are not handled well in the current system, but have proven to be important for many Beam users (as attested to by, e.g., Dataflow templates usage). We would like to support them easily and naturally in Beam YAML as well. Note that unknown-schema'd data is different from (and possibly more flexible than) fully unschema'd data (such as arbitrary Python or Java objects).

I've written up a doc exploring this and some possible solutions at https://s.apache.org/beam-yaml-unknown-schema and would welcome any feedback or ideas people have.

- Robert