Hi All,

Currently, if a DAG created by a user contains any POJOfied operators, the TUPLE_CLASS attribute needs to be set on each and every port that receives or sends a POJO.
For example, if the DAG is File -> Parser -> Transform -> Dedup -> Formatter -> Kafka, then the user needs to set the TUPLE_CLASS attribute on both the input and output ports of the Transform and Dedup operators, and also on the Parser output and the Formatter input.

The proposal here is to reduce the work required of the user to configure the DAG. Technically speaking, if an operator knows its input schema and its processing properties, it can determine its output schema and convey it to the downstream operators. Building on that idea, I want to propose an approach where the complete pipeline can be configured without the user setting TUPLE_CLASS, or even creating POJOs and adding them to the classpath.

Here is the document that explains the idea and a high-level design:
https://docs.google.com/document/d/1ibLQ1KYCLTeufG7dLoHyN_tRQXEM3LR-7o_S0z_porQ/edit?usp=sharing

I would like to get the community's opinion on the feasibility and applications of this proposal. Once we have some consensus, we can discuss the design in detail.

Thanks,
Chinmay.
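
P.S. To make the schema propagation idea a bit more concrete, here is a rough sketch in plain Java. It is only an illustration and not the design from the document: SchemaAwareStage, AddFieldsStage, PassThroughStage and SchemaPropagation are hypothetical names, not existing Apex classes, and a schema is assumed to be a simple map from field name to field type. Each stage derives its output schema from its input schema, and the result is handed to the next stage, so nothing has to be set per port by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface (not part of Apex): a stage that can derive its
// output schema (field name -> field type) from its input schema.
interface SchemaAwareStage {
    Map<String, Class<?>> outputSchema(Map<String, Class<?>> inputSchema);
}

// A stage such as Dedup that does not change the shape of the tuple.
final class PassThroughStage implements SchemaAwareStage {
    @Override
    public Map<String, Class<?>> outputSchema(Map<String, Class<?>> inputSchema) {
        return new LinkedHashMap<>(inputSchema);
    }
}

// A stage such as Transform that appends derived fields. The added fields
// stand in for the operator's "processing properties".
final class AddFieldsStage implements SchemaAwareStage {
    private final Map<String, Class<?>> addedFields;

    AddFieldsStage(Map<String, Class<?>> addedFields) {
        this.addedFields = new LinkedHashMap<>(addedFields);
    }

    @Override
    public Map<String, Class<?>> outputSchema(Map<String, Class<?>> inputSchema) {
        Map<String, Class<?>> out = new LinkedHashMap<>(inputSchema);
        out.putAll(addedFields);
        return out;
    }
}

final class SchemaPropagation {
    // Walks a linear pipeline and returns the schema emitted by each stage,
    // in order. Each stage's output becomes the next stage's input.
    static List<Map<String, Class<?>>> propagate(
            Map<String, Class<?>> sourceSchema, List<SchemaAwareStage> stages) {
        List<Map<String, Class<?>>> schemas = new ArrayList<>();
        Map<String, Class<?>> current = sourceSchema;
        for (SchemaAwareStage stage : stages) {
            current = stage.outputSchema(current);
            schemas.add(current);
        }
        return schemas;
    }

    public static void main(String[] args) {
        // Schema assumed to be known at the Parser output.
        Map<String, Class<?>> parsed = new LinkedHashMap<>();
        parsed.put("id", Long.class);
        parsed.put("amount", Double.class);

        // Field that the Transform stage is configured to add.
        Map<String, Class<?>> derived = new LinkedHashMap<>();
        derived.put("amountWithTax", Double.class);

        List<SchemaAwareStage> stages =
                Arrays.asList(new AddFieldsStage(derived), new PassThroughStage());

        // Print the schema that each stage would pass downstream.
        for (Map<String, Class<?>> schema : propagate(parsed, stages)) {
            System.out.println(schema.keySet());
        }
    }
}
```

In this sketch only the schema at the Parser output is supplied; the schemas for Transform and Dedup are derived. How the schema is actually represented and how it reaches the downstream operators at launch time is exactly what I would like to discuss.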