+1 for the idea. I just had one question.
As I understand it, there will be some form of anonymous POJO used to pass information from one operator to another. Can you share how the user or operator developer would access the tuple object if they want to do something with it?

Ajay

On Mon, Jan 16, 2017 at 2:53 PM, Chinmay Kolhatkar <chin...@apache.org> wrote:

> Hi All,
>
> Currently, if a DAG generated by the user contains any POJOfied
> operators, the TUPLE_CLASS attribute needs to be set on each and every
> port which receives or sends a POJO.
>
> For example, if a DAG is like File -> Parser -> Transform -> Dedup ->
> Formatter -> Kafka, then the TUPLE_CLASS attribute needs to be set by the
> user on both input and output ports of the transform and dedup operators,
> and also on the parser output and formatter input.
>
> The proposal here is to reduce the work required by the user to configure
> the DAG. Technically speaking, if an operator knows its input schema and
> processing properties, it can determine the output schema and convey it to
> downstream operators. This way the complete pipeline can be configured
> without the user setting TUPLE_CLASS or even creating POJOs and adding
> them to the classpath.
>
> On the same idea, I want to propose an approach where the pipeline can be
> configured without the user setting TUPLE_CLASS or even creating POJOs and
> adding them to the classpath.
> Here is the document which explains the idea and a high-level design:
> https://docs.google.com/document/d/1ibLQ1KYCLTeufG7dLoHyN_tRQXEM3LR-7o_S0z_porQ/edit?usp=sharing
>
> I would like to get the community's opinion on the feasibility and
> applications of this proposal.
> Once we get some consensus we can discuss the design in detail.
>
> Thanks,
> Chinmay.
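
To make the quoted idea concrete, here is a minimal sketch in plain Java of a stage deriving its output schema from its input schema and handing it downstream. It does not use the Apex API and is not the design from the linked document: `SchemaAwareStage`, `addField`, `passThrough`, and `propagate` are hypothetical names invented for illustration, and a schema is modeled simply as an ordered map from field name to type.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaPropagationSketch {

    /** Hypothetical stage: derives its output schema from its input schema. */
    interface SchemaAwareStage {
        Map<String, Class<?>> outputSchema(Map<String, Class<?>> input);
    }

    /** Pass-through stage (for example a dedup): output schema equals input schema. */
    static SchemaAwareStage passThrough() {
        return input -> new LinkedHashMap<>(input);
    }

    /** Transform-like stage that appends one derived field to the schema. */
    static SchemaAwareStage addField(String name, Class<?> type) {
        return input -> {
            Map<String, Class<?>> out = new LinkedHashMap<>(input);
            out.put(name, type);
            return out;
        };
    }

    /** Walks the stages in order, giving each stage the schema produced upstream. */
    static Map<String, Class<?>> propagate(Map<String, Class<?>> source,
                                           List<SchemaAwareStage> stages) {
        Map<String, Class<?>> current = source;
        for (SchemaAwareStage stage : stages) {
            current = stage.outputSchema(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Assumed schema emitted by the parser; only this one is user-supplied.
        Map<String, Class<?>> parsed = new LinkedHashMap<>();
        parsed.put("id", Long.class);
        parsed.put("amount", Double.class);

        Map<String, Class<?>> result = propagate(parsed,
                List.of(addField("amountWithTax", Double.class), passThrough()));
        System.out.println(result.keySet()); // prints [id, amount, amountWithTax]
    }
}
```

In this sketch only the source schema is supplied by hand; every later stage computes its own, which is the reduction in per-port configuration the proposal is after.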