The consumer of the output port operator schema is going to be the next downstream operator.
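
To make that concrete, here is a minimal sketch of how the SchemaAware idea quoted below could hand a schema from an upstream operator to the next downstream one when the plan is built. It is only an illustration under assumptions: ports are plain strings rather than InputPort/OutputPort, and Schema, Parser, Transform and propagate are hypothetical stand-ins, not Apex or Malhar classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaPropagationSketch {

    /** Hypothetical stand-in for a schema: ordered field name -> field type. */
    static final class Schema {
        final Map<String, Class<?>> fields = new LinkedHashMap<>();

        Schema with(String name, Class<?> type) {
            fields.put(name, type);
            return this;
        }
    }

    /** Simplified form of the proposed interface; ports are plain strings here (assumption). */
    interface SchemaAware {
        Map<String, Schema> registerSchema(Map<String, Schema> inputSchema);
    }

    /** Hypothetical parser: has no input port, so its output schema comes from its own config. */
    static final class Parser implements SchemaAware {
        @Override
        public Map<String, Schema> registerSchema(Map<String, Schema> inputSchema) {
            Schema out = new Schema().with("id", Long.class).with("name", String.class);
            return Map.of("out", out);
        }
    }

    /** Hypothetical transform: copies the input schema and appends one derived field. */
    static final class Transform implements SchemaAware {
        @Override
        public Map<String, Schema> registerSchema(Map<String, Schema> inputSchema) {
            Schema out = new Schema();
            out.fields.putAll(inputSchema.get("in").fields);
            out.with("nameLength", Integer.class);
            return Map.of("out", out);
        }
    }

    /** Walks a linear pipeline upstream to downstream, feeding each "out" schema to the next "in". */
    static Schema propagate(List<SchemaAware> pipeline) {
        Map<String, Schema> input = Map.of();
        Schema last = null;
        for (SchemaAware operator : pipeline) {
            last = operator.registerSchema(input).get("out");
            input = Map.of("in", last);
        }
        return last;
    }

    public static void main(String[] args) {
        Schema result = propagate(List.of(new Parser(), new Transform()));
        System.out.println(result.fields.keySet());
    }
}
```

In this sketch the downstream operator is the only consumer: Transform never sees a TUPLE_CLASS, it just receives whatever schema Parser declared on its output port.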

On Tue, Jan 31, 2017 at 4:01 AM, Sergey Golovko <ser...@datatorrent.com> wrote:

> Sorry, I’m a new person in the APEX team. And I don't understand clearly
> who are consumers of the output port operator schema(s).
>
> 1. If the consumers are non-run-time callers like the application manager
> or UI designer, maybe it makes sense to use Java static method(s) to
> retrieve the output port operator schema(s). I guess the performance of a
> single call of a static method via reflection can be ignored.
>
> 2. If the consumer is next downstream operator, maybe it makes sense to
> send an output port operator schema from upstream operator to next
> downstream operator via the stream. The corresponded methods that would
> send and receive the schema should be declared in the
> interface/abstract-class of the upstream and downstream operators. The
> sending/receiving of an output schema should be processed right before the
> sending of the first data record via the stream.
>
> One of examples of a typical implementation for sending of metadata with a
> regular result set is the sending of JDBC metadata as a part of JDBC result
> set. And I hope the output schema (metadata of the streamed data) in the
> implementation should contain not only a signature of the streamed objects
> (like field names and data types), but also any other properties of the
> data that can be useful by the schema receiver to process the data (for
> instance, a delimiter for CSV record stream).
>
> Thanks,
> Sergey
>
> On 2017-01-25 01:47 (-0800), Chinmay Kolhatkar <chin...@datatorrent.com> wrote:
> > Thank you all for the feedback.
> >
> > I've created a Jira for this: APEXCORE-623 and I'll attach the same
> > document and link to this mailchain there.
> >
> > As a first part of this Jira, there are 2 steps I would like to propose:
> > 1. Add following interface at com.datatorrent.common.util.SchemaAware.
> >
> > interface SchemaAware {
> >
> >   Map<OutputPort, Schema> registerSchema(Map<InputPort, Schema> inputSchema);
> > }
> >
> > This interface can be implemented by Operators to communicate its output
> > schema(s) to engine.
> > Input to this schema will be schema at its input port.
> >
> > 2. After LogicalPlan is created call SchemaAware method from upstream to
> > downstream operator in the DAG to propagate the Schema.
> >
> > Once this is done, changes can be done in Malhar for the operators in
> > question.
> >
> > Please share your opinion on this approach.
> >
> > Thanks,
> > Chinmay.
> >
> > On Wed, Jan 18, 2017 at 2:31 PM, Priyanka Gugale <pri...@apache.org> wrote:
> > >
> > > +1 to have this feature.
> > >
> > > -Priyanka
> > >
> > > On Tue, Jan 17, 2017 at 9:18 PM, Pramod Immaneni <pra...@datatorrent.com> wrote:
> > > >
> > > > +1
> > > >
> > > > On Mon, Jan 16, 2017 at 1:23 AM, Chinmay Kolhatkar <chin...@apache.org> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > Currently a DAG that is generated by user, if contains any POJOfied
> > > > > operators, TUPLE_CLASS attribute needs to be set on each and every port
> > > > > which receives or sends a POJO.
> > > > >
> > > > > For e.g., if a DAG is like File -> Parser -> Transform -> Dedup ->
> > > > > Formatter -> Kafka, then TUPLE_CLASS attribute needs to be set by user on
> > > > > both input and output ports of transform, dedup operators and also on
> > > > > parser output and formatter input.
> > > > >
> > > > > The proposal here is to reduce work that is required by user to configure
> > > > > the DAG. Technically speaking if an operators knows input schema and
> > > > > processing properties, it can determine output schema and convey it to
> > > > > downstream operators. This way the complete pipeline can be configured
> > > > > without user setting TUPLE_CLASS or even creating POJOs and adding them to
> > > > > classpath.
> > > > >
> > > > > On the same idea, I want to propose an approach where the pipeline can be
> > > > > configured without user setting TUPLE_CLASS or even creating POJOs and
> > > > > adding them to classpath.
> > > > > Here is the document which at a high level explains the idea and a high
> > > > > level design:
> > > > > https://docs.google.com/document/d/1ibLQ1KYCLTeufG7dLoHyN_tRQXEM3LR-7o_S0z_porQ/edit?usp=sharing
> > > > >
> > > > > I would like to get opinion from community about feasibility and
> > > > > applications of this proposal.
> > > > > Once we get some consensus we can discuss the design in details.
> > > > >
> > > > > Thanks,
> > > > > Chinmay.