Re: Schema Discovery Support in Apex Applications

AJAY GUPTA Mon, 16 Jan 2017 02:50:49 -0800

+1 for the idea.

I just had one question.


As I understand it, some form of anonymous POJO will be used to pass
information from one operator to another. Can you share how the user or
operator developer would access the tuple object if they want to do
something with it?


Ajay

On Mon, Jan 16, 2017 at 2:53 PM, Chinmay Kolhatkar <chin...@apache.org>
wrote:

> Hi All,
>
> Currently, if a user-generated DAG contains any POJOfied operators, the
> TUPLE_CLASS attribute needs to be set on each and every port that
> receives or sends a POJO.
>
> For example, if a DAG is File -> Parser -> Transform -> Dedup ->
> Formatter -> Kafka, then the user needs to set the TUPLE_CLASS attribute
> on both the input and output ports of the transform and dedup operators,
> and also on the parser output and the formatter input.
>
> The proposal here is to reduce the work required of the user to configure
> the DAG. Technically speaking, if an operator knows its input schema and
> processing properties, it can determine its output schema and convey it to
> downstream operators. This way the complete pipeline can be configured
> without the user setting TUPLE_CLASS or even creating POJOs and adding
> them to the classpath.
>
> Building on this idea, I would like to propose an approach for configuring
> a pipeline in this way.
> Here is a document that explains the idea and outlines a high-level
> design:
> https://docs.google.com/document/d/1ibLQ1KYCLTeufG7dLoHyN_tRQXEM3LR-7o_S0z_porQ/edit?usp=sharing
>
> I would like to get the community's opinion on the feasibility and
> applications of this proposal.
> Once we reach some consensus, we can discuss the design in detail.
>
> Thanks,
> Chinmay.
>
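
To make the propagation idea in the quoted proposal a little more concrete,
here is a minimal plain-Java sketch. It is only an illustration under stated
assumptions and is not taken from the design document or from the Apex API:
the names SchemaAwareOperator, AddField, PassThrough and propagate are
hypothetical, and a schema is modelled simply as an ordered map from field
name to Java type. Each operator derives its output schema from its input
schema, and the result is handed to the next operator downstream.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaPropagationSketch {

    // Hypothetical stand-in for an operator that can derive its output
    // schema from its input schema (not an Apex interface).
    interface SchemaAwareOperator {
        Map<String, Class<?>> outputSchema(Map<String, Class<?>> inputSchema);
    }

    // Dedup-like operator: the schema passes through unchanged.
    static class PassThrough implements SchemaAwareOperator {
        @Override
        public Map<String, Class<?>> outputSchema(Map<String, Class<?>> inputSchema) {
            return new LinkedHashMap<>(inputSchema);
        }
    }

    // Transform-like operator: adds one derived field to the incoming schema.
    static class AddField implements SchemaAwareOperator {
        private final String name;
        private final Class<?> type;

        AddField(String name, Class<?> type) {
            this.name = name;
            this.type = type;
        }

        @Override
        public Map<String, Class<?>> outputSchema(Map<String, Class<?>> inputSchema) {
            // Copy first so the upstream schema is never mutated.
            Map<String, Class<?>> out = new LinkedHashMap<>(inputSchema);
            out.put(name, type);
            return out;
        }
    }

    // Walks the pipeline in order, feeding each operator's output schema
    // to the next operator as its input schema.
    static Map<String, Class<?>> propagate(Map<String, Class<?>> source,
                                           SchemaAwareOperator... operators) {
        Map<String, Class<?>> current = source;
        for (SchemaAwareOperator op : operators) {
            current = op.outputSchema(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Schema assumed to be known at the parser output.
        Map<String, Class<?>> parserOut = new LinkedHashMap<>();
        parserOut.put("id", Long.class);
        parserOut.put("amount", Double.class);

        // Parser -> Transform (adds "tax") -> Dedup -> Formatter input.
        Map<String, Class<?>> formatterIn =
            propagate(parserOut, new AddField("tax", Double.class), new PassThrough());

        System.out.println(formatterIn.keySet()); // prints [id, amount, tax]
    }
}
```

In this sketch only the schema at the head of the pipeline is supplied by
hand; everything downstream is derived, which is the reduction in user
configuration that the proposal describes.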
