Hi,

Xingjie's plan is great, and I don't oppose it.

However, a few things in the current plan may be confusing.

1. In the current plan, we seem to store both the in params and the out
params in the varpool. When an in param and an out param have the same name,
the out param overrides the in param. This may be reasonable, but when
troubleshooting it is hard to know where the current value came from: it may
have come from an upstream task or been generated by the current task.

2. If multiple upstream tasks pass along a parameter with the same name,
only one value is kept in the varpool, and it is hard to tell which one.

3. If we store out params in the varpool of every downstream node, the
varpools of later nodes may become large.
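
To make points 1 and 2 concrete, here is a minimal sketch of the name
collision. It is not DolphinScheduler code: the class VarPoolCollisionDemo,
the method mergeFlat, and the parameter name "dt" are all made up for
illustration, and I am assuming the varpool behaves like a flat name-to-value
map in which a later write replaces an earlier one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not DolphinScheduler's actual classes.
class VarPoolCollisionDemo {

    // Assumption: upstream varpools are merged into one flat map,
    // so a later entry silently replaces an earlier one with the same name.
    static Map<String, String> mergeFlat(List<Map<String, String>> upstreamVarPools) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> pool : upstreamVarPools) {
            merged.putAll(pool);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two upstream tasks both export a parameter named "dt".
        Map<String, String> fromTaskA = Map.of("dt", "2021-06-01");
        Map<String, String> fromTaskB = Map.of("dt", "2021-06-02");

        Map<String, String> varPool = mergeFlat(List.of(fromTaskA, fromTaskB));
        System.out.println(varPool); // prints {dt=2021-06-02}; TaskA's value is gone

        // The current task then writes an out param with the same name.
        varPool.put("dt", "2021-06-03");
        System.out.println(varPool); // prints {dt=2021-06-03}; no trace of either upstream
    }
}
```

After the merge, nothing in the map records which task a value came from,
which is the troubleshooting problem described above.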

You can find the details at https://github.com/apache/dolphinscheduler/issues/5565

To be honest, I am not sure whether users will actually run into these problems.

My suggestion is not to store all the params in the varpool: each task saves
only its own out parameters in its varpool. I am not sure whether this is
consistent with @Xingjie's second plan.
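
As a rough illustration of this suggestion, the sketch below keeps each
task's out params under that task's own name and lets a downstream task read
from a named source task. Everything here is hypothetical (the class
PerTaskVarPool and the methods saveOutParam and resolve are not existing
DolphinScheduler APIs); the task names and parameterA are borrowed from the
example DAG discussed earlier in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not DolphinScheduler's actual classes.
class PerTaskVarPool {

    // task name -> the OUT parameters produced by that task only
    private final Map<String, Map<String, String>> outParamsByTask = new HashMap<>();

    // Each task writes only into its own bucket.
    void saveOutParam(String taskName, String paramName, String value) {
        outParamsByTask.computeIfAbsent(taskName, k -> new HashMap<>()).put(paramName, value);
    }

    // A downstream task names the source task explicitly, so the origin is never ambiguous.
    Optional<String> resolve(String sourceTask, String paramName) {
        return Optional.ofNullable(outParamsByTask.get(sourceTask))
                .map(pool -> pool.get(paramName));
    }

    public static void main(String[] args) {
        PerTaskVarPool store = new PerTaskVarPool();
        store.saveOutParam("Task1", "parameterA", "42");
        store.saveOutParam("Task3", "parameterA", "99");

        // Task4 reads Task1's value directly; Task2 never had to carry it.
        System.out.println(store.resolve("Task1", "parameterA").orElse("<missing>")); // prints 42
        System.out.println(store.resolve("Task2", "parameterA").orElse("<missing>")); // prints <missing>
    }
}
```

With this layout, two tasks can export the same name without overwriting each
other, and intermediate tasks do not need to copy parameters they never use.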


This is my personal opinion, for reference only.

Thanks,
Wenjun Ruan


Lidong Dai <[email protected]> wrote on Saturday, June 5, 2021 at 11:22 PM:

> Hi,
>   Any progress? Do we need a meeting to resolve this?
> By the way, pictures cannot be displayed on the Apache mailing list. You can
> upload the picture to GitHub and then paste the URL into the mail.
>
>
> Best Regards
> ---------------
> DolphinScheduler PMC
> Lidong Dai
> [email protected]
> ---------------
>
>
> On Tue, Jun 1, 2021 at 5:04 PM Xingjie Wang(联通集团联通数字科技有限公司) <
> [email protected]> wrote:
>
> > The current plan really doesn't cover those scenarios.
> > For the first one, if we want to do this, we would have to save the varPool
> > at the processInstance level, so that Task4 can get the varPool from Task1
> > without going through Task2 and Task3.
> > That would blur the relationship between globalParam, localParam, and
> > varPool.
> > Another plan: when the user defines an IN param for Task4, the user
> > chooses Task1 to indicate that this param is Task1's OUT param. When
> > initializing Task4's varPool, we get Task1 from completeTaskList and then
> > read its varPool. And when a taskInstance produces a new property, we
> > record that taskInstance's name in the property and put the property into
> > the varPool.
> > That would cover those scenarios.
> > What do you think?
> > -----Original Message-----
> > From: Ruan, Wenjun <[email protected]>
> > Sent: June 1, 2021 16:07
> > To: [email protected]
> > Subject: Re: [DISCUSS]The new Plan of global params
> >
> > Sorry, it seems the picture is not displayed properly. The DAG structure
> > is as follows:
> >
> > Task1   ->  Task2   ->  Task3   ->  Task4
> >
> > From: Ruan, Wenjun <[email protected]>
> > Date: Tuesday, June 1, 2021 at 3:55 PM
> > To: [email protected] <[email protected]>
> > Subject: Re: [DISCUSS]The new Plan of global params
> >
> > Hi Xingjie,
> >
> > I have two things I want to confirm.
> > In your plan, it seems we need to store the varpool in all the downstream
> > nodes?
> > For example, if I have a simple DAG like the one below:
> > [cid:[email protected]]
> >
> > If I want to get an out param of Task1 in Task4 (let's call it
> > parameterA), then we need to store parameterA in Task2's varpool and
> > Task3's varpool, even if Task2 and Task3 don't need parameterA. Am I right?
> > Can we take it directly from Task1?
> >
> > The second thing: when troubleshooting, can I find the source of a
> > parameter in the varpool? It seems I need to trace back through the
> > upstream nodes and find the first node that contains the parameter. That
> > is not convenient if I have many tasks in a DAG.
> >
> > Thanks,
> > Wenjun Ruan
> >
> >
> > From: Xingjie Wang(联通集团联通数字科技有限公司) <[email protected]>
> > Date: Tuesday, June 1, 2021 at 3:21 PM
> > To: [email protected] <[email protected]>
> > Subject: [DISCUSS]The new Plan of global params
> >
> >
> >
> > Hi Dev Team,
> >
> > The scheme for the global params used by tasks needs to change as
> > described below.
> > Here is the issue for this discussion:
> >
> >
> > https://github.com/apache/dolphinscheduler/issues/5565
> >
> > I will change the scheme of the global params so that it depends on the
> > relationships between tasks.
> > Here are the details.
> >
> > 1. Create the taskInstance
> >
> > Get the previous tasks, get the varPools of those tasks, and put them into
> > this task's varPool. If the previous tasks have a varPool entry with the
> > same name, use the value that is not null. If all of the values are null,
> > use the earlier one.
> >
> > 2. Worker gets the params
> >
> > The Master sends the varPool to the Worker, and the taskProcessor receives
> > the varPool as a List<Property>. The varPool is then handled the same way
> > as the localParam.
> >
> > 3. Worker returns the OUT params
> >
> > When the user defines an OUT param on the Task Definition page, the Worker
> > gets the result from the Processor.
> >
> > Different Processors return different formats. For example, SQL returns a
> > List<Map<String,String>>, so users can get more than one row or more than
> > one column; SHELL returns a Map<String,String> or a String. The OUT param
> > is added to the varPool, sent to the Master, and saved to the database.
> > The value is also saved into the localParam.
> >
> >
> >
> > If you have any questions or a better plan, please contact me.
> > Thank you.
> >
> > If you have received this email in error please notify us immediately by
> > e-mail. Please reply to [email protected] ,you can unsubscribe from
> > this mail. We will immediately remove your information from send catalogue
> > of our.
> >
>
