I agree with the current implementation plan for global parameters; this is great work.
Thanks,
Wenjun Ruan

wenjun ruan <[email protected]> wrote on Sun, Jun 6, 2021 at 1:30 AM:

> Hi,
>
> Xingjie's plan is great, and I don't oppose the current plan.
>
> But a few things in the current plan may be confusing.
>
> 1. In the current plan, we seem to store both the IN params and the OUT
> params in the varPool. When an IN param and an OUT param have the same
> name, the OUT param overrides the IN param. This may be reasonable, but
> when troubleshooting it is hard to know where the current IN param came
> from: it may come from an upstream task or be generated by the current
> task.
>
> 2. If multiple upstream tasks pass the same parameter, only one value
> will be kept in the varPool, and it is hard to tell which one.
>
> 3. If we store the OUT params in every varPool, the varPool of the
> downstream nodes may become large.
>
> You can find the details at
> https://github.com/apache/dolphinscheduler/issues/5565
>
> To be honest, I am not sure whether users will run into these problems.
>
> My suggestion is not to store all the params in the varPool: each task
> saves only its own OUT params in its varPool. I am not sure whether this
> is consistent with @Xingjie's second plan.
>
> This is my personal opinion, for reference only.
>
> Thanks,
> Wenjun Ruan
>
>
> Lidong Dai <[email protected]> wrote on Sat, Jun 5, 2021 at 11:22 PM:
>
>> Hi,
>> any progress? Do we need a meeting to resolve this?
>> By the way, pictures can't be shown on the Apache mailing list; you can
>> upload the picture to GitHub, then paste the URL into the mail.
>>
>>
>> Best Regards
>> ---------------
>> DolphinScheduler PMC
>> Lidong Dai
>> [email protected]
>> ---------------
>>
>>
>> On Tue, Jun 1, 2021 at 5:04 PM Xingjie Wang(联通集团联通数字科技有限公司) <
>> [email protected]> wrote:
>>
>> > This really doesn't satisfy those scenarios.
>> > For the first one, if we want to do this, we should save the varPool at
>> > the processInstance level, so that Task4 can get the varPool from Task1
>> > without going through Task2 and Task3.
>> > This would obscure the relation between globalParam, localParam, and
>> > varPool.
>> > Another plan: when the user defines Task4's IN param, the user chooses
>> > Task1 to indicate that this param is Task1's OUT param. When
>> > initializing Task4's varPool, get Task1 from completeTaskList, then get
>> > its varPool. In addition, give the taskInstance a new property, record
>> > this taskInstance's name in that property, and put the property into
>> > the varPool.
>> > It will satisfy those scenarios.
>> > What do you think?
>> >
>> > -----Original Message-----
>> > From: Ruan, Wenjun <[email protected]>
>> > Sent: June 1, 2021 16:07
>> > To: [email protected]
>> > Subject: Re: [DISCUSS]The new Plan of global params
>> >
>> > Sorry, it seems that the picture cannot be displayed properly; the DAG
>> > structure is as follows:
>> >
>> > Task1 -> Task2 -> Task3 -> Task4
>> >
>> > From: Ruan, Wenjun <[email protected]>
>> > Date: Tuesday, June 1, 2021 at 3:55 PM
>> > To: [email protected] <[email protected]>
>> > Subject: Re: [DISCUSS]The new Plan of global params
>> >
>> > Hi Xingjie,
>> >
>> > I have two things I want to confirm.
>> > In your plan it seems that we need to store the varPool in all the
>> > downstream nodes? For example, if I have a simple DAG like the one below:
>> > [inline image not displayed]
>> >
>> > If I want to get the OUT param of Task1 in Task4 (let's call it
>> > parameterA), then we need to store parameterA in Task2's varPool and
>> > Task3's varPool, even if we don't need parameterA in Task2 and Task3, am
>> > I right? Can we take it directly from Task1?
>> >
>> > The second thing: when troubleshooting, can I find the source of the
>> > parameters in the varPool? It seems I need to trace back through the
>> > upstream nodes and find the first node that contains the parameter?
>> > That seems inconvenient if I have many tasks in a DAG.
>> >
>> > Thanks,
>> > Wenjun Ruan
>> >
>> >
>> > From: Xingjie Wang(联通集团联通数字科技有限公司) <[email protected]>
>> > Date: Tuesday, June 1, 2021 at 3:21 PM
>> > To: [email protected] <[email protected]>
>> > Subject: [DISCUSS]The new Plan of global params
>> >
>> > Hi Dev Team,
>> >
>> > The scheme for the global params that tasks use needs to change as
>> > described below. Here is the issue for this discussion:
>> >
>> > https://github.com/apache/dolphinscheduler/issues/5565
>> >
>> > I will change the global param scheme so that it depends on the
>> > relations between tasks. Here are the details.
>> >
>> > 1. Create the taskInstance
>> >
>> > Get the previous tasks, get the varPools of those tasks, and put them
>> > into this task's varPool. If the previous tasks have varPool entries
>> > with the same name, use the value that is not null. If all of the
>> > values are null, use the earlier one.
>> >
>> > 2. Worker gets the params
>> >
>> > The Master sends the varPool to the Worker, and the taskProcessor gets
>> > the varPool in the format List<Property>. The varPool is handled the
>> > same way as localParam.
>> >
>> > 3. Worker returns the OUT params
>> >
>> > When the user defines OUT params on the Task Definition page, the
>> > Worker gets the result of the Processor.
>> >
>> > Different Processors return different formats. For example, SQL returns
>> > List<Map<String,String>>, so users can get more than one row or more
>> > than one column; SHELL returns Map<String,String> or String. The OUT
>> > params are added to the varPool, sent to the Master, and saved to the
>> > database. The value is also saved into localParam.
>> >
>> > If you have any questions or a better plan, please contact me.
>> > Thank you.
>> >
>> > If you have received this email in error, please notify us immediately
>> > by e-mail. To unsubscribe from this mail, reply to
>> > [email protected]; we will immediately remove your information
>> > from our mailing list.
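
For reference, the merge rule from step 1 of the original proposal (collect the upstream varPools; on a name clash prefer the non-null value; if all values are null keep the earlier one) can be sketched roughly as below. This is only an illustration, not the DolphinScheduler implementation: `Param` and `mergeVarPools` are hypothetical names, and keeping the earlier of two non-null values is an assumption, since the thread leaves that case open (it is exactly the ambiguity raised in point 2 of the June 6 reply).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class VarPoolMergeSketch {

    /** Hypothetical stand-in for one varPool entry (not DolphinScheduler's Property class). */
    static final class Param {
        final String name;
        final String value; // may be null

        Param(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /**
     * Merges the varPools of the upstream tasks, passed from earliest to latest.
     * Same name: a non-null value replaces a null one; otherwise the earlier entry stays.
     * Keeping the earlier of two non-null values is an assumption of this sketch.
     */
    static List<Param> mergeVarPools(List<List<Param>> upstreamVarPools) {
        Map<String, Param> merged = new LinkedHashMap<>();
        for (List<Param> pool : upstreamVarPools) {
            for (Param candidate : pool) {
                Param existing = merged.get(candidate.name);
                boolean fillsNull = existing != null
                        && existing.value == null
                        && candidate.value != null;
                if (existing == null || fillsNull) {
                    merged.put(candidate.name, candidate);
                }
            }
        }
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        // Two upstream tasks that both export "parameterA" and "day".
        List<Param> task1 = Arrays.asList(
                new Param("parameterA", null), new Param("day", "2021-06-01"));
        List<Param> task2 = Arrays.asList(
                new Param("parameterA", "42"), new Param("day", "2021-06-02"));

        for (Param p : mergeVarPools(Arrays.asList(task1, task2))) {
            System.out.println(p.name + "=" + p.value);
        }
    }
}
```

With the two sample pools in `main`, `parameterA` takes Task2's value because Task1's is null, while `day` keeps Task1's value. Nothing in the merged list records which task a value came from, which is the troubleshooting concern raised earlier in the thread.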
