Yicong-Huang commented on PR #5179: URL: https://github.com/apache/texera/pull/5179#issuecomment-4529506515
Thanks for the PR! It's great to see multiple designs of macros. We definitely need this feature! Let's talk about the experience before diving into code. ### create a macro It is not shown or clear to me how to create a macro from scratch? In @Xiao-zhen-Liu 's version (#5115), user highlights three operators and right click to show the context menu, then click "create Marco", then the three operators disappear and replaced with a macro operator. The Marco operator is same shape/size of other normal operators. I do like this experience. ### Representation of a macro and reuse a macro operator In this version, user drags a macro operator (which is a marker, instead of an operator), and selects options from a drop down menu inside property panel. It is not clear how to reuse a marco in #5115. I would prefer not to have Marco as a marker. instead, we have always been discussing that a Marco operator should just be the same as other logical operators. For instance, you can view `Hash Join` (we should rename it to Join) operator as a macro. It gets expanded into build and probe operators later after compile. I will be great if Marco operators can just look similar to other normal operators (like inside #5115). Most importantly, Marco operators should have ports, that is derived from the sub-dag that it is representing. For instance, if a macro operator contains a sub-dag that has 2 inputs, and 3 output branches, then it will be good for the Marco to have 2 input ports and 3 output ports. This way user can use the macro operator without knowing its internal shape. During execution, user can see the macro's aggregated statistics instead of internal operators' statistics. ### Display the macro internal sub-Dag and edit it In this version, the macro's content (internal sub-DAG) is displayed inplace. It is more like the grouping view Zuozhi implemented before in #754. And user can drag and drop a new operator into existing macro to edit it. I assume user can also delete one operator or reconnect an edge between operators. In Xiaozhen's #5115, user needs to click on the macro to jump to another workflow view, where it expands into the internal dag. The internal view has a dummy source operator and a dummy sink operator for user to understand. I think this experience is also mimicking rapidminer/knime. It is not clear if user can edit the internal sub-DAG in that view. I do like the experience to jump to another workflow view to edit Marco's internal shape, because usually once a macro is created, it will be likely to be reused by many workflows. To edit a macro in place of a particular workflow would also affect the same macro being used in other workflows (think of modifying a function's definition at its callsite, other invocations places of the same function would also be affected). The marco is better to be edited in a stand alone view. However I do not like the dummy source and dummy sink in #5115. I think we should prioritize ports and just show input and output ports are sufficient. ### Other ideas - I think it maybe good idea to show "My macros" in the operator properties directly and let users drag a Marco operator similarly as dragging a normal operator out. This can save the changes on the property panel. Think about this: how many times would user select a marco (marker) and uses the drop down to change it to another macro implementation? I think if in any cases user need to use another macro, they should just drag it out from the operator menu. Ok those are all my high level comments regarding this PR. I am sure we will have much more detailed comments going deeper. So let's talk about breaking into smaller PRs. I still think this PR is too large: it has too many design choices that include creating a Marco, reuse a previously created macro, representation of a macro, editing a Marco, runtime behavior of the Marco. There are other life cycles which are not clear or being discussed: removing of a Marco, disable a macro, backend compilation of macro, persistence of macro (how to represent it in DB), search of a Marco (indexing) .... There are also other things we need to change, such as documentation, tests, tutorials of how to use Marco, etc. The right size of a PR, is we can discuss and review for a single or two design choices. Making the scope smaller can make the discussion easier. For instance, I would recommend you have one PR about creating a macro (without saving it or reuse it), then we can discussion about the experience of creating a macro in the PR. Then you can have a PR to persist the Marco into db, we can discuss where to store macro information and how to make sure search can index it. Then you can have a PR to introduce experience of how user can reuse a previously stored macro in a new workflow. Then another PR to edit a previously macro. For each feature, you can split into frontend, backend to make size even smaller. I think you can see my point now. Making PRs smaller is not the goal: it is because smaller PRs can get proper reviews and discussions on your changes. You may worry about that: these are partial features, can we merge into main? yes we can, just give a config that macro feature is turned off by default in main, so it doesn't affect any other features. After we have more PRs to implement the whole life cycle, fix bugs, add documentations, we then can turn on the feature by default to let user see and use it. I hope this make sense to you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
