zyratlo opened a new pull request, #3819: URL: https://github.com/apache/texera/pull/3819
**NOTE:** This tool is still in development; the design choices and features currently present are not finalized.

# PR Description

This PR reintroduces the migration tool branch to the Texera repository after it was removed during our transition to an Apache project. The code changes in this PR are purely front-end GUI changes, as the back-end is currently a standalone micro-service separate from the Texera codebase.

## Purpose

Currently, users who have existing code outside of Texera and want to migrate that code to Texera must create a workflow from scratch. Depending on the complexity of the code, this can take a long time. This tool aims to reduce the time needed to migrate to Texera by using large language models (LLMs) to convert Jupyter Notebooks into Texera workflows.

## Tool Overview (Demo Videos Below)

The user can upload a Jupyter Notebook, which is sent to the OpenAI LLM API to be migrated into a Texera workflow. Once the workflow is generated, the user can modify it alongside the original notebook until they are satisfied with the migration results.

## Design

<img width="2187" height="1314" alt="image" src="https://github.com/user-attachments/assets/f1793ce7-9eb1-433b-9a0a-169274511e8a" />

The uploaded notebook is passed through the front-end to the migration micro-service in the back-end. The micro-service handles all communication with OpenAI: OpenAI returns the generated workflow to the micro-service, which then passes it to the front-end to render.

The communication design with OpenAI is shown below:

<img width="2590" height="973" alt="image" src="https://github.com/user-attachments/assets/93c6aff9-d07a-4c90-8635-40f7c5df01dd" />

## Future Work

- The main concern is the reliability and accuracy of the workflow returned by the LLM. Current efforts focus on researching ways to address this, such as relying more on algorithmic methods instead of black-box LLM results and reducing the dependency on OpenAI.
- Another effort is to integrate the separate micro-service into the Texera back-end.

# Demo

**1.** The user starts with a Jupyter Notebook they want to migrate into Texera.

https://github.com/user-attachments/assets/88549d9c-92b0-42ce-ba25-5cafceb99daa

**2.** The user uploads the Jupyter Notebook using the new tool button.

https://github.com/user-attachments/assets/ca3621f3-a44a-464a-8996-58edafe94137

**3.** The user can view the uploaded notebook from within Texera.

https://github.com/user-attachments/assets/79f07687-1b88-49b1-8cba-74d7dfa199c1

**4.** Depending on the notebook's size and complexity, generation can take one to three minutes. After the workflow is generated, the user can begin editing.

https://github.com/user-attachments/assets/291b4164-b750-49cf-b37c-2d4bcbba87fb
