chenlica commented on code in PR #4558:
URL: https://github.com/apache/texera/pull/4558#discussion_r3165738668

##########
README.md:
##########
@@ -33,136 +29,24 @@
 <img alt="Static Badge" src="https://img.shields.io/badge/Largest_Deployment-100_nodes,_400_cores-green">
 </p>
-# Goals
-
-* Provide data science as cloud services;
-* Provide a browser-based GUI to form a workflow without writing code;
-* Allow non-IT people to access data science;
-* Support collaborative data science;
-* Allow users to interact with the execution of a job;
-* Support huge volumes of data efficiently.
-
-# Workflow GUI
-The Texera interface supports real-time collaboration on data science projects, allowing seamless sharing of data and workflows with easy access to AI/ML techniques and efficient management of public and private resources.
-The workflow in the use case shown below includes data cleaning, ML model training, and validation.
-
-
-# Publications (Computer Science)
-* (5/2025) **Responsive Retrieval of Consistent States in Pipelined Executions of Dataflows** - Shengquan Ni, and Chen Li - _To appear in HILDA Workshop at SIGMOD 2025_
-* (11/2024) **IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems** - Shengquan Ni, Yicong Huang, Zuozhi Wang, and Chen Li - _To appear in VLDB 2025_
-* (8/2024) **Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs** - Xiaozhen Liu, Yicong Huang, Xinyuan Lin, Avinash Kumar, Sadeem Alsudais, and Chen Li - _To appear in SIGMOD 2025_
-* (7/2024) **Texera: A System for Collaborative and Interactive Data Analytics Using Workflows** - Zuozhi Wang, Yicong Huang, Shengquan Ni, Avinash Kumar, Sadeem Alsudais, Xiaozhen Liu, Xinyuan Lin, Yunyan Ding, and Chen Li - _In VLDB 2024, Scalable Data Science track_ | [PDF](https://www.vldb.org/pvldb/vol17/p3580-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2024-texera-presentation.pdf)
-* (3/2024) **Demonstration of Udon: Line-by-line Debugging of User-Defined Functions in Data Workflows** - Yicong Huang, Zuozhi Wang, and Chen Li - _In SIGMOD 2024 **Best Demo Runner-Up Award🏆**_ | [PDF](https://dl.acm.org/doi/10.1145/3626246.3654756)
-* (2/2024) **Data Science Tasks Implemented with Scripts versus GUI-Based Workflows:** The Good, the Bad, and the Ugly - Alexander K Taylor, Yicong Huang, Junheng Hao, Xinyuan Lin, Xiusi Chen, Wei Wang, and Chen Li - _In DataPlat Workshop at ICDE 2024_ | [PDF](https://ieeexplore.ieee.org/abstract/document/10555112) | [Slides](https://chenli.ics.uci.edu/files/icde2024-dataplat-workshop.pdf)
-<details>
-<summary>Expand All</summary>
-
-* (8/2023) **Building a Collaborative Data Analytics System: Opportunities and Challenges** - Zuozhi Wang, Chen Li - _In Tutorial at VLDB 2023_ | [PDF](https://www.vldb.org/pvldb/vol16/p3898-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2023-texera-tutorial.pdf)
-* (8/2023) **Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control** - Yicong Huang, Zuozhi Wang, and Chen Li - _In SIGMOD 2024_ | [PDF](https://dl.acm.org/doi/10.1145/3626712) | [Slides](https://chenli.ics.uci.edu/files/sigmod2024-udon-presentation.pdf)
-* (8/2023) **Improving Iterative Analytics in GUI-Based Data-Processing Systems with Visualization, Version Control, and Result Reuse** - Sadeem Alsudais Ph.D. Thesis | [PDF](https://sadeemsaleh.github.io/Sadeem_phd_thesis.pdf)
-* (7/2023) **Using Texera to Characterize Climate Change Discussions on Twitter During Wildfires** - Shengquan Ni, Yicong Huang, Jessie W. Y. Ko, Alexander Taylor, Xiusi Chen, Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, Suellen Hopfer, and Chen Li - _In Data Science Day at KDD 2023_
-* (7/2023) **Raven: Accelerating Execution of Iterative Data Analytics by Reusing Results of Previous Equivalent Versions** - Sadeem Alsudais, Avinash Kumar, and Chen Li - _In HILDA Workshop at SIGMOD 2023_ | [PDF](https://dl.acm.org/doi/10.1145/3597465.3605219)
-* (6/2023) **Texera: A System for Collaborative and Interactive Data Analytics Using Workflows** - Zuozhi Wang Ph.D. Thesis | [PDF](https://zuozhiw.github.io/Zuozhi_Wang_UCI_PhD_Thesis.pdf)
-* (12/2022) **Towards Interactive, Adaptive and Result-aware Big Data Analytics** - Avinash Kumar Ph.D. Thesis | [PDF](https://arxiv.org/abs/2212.07096)
-* (9/2022) **Fries: Fast and Consistent Runtime Reconfiguration in Dataflow Systems with Transactional Guarantees** - Zuozhi Wang, Shengquan Ni, Avinash Kumar, and Chen Li - _In VLDB 2023_ | [PDF](https://www.vldb.org/pvldb/vol16/p256-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2023-fries.pdf)
-* (7/2022) **Drove: Tracking Execution Results of Workflows on Large Datasets** - Sadeem Alsudais - _In the Ph.D. Workshop at VLDB 2022_ | [PDF](http://ceur-ws.org/Vol-3186/paper_10.pdf)
-* (6/2022) **Demonstration of Accelerating Machine Learning Inference Queries with Correlative Proxy Models** - Zhihui Yang, Yicong Huang, Zuozhi Wang, Feng Gao, Yao Lu, Chen Li, and X. Sean Wang - _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p3734-yang.pdf)
-* (6/2022) **Demonstration of Collaborative and Interactive Workflow-Based Data Analytics in Texera** - Xiaozhen Liu, Zuozhi Wang, Shengquan Ni, Sadeem Alsudais, Yicong Huang, Avinash Kumar, and Chen Li - _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p3738-liu.pdf) | [Demo Video](https://youtu.be/2gfPUZNsoBs)
-* (4/2022) **Optimizing Machine Learning Inference Queries with Correlative Proxy Models** - Zhihui Yang, Zuozhi Wang, Yicong Huang, Yao Lu, Chen Li, and X. Sean Wang - _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p2032-yang.pdf)
-* (7/2020) **Demonstration of Interactive Runtime Debugging of Distributed Dataflows in Texera** - Zuozhi Wang, Avinash Kumar, Shengquan Ni, and Chen Li - _In VLDB 2020_ | [PDF](http://www.vldb.org/pvldb/vol13/p2953-wang.pdf) | [Video](https://www.youtube.com/watch?v=SP-XiDADbw0) | [Slides](https://docs.google.com/presentation/d/14U6RPZfeb8Ho0aO2HsCSc8lRs6ul6AxEIm5gpjeVUYA/edit?usp=sharing)
-* (1/2020) **Amber: A Debuggable Dataflow system based on the Actor Model** - Avinash Kumar, Zuozhi Wang, Shengquan Ni, and Chen Li - _In VLDB 2020_ | [PDF](http://www.vldb.org/pvldb/vol13/p740-kumar.pdf) | [Video](https://www.youtube.com/watch?v=T5ShFRfHmgI) | [Slides](https://docs.google.com/presentation/d/1v8G9lDmfv4Ff2YWyrGfo_9iMQVF4N8a-4gO4H-K6rCk/edit?usp=sharing)
-* (4/2017) **A Demonstration of TextDB: Declarative and Scalable Text Analytics on Large Data Sets** - Zuozhi Wang, Flavio Bayer, Seungjin Lee, Kishore Narendran, Xuxi Pan, Qing Tang, Jimmy Wang, and Chen Li - _In ICDE 2017_ **Best Demo award** | [PDF](https://chenli.ics.uci.edu/files/icde2017-textdb-demo.pdf) | [Video](https://github.com/Texera/texera/wiki/Video)
-
-</details>
+Apache Texera (Incubating) is an open-source system for human-AI collaborative data science using visual workflows. It enables analysts to construct, execute, and refine data analysis tasks through an intuitive GUI, assisted by AI agents that understand natural-language instructions. Texera is well suited for a wide range of applications, including “AI for Science,” by making advanced AI and data science capabilities accessible to a broader community. It can run on a laptop for local use or be deployed in the cloud to support scalable processing of large datasets.
-# Publications (Interdisciplinary):
-* (2/2025) **DS4ALL: Teaching High-School Students Data Science and AI/ML Using the Texera Workflow Platform as a Service** - Jiadong Bai, Xiaozhen Liu, Anthony Cuturrufo, Alexander Kundu Taylor, Jeehyun Hwang, Mingyu Derek Ma, Xinyuan Lin, Yanqiao Zhu, Yicong Huang, Yunyan Ding, Wei Wang, and Chen Li - _To appear in [Data Science Education K-12: Research to Practice Annual Conference 2025](https://web.cvent.com/event/d641bd9f-6c99-4cbc-951b-33b1ca05d4ed/summary)_
-* (7/2024) **Brain Image Data Processing Using Collaborative Data Workflows on Texera** - Yunyan Ding, Yicong Huang, Pan Gao, Andy Thai, Atchuth Naveen Chilaparasetti, M. Gopi, Xiangmin Xu, and Chen Li - _In Frontiers Neural Circuits_ | [PDF](https://doi.org/10.3389/fncir.2024.1398884)
-* (1/2024) **Wording Matters: The Effect of Linguistic Characteristics and Political Ideology on Resharing of COVID-19 Vaccine Tweets** - Judith Borghouts, Yicong Huang, Suellen Hopfer, Chen Li, and Gloria Mark - _In TOCHI 2024_ | [PDF](https://dl.acm.org/doi/pdf/10.1145/3637876)
-* (1/2024) **How the Experience of California Wildfires Shape Twitter Climate Change Framings** - Jessie W. Y. Ko, Shengquan Ni, Alexander Taylor, Xiusi Chen, Yicong Huang, Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, Chen Li, and Suellen Hopfer - _In Climatic Change 2024_ | [PDF](https://link.springer.com/content/pdf/10.1007/s10584-023-03668-0.pdf)
-* (11/2023) **The Marketing and Perceptions of Non-Tobacco Blunt Wraps on Twitter** - Joshua U. Rhee, Yicong Huang, Aurash J. Soroosh, Sadeem Alsudais, Shengquan Ni, Avinash Kumar, Jacob Paredes, Chen Li, and David S. Timberlake - _In Substance Use & Misuse 2023_ | [PDF](https://www.tandfonline.com/doi/epdf/10.1080/10826084.2023.2280572?needAccess=true)
+The system has the following key features:
-<details>
-<summary>Expand All</summary>
+* Natural-language data science through AI chatbots
+* Intuitive GUI-based workflows for data analysis
+* Parallel backend engine for scalable big-data processing
+* Real-time collaboration for workflow editing and execution
+* User-defined functions in Python and Java
+* Separation of compute and storage for flexible cloud deployment
+* Runtime debugging and interactive workflow execution
+* Cloud-native deployment support
+* Multi-tenant support with workload isolation
+* Extensible architecture for integrating external web services

Review Comment:
Done.
