chenlica commented on code in PR #4558: URL: https://github.com/apache/texera/pull/4558#discussion_r3165736434
##########
README.md:
##########
@@ -33,136 +29,24 @@
 <img alt="Static Badge" src="https://img.shields.io/badge/Largest_Deployment-100_nodes,_400_cores-green">
 </p>
-# Goals
-
-* Provide data science as cloud services;
-* Provide a browser-based GUI to form a workflow without writing code;
-* Allow non-IT people to access data science;
-* Support collaborative data science;
-* Allow users to interact with the execution of a job;
-* Support huge volumes of data efficiently.
-
-# Workflow GUI
-The Texera interface supports real-time collaboration on data science projects, allowing seamless sharing of data and workflows with easy access to AI/ML techniques and efficient management of public and private resources.
-The workflow in the use case shown below includes data cleaning, ML model training, and validation.
-
-
-# Publications (Computer Science)
-* (5/2025) **Responsive Retrieval of Consistent States in Pipelined Executions of Dataflows** - Shengquan Ni, and Chen Li - _To appear in HILDA Workshop at SIGMOD 2025_
-* (11/2024) **IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems** - Shengquan Ni, Yicong Huang, Zuozhi Wang, and Chen Li - _To appear in VLDB 2025_
-* (8/2024) **Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs** - Xiaozhen Liu, Yicong Huang, Xinyuan Lin, Avinash Kumar, Sadeem Alsudais, and Chen Li - _To appear in SIGMOD 2025_
-* (7/2024) **Texera: A System for Collaborative and Interactive Data Analytics Using Workflows** - Zuozhi Wang, Yicong Huang, Shengquan Ni, Avinash Kumar, Sadeem Alsudais, Xiaozhen Liu, Xinyuan Lin, Yunyan Ding, and Chen Li - _In VLDB 2024, Scalable Data Science track_ | [PDF](https://www.vldb.org/pvldb/vol17/p3580-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2024-texera-presentation.pdf)
-* (3/2024) **Demonstration of Udon: Line-by-line Debugging of User-Defined Functions in Data Workflows** - Yicong Huang, Zuozhi Wang, and Chen Li - _In SIGMOD 2024 **Best Demo Runner-Up Award🏆**_ | [PDF](https://dl.acm.org/doi/10.1145/3626246.3654756)
-* (2/2024) **Data Science Tasks Implemented with Scripts versus GUI-Based Workflows:** The Good, the Bad, and the Ugly - Alexander K Taylor, Yicong Huang, Junheng Hao, Xinyuan Lin, Xiusi Chen, Wei Wang, and Chen Li - _In DataPlat Workshop at ICDE 2024_ | [PDF](https://ieeexplore.ieee.org/abstract/document/10555112) | [Slides](https://chenli.ics.uci.edu/files/icde2024-dataplat-workshop.pdf)
-<details>
-<summary>Expand All</summary>
-
-* (8/2023) **Building a Collaborative Data Analytics System: Opportunities and Challenges** - Zuozhi Wang, Chen Li - _In Tutorial at VLDB 2023_ | [PDF](https://www.vldb.org/pvldb/vol16/p3898-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2023-texera-tutorial.pdf)
-* (8/2023) **Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control** - Yicong Huang, Zuozhi Wang, and Chen Li - _In SIGMOD 2024_ | [PDF](https://dl.acm.org/doi/10.1145/3626712) | [Slides](https://chenli.ics.uci.edu/files/sigmod2024-udon-presentation.pdf)
-* (8/2023) **Improving Iterative Analytics in GUI-Based Data-Processing Systems with Visualization, Version Control, and Result Reuse** - Sadeem Alsudais Ph.D. Thesis | [PDF](https://sadeemsaleh.github.io/Sadeem_phd_thesis.pdf)
-* (7/2023) **Using Texera to Characterize Climate Change Discussions on Twitter During Wildfires** - Shengquan Ni, Yicong Huang, Jessie W. Y. Ko, Alexander Taylor, Xiusi Chen, Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, Suellen Hopfer, and Chen Li - _In Data Science Day at KDD 2023_
-* (7/2023) **Raven: Accelerating Execution of Iterative Data Analytics by Reusing Results of Previous Equivalent Versions** - Sadeem Alsudais, Avinash Kumar, and Chen Li - _In HILDA Workshop at SIGMOD 2023_ | [PDF](https://dl.acm.org/doi/10.1145/3597465.3605219)
-* (6/2023) **Texera: A System for Collaborative and Interactive Data Analytics Using Workflows** - Zuozhi Wang Ph.D. Thesis | [PDF](https://zuozhiw.github.io/Zuozhi_Wang_UCI_PhD_Thesis.pdf)
-* (12/2022) **Towards Interactive, Adaptive and Result-aware Big Data Analytics** - Avinash Kumar Ph.D. Thesis | [PDF](https://arxiv.org/abs/2212.07096)
-* (9/2022) **Fries: Fast and Consistent Runtime Reconfiguration in Dataflow Systems with Transactional Guarantees** - Zuozhi Wang, Shengquan Ni, Avinash Kumar, and Chen Li - _In VLDB 2023_ | [PDF](https://www.vldb.org/pvldb/vol16/p256-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2023-fries.pdf)
-* (7/2022) **Drove: Tracking Execution Results of Workflows on Large Datasets** - Sadeem Alsudais - _In the Ph.D. Workshop at VLDB 2022_ | [PDF](http://ceur-ws.org/Vol-3186/paper_10.pdf)
-* (6/2022) **Demonstration of Accelerating Machine Learning Inference Queries with Correlative Proxy Models** - Zhihui Yang, Yicong Huang, Zuozhi Wang, Feng Gao, Yao Lu, Chen Li, and X. Sean Wang - _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p3734-yang.pdf)
-* (6/2022) **Demonstration of Collaborative and Interactive Workflow-Based Data Analytics in Texera** - Xiaozhen Liu, Zuozhi Wang, Shengquan Ni, Sadeem Alsudais, Yicong Huang, Avinash Kumar, and Chen Li - _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p3738-liu.pdf) | [Demo Video](https://youtu.be/2gfPUZNsoBs)
-* (4/2022) **Optimizing Machine Learning Inference Queries with Correlative Proxy Models** - Zhihui Yang, Zuozhi Wang, Yicong Huang, Yao Lu, Chen Li, and X. Sean Wang - _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p2032-yang.pdf)
-* (7/2020) **Demonstration of Interactive Runtime Debugging of Distributed Dataflows in Texera** - Zuozhi Wang, Avinash Kumar, Shengquan Ni, and Chen Li - _In VLDB 2020_ | [PDF](http://www.vldb.org/pvldb/vol13/p2953-wang.pdf) | [Video](https://www.youtube.com/watch?v=SP-XiDADbw0) | [Slides](https://docs.google.com/presentation/d/14U6RPZfeb8Ho0aO2HsCSc8lRs6ul6AxEIm5gpjeVUYA/edit?usp=sharing)
-* (1/2020) **Amber: A Debuggable Dataflow system based on the Actor Model** - Avinash Kumar, Zuozhi Wang, Shengquan Ni, and Chen Li - _In VLDB 2020_ | [PDF](http://www.vldb.org/pvldb/vol13/p740-kumar.pdf) | [Video](https://www.youtube.com/watch?v=T5ShFRfHmgI) | [Slides](https://docs.google.com/presentation/d/1v8G9lDmfv4Ff2YWyrGfo_9iMQVF4N8a-4gO4H-K6rCk/edit?usp=sharing)
-* (4/2017) **A Demonstration of TextDB: Declarative and Scalable Text Analytics on Large Data Sets** - Zuozhi Wang, Flavio Bayer, Seungjin Lee, Kishore Narendran, Xuxi Pan, Qing Tang, Jimmy Wang, and Chen Li - _In ICDE 2017_ **Best Demo award** | [PDF](https://chenli.ics.uci.edu/files/icde2017-textdb-demo.pdf) | [Video](https://github.com/Texera/texera/wiki/Video)
-
-</details>
+Apache Texera (Incubating) is an open-source system for human-AI collaborative data science using visual workflows. It enables analysts to construct, execute, and refine data analysis tasks through an intuitive GUI, assisted by AI agents that understand natural-language instructions. Texera is well suited for a wide range of applications, including “AI for Science,” by making advanced AI and data science capabilities accessible to a broader community. It can run on a laptop for local use or be deployed in the cloud to support scalable processing of large datasets.

Review Comment:
   Done.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
