This is an automated email from the ASF dual-hosted git repository.
bobbai00 pushed a commit to branch release/v1.1.0-incubating
in repository https://gitbox.apache.org/repos/asf/texera.git
The following commit(s) were added to refs/heads/release/v1.1.0-incubating by this push:
new f081229222 chore: revising README.md and the About page (#4591)
f081229222 is described below
commit f081229222cb6334a29d131cb13adde2dfa79119
Author: Jiadong Bai <[email protected]>
AuthorDate: Thu Apr 30 22:25:06 2026 -0700
chore: revising README.md and the About page (#4591)
### What changes were proposed in this PR?
This is a backport of #4558 from `main` onto `release/v1.1.0-incubating`.
### Any related issues, documentation, discussions?
Related to #4558
### How was this PR tested?
### Was this PR authored or co-authored using generative AI tooling?
Co-authored-by: Chen Li <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
---
README.md | 149 +++++----------------------------------------
docs/system-screenshot.png | Bin 0 -> 1856289 bytes
2 files changed, 15 insertions(+), 134 deletions(-)
diff --git a/README.md b/README.md
index 6f2f92e818..54dfddb28e 100644
--- a/README.md
+++ b/README.md
@@ -1,23 +1,19 @@
-<h1 align="center">Apache Texera - Collaborative Data Science and AI/ML Using Workflows</h1>
+<h1 align="center">Apache Texera - Human-AI Collaborative Data Science Using Visual Workflows</h1>
<p align="center">
<a href="https://texera.io"> <img src="frontend/src/assets/logos/full_logo_small.png" alt="texera-logo" height="150px"/> </a>
<br>
- <i>Apache Texera (Incubating) supports scalable data computation and enables advanced AI/ML techniques.</i>
- <br>
- <i>"Collaboration" is a key focus, and we enable an experience similar to Google Docs, but for data science. </i>
+ <i>Apache Texera (Incubating) is an open-source platform for human-AI collaborative data science using visual workflows.</i>
<br>
<h4 align="center">
- <a href="https://texera.io">Official Site</a>
- |
- <a href="https://texera.io/publications/">Publications</a>
+ <a href="https://texera.apache.org/">Official Site</a>
|
<a href="https://texera.io/category/video/">Video</a>
+ |
+ <a href="https://texera.io/publications/">Publications</a>
|
<a href="https://texera.io/category/blog/">Blog</a>
- |
- <a href="https://github.com/Texera/texera/wiki/Getting-Started">Getting Started</a>
<br>
</h4>
@@ -33,136 +29,21 @@
<img alt="Static Badge" src="https://img.shields.io/badge/Largest_Deployment-100_nodes,_400_cores-green">
</p>
-# Goals
-
-* Provide data science as cloud services;
-* Provide a browser-based GUI to form a workflow without writing code;
-* Allow non-IT people to access data science;
-* Support collaborative data science;
-* Allow users to interact with the execution of a job;
-* Support huge volumes of data efficiently.
-
-# Workflow GUI
-The Texera interface supports real-time collaboration on data science
projects, allowing seamless sharing of data and workflows with easy access to
AI/ML techniques and efficient management of public and private resources.
-The workflow in the use case shown below includes data cleaning, ML model
training, and validation.
-
-
-# Publications (Computer Science)
-* (5/2025) **Responsive Retrieval of Consistent States in Pipelined
Executions of Dataflows**
- Shengquan Ni, and Chen Li
- _To appear in HILDA Workshop at SIGMOD 2025_
-* (11/2024) **IcedTea: Efficient and Responsive Time-Travel Debugging in
Dataflow Systems**
- Shengquan Ni, Yicong Huang, Zuozhi Wang, and Chen Li
- _To appear in VLDB 2025_
-* (8/2024) **Pasta: A Cost-Based Optimizer for Generating Pipelining
Schedules for Dataflow DAGs**
- Xiaozhen Liu, Yicong Huang, Xinyuan Lin, Avinash Kumar, Sadeem Alsudais,
and Chen Li
- _To appear in SIGMOD 2025_
-* (7/2024) **Texera: A System for Collaborative and Interactive Data
Analytics Using Workflows**
- Zuozhi Wang, Yicong Huang, Shengquan Ni, Avinash Kumar, Sadeem Alsudais,
Xiaozhen Liu, Xinyuan Lin, Yunyan Ding, and Chen Li
- _In VLDB 2024, Scalable Data Science track_ |
[PDF](https://www.vldb.org/pvldb/vol17/p3580-wang.pdf) |
[Slides](https://chenli.ics.uci.edu/files/vldb2024-texera-presentation.pdf)
-* (3/2024) **Demonstration of Udon: Line-by-line Debugging of User-Defined
Functions in Data Workflows**
- Yicong Huang, Zuozhi Wang, and Chen Li
- _In SIGMOD 2024 **Best Demo Runner-Up Award🏆**_ |
[PDF](https://dl.acm.org/doi/10.1145/3626246.3654756)
-* (2/2024) **Data Science Tasks Implemented with Scripts versus GUI-Based
Workflows:** The Good, the Bad, and the Ugly
- Alexander K Taylor, Yicong Huang, Junheng Hao, Xinyuan Lin, Xiusi Chen, Wei
Wang, and Chen Li
- _In DataPlat Workshop at ICDE 2024_ |
[PDF](https://ieeexplore.ieee.org/abstract/document/10555112) |
[Slides](https://chenli.ics.uci.edu/files/icde2024-dataplat-workshop.pdf)
-<details>
-<summary>Expand All</summary>
-
-* (8/2023) **Building a Collaborative Data Analytics System: Opportunities and
Challenges**
- Zuozhi Wang, Chen Li
- _In Tutorial at VLDB 2023_ |
[PDF](https://www.vldb.org/pvldb/vol16/p3898-wang.pdf) |
[Slides](https://chenli.ics.uci.edu/files/vldb2023-texera-tutorial.pdf)
-* (8/2023) **Udon: Efficient Debugging of User-Defined Functions in Big Data
Systems with Line-by-Line Control**
- Yicong Huang, Zuozhi Wang, and Chen Li
- _In SIGMOD 2024_ | [PDF](https://dl.acm.org/doi/10.1145/3626712) |
[Slides](https://chenli.ics.uci.edu/files/sigmod2024-udon-presentation.pdf)
-* (8/2023) **Improving Iterative Analytics in GUI-Based Data-Processing
Systems with Visualization, Version Control, and Result Reuse**
- Sadeem Alsudais Ph.D. Thesis |
[PDF](https://sadeemsaleh.github.io/Sadeem_phd_thesis.pdf)
-* (7/2023) **Using Texera to Characterize Climate Change Discussions on
Twitter During Wildfires**
- Shengquan Ni, Yicong Huang, Jessie W. Y. Ko, Alexander Taylor, Xiusi Chen,
Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, Suellen
Hopfer, and Chen Li
- _In Data Science Day at KDD 2023_
-* (7/2023) **Raven: Accelerating Execution of Iterative Data Analytics by
Reusing Results of Previous Equivalent Versions**
- Sadeem Alsudais, Avinash Kumar, and Chen Li
- _In HILDA Workshop at SIGMOD 2023_ |
[PDF](https://dl.acm.org/doi/10.1145/3597465.3605219)
-* (6/2023) **Texera: A System for Collaborative and Interactive Data Analytics
Using Workflows**
- Zuozhi Wang Ph.D. Thesis |
[PDF](https://zuozhiw.github.io/Zuozhi_Wang_UCI_PhD_Thesis.pdf)
-* (12/2022) **Towards Interactive, Adaptive and Result-aware Big Data
Analytics**
- Avinash Kumar Ph.D. Thesis | [PDF](https://arxiv.org/abs/2212.07096)
-* (9/2022) **Fries: Fast and Consistent Runtime Reconfiguration in Dataflow
Systems with Transactional Guarantees**
- Zuozhi Wang, Shengquan Ni, Avinash Kumar, and Chen Li
- _In VLDB 2023_ | [PDF](https://www.vldb.org/pvldb/vol16/p256-wang.pdf) |
[Slides](https://chenli.ics.uci.edu/files/vldb2023-fries.pdf)
-* (7/2022) **Drove: Tracking Execution Results of Workflows on Large
Datasets**
- Sadeem Alsudais
- _In the Ph.D. Workshop at VLDB 2022_ |
[PDF](http://ceur-ws.org/Vol-3186/paper_10.pdf)
-* (6/2022) **Demonstration of Accelerating Machine Learning Inference Queries
with Correlative Proxy Models**
- Zhihui Yang, Yicong Huang, Zuozhi Wang, Feng Gao, Yao Lu, Chen Li, and X.
Sean Wang
- _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p3734-yang.pdf)
-* (6/2022) **Demonstration of Collaborative and Interactive Workflow-Based
Data Analytics in Texera**
- Xiaozhen Liu, Zuozhi Wang, Shengquan Ni, Sadeem Alsudais, Yicong Huang,
Avinash Kumar, and Chen Li
- _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p3738-liu.pdf) |
[Demo Video](https://youtu.be/2gfPUZNsoBs)
-* (4/2022) **Optimizing Machine Learning Inference Queries with Correlative
Proxy Models**
- Zhihui Yang, Zuozhi Wang, Yicong Huang, Yao Lu, Chen Li, and X. Sean Wang
- _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p2032-yang.pdf)
-* (7/2020) **Demonstration of Interactive Runtime Debugging of Distributed
Dataflows in Texera**
- Zuozhi Wang, Avinash Kumar, Shengquan Ni, and Chen Li
- _In VLDB 2020_ | [PDF](http://www.vldb.org/pvldb/vol13/p2953-wang.pdf) |
[Video](https://www.youtube.com/watch?v=SP-XiDADbw0) |
[Slides](https://docs.google.com/presentation/d/14U6RPZfeb8Ho0aO2HsCSc8lRs6ul6AxEIm5gpjeVUYA/edit?usp=sharing)
-* (1/2020) **Amber: A Debuggable Dataflow system based on the Actor Model**
- Avinash Kumar, Zuozhi Wang, Shengquan Ni, and Chen Li
- _In VLDB 2020_ | [PDF](http://www.vldb.org/pvldb/vol13/p740-kumar.pdf) |
[Video](https://www.youtube.com/watch?v=T5ShFRfHmgI) |
[Slides](https://docs.google.com/presentation/d/1v8G9lDmfv4Ff2YWyrGfo_9iMQVF4N8a-4gO4H-K6rCk/edit?usp=sharing)
-* (4/2017) **A Demonstration of TextDB: Declarative and Scalable Text
Analytics on Large Data Sets**
- Zuozhi Wang, Flavio Bayer, Seungjin Lee, Kishore Narendran, Xuxi Pan, Qing
Tang, Jimmy Wang, and Chen Li
- _In ICDE 2017_ **Best Demo award** |
[PDF](https://chenli.ics.uci.edu/files/icde2017-textdb-demo.pdf) |
[Video](https://github.com/Texera/texera/wiki/Video)
-
-</details>
-
-# Publications (Interdisciplinary):
-* (2/2025) **DS4ALL: Teaching High-School Students Data Science and AI/ML
Using the Texera Workflow Platform as a Service**
- Jiadong Bai, Xiaozhen Liu, Anthony Cuturrufo, Alexander Kundu Taylor,
Jeehyun Hwang, Mingyu Derek Ma, Xinyuan Lin, Yanqiao Zhu, Yicong Huang, Yunyan
Ding, Wei Wang, and Chen Li
- _To appear in [Data Science Education K-12: Research to Practice Annual
Conference
2025](https://web.cvent.com/event/d641bd9f-6c99-4cbc-951b-33b1ca05d4ed/summary)_
-* (7/2024) **Brain Image Data Processing Using Collaborative Data Workflows on
Texera**
- Yunyan Ding, Yicong Huang, Pan Gao, Andy Thai, Atchuth Naveen
Chilaparasetti, M. Gopi, Xiangmin Xu, and Chen Li
- _In Frontiers Neural Circuits_ |
[PDF](https://doi.org/10.3389/fncir.2024.1398884)
-* (1/2024) **Wording Matters: The Effect of Linguistic Characteristics and
Political Ideology on Resharing of COVID-19 Vaccine Tweets**
- Judith Borghouts, Yicong Huang, Suellen Hopfer, Chen Li, and Gloria Mark
- _In TOCHI 2024_ | [PDF](https://dl.acm.org/doi/pdf/10.1145/3637876)
-* (1/2024) **How the Experience of California Wildfires Shape Twitter Climate
Change Framings**
- Jessie W. Y. Ko, Shengquan Ni, Alexander Taylor, Xiusi Chen, Yicong Huang,
Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, Chen Li,
and Suellen Hopfer
- _In Climatic Change 2024_ |
[PDF](https://link.springer.com/content/pdf/10.1007/s10584-023-03668-0.pdf)
-* (11/2023) **The Marketing and Perceptions of Non-Tobacco Blunt Wraps on
Twitter**
- Joshua U. Rhee, Yicong Huang, Aurash J. Soroosh, Sadeem Alsudais, Shengquan
Ni, Avinash Kumar, Jacob Paredes, Chen Li, and David S. Timberlake
- _In Substance Use & Misuse 2023_ |
[PDF](https://www.tandfonline.com/doi/epdf/10.1080/10826084.2023.2280572?needAccess=true)
-
-<details>
-<summary>Expand All</summary>
-
-* (3/2023) **Understanding Underlying Moral Values and Language Use of
COVID-19 Vaccine Attitudes on Twitter**
- Judith Borghouts, Yicong Huang, Sydney Gibbs, Suellen Hopfer, Chen Li, and
Gloria Mark
- _In PNAS Nexus 2023_ |
[PDF](https://academic.oup.com/pnasnexus/article-pdf/2/3/pgad013/49435858/pgad013.pdf)
-* (10/2022) **Public Opinions Toward COVID-19 Vaccine Mandates: A Machine
Learning-Based Analysis of U.S. Tweets**
- Yawen Guo, Jun Zhu, Yicong Huang, Lu He, Changyang He, Chen Li, and Kai Zheng
- _In AMIA 2022_ |
[PDF](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10148373/pdf/1066.pdf)
-* (9/2021) **The Social Amplification and Attenuation of COVID-19 Risk
Perception Shaping Mask-Wearing Behavior: A Longitudinal Twitter Analysis**
- Suellen Hopfer, Emilia J. Fields, Yuwen Lu, Ganesh Ramakrishnan, Ted Grover,
Quishi Bai, Yicong Huang, Chen Li, and Gloria Mark
- _In PLOS ONE 2021_ |
[PDF](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0257428)
-* (4/2021) **Why Do People Oppose Mask Wearing? A Comprehensive Analysis of
U.S. Tweets During the COVID-19 Pandemic**
- Lu He, Changyang He, Tera Leigh Reynolds, Qiushi Bai, Yicong Huang, Chen Li,
Kai Zheng, and Yunan Chen
- _In JAMIA 2021_ |
[PDF](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7989302/pdf/ocab047.pdf)
-</details>
-
-# Getting Started
-
-* For users, visit [Guide to Use
Texera](https://github.com/Texera/texera/wiki/Getting-Started).
-* For developers, visit [Guide to Develop
Texera](https://github.com/Texera/texera/wiki/Guide-for-Developers).
-
-Texera was formally known as "TextDB" before August 28, 2017.
+Apache Texera (Incubating) is an open-source platform for human-AI collaborative data science using visual workflows. It enables human analysts to construct, execute, and refine data analysis tasks through an intuitive GUI, assisted by AI agents that understand natural-language instructions. Texera is well suited for a wide range of applications, including “AI for Science,” by making advanced AI and data science capabilities accessible to a broader community. It can run on a laptop for l [...]
-# Acknowledgements
+The platform has the following key features:
-This project is supported by the <a href="http://www.nsf.gov">National Science Foundation</a> under the awards [IIS-1745673](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1745673), [IIS-2107150](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2107150), AWS Research Credits, and Google Cloud Platform Education Programs.
+* Natural-language data science through AI agents
+* Intuitive GUI-based workflows for data science
+* Real-time collaboration for workflow editing and execution
+* Runtime debugging and interactive workflow execution
+* Language-agnostic workflow runtime, native support for Python and Java
+* Parallel backend engine for scalable big-data processing
+* Separation of compute and storage for flexible cloud deployment
-* <a href="https://www.niddk.nih.gov/"><img src="https://github.com/Texera/texera/assets/17627829/d279897a-3efb-41c1-b2d3-8fd20c800ad7" alt="NIH NIDDK" height="30"/></a> This project is supported by an <a href="https://reporter.nih.gov/project-details/10818244">NIH NIDDK</a> award.
+
-* <a href="http://www.yourkit.com"><img src="https://www.yourkit.com/images/yklogo.png" alt="Yourkit" height="30"/></a> [Yourkit](https://www.yourkit.com/) has given an open source license to use their profiler in this project.
# Citation
Please cite Texera as
diff --git a/docs/system-screenshot.png b/docs/system-screenshot.png
new file mode 100644
index 0000000000..6abd0597b3
Binary files /dev/null and b/docs/system-screenshot.png differ