yuebaiv commented on issue #9718: URL: https://github.com/apache/seatunnel/issues/9718#issuecomment-3809493077
Hi @Hisoka-X and maintainers,

I'm interested in contributing to this RAG documentation task. I have experience with data integration and RAG applications, and I'd like to write a comprehensive guide for using SeaTunnel in RAG data pipelines.

## Proposed Documentation Outline

I plan to create a detailed Chinese document (`docs/zh/introduction/rag-data-processing.md`) covering:

### 1. **RAG Fundamentals**
- What is RAG and why it matters
- Complete data processing pipeline overview
- SeaTunnel's role in the RAG workflow

### 2. **5 Real-World Scenarios with Full Configurations**
- **Scenario 1**: Enterprise document knowledge base (PDF/DOCX/MD → Milvus)
- **Scenario 2**: Database vectorization (MySQL → Qdrant)
- **Scenario 3**: Real-time document sync (CDC → Vector DB)
- **Scenario 4**: Multi-source data fusion (Web + Docs + DB)
- **Scenario 5**: Large-scale optimization with Lance format

### 3. **Advanced Topics**
- Hybrid indexing (dense + sparse vectors)
- Metadata filtering acceleration
- Document chunking best practices
- Embedding model selection guide

### 4. **Performance Optimization**
- Batch processing strategies
- Parallelism tuning
- Connection pooling
- Monitoring with Prometheus metrics

### 5. **Complete Example Project**
- Docker Compose setup (SeaTunnel + Milvus + Qdrant + Embedding service)
- Ready-to-run configurations
- Testing and validation scripts

### 6. **FAQ and Troubleshooting**
- Handling large files
- API rate limiting
- Vector updates (upsert mode)
- Query optimization

## Why This Documentation is Valuable

1. **Practical**: 5 complete configurations that users can run immediately
2. **Comprehensive**: Covers the entire pipeline from data ingestion to vector storage
3. **Modern**: Focuses on the hot topic of RAG and LLM applications
4. **Bilingual Ready**: Starting with Chinese, can be translated to English later

## Timeline

I can complete this documentation within **2 weeks** (by February 11, 2026) and submit a PR.
Could you please assign this issue to me? I'm excited to contribute to the SeaTunnel community!

**GitHub**: @yuebaiv

Looking forward to your feedback!
