yuebaiv commented on issue #9718:
URL: https://github.com/apache/seatunnel/issues/9718#issuecomment-3809493077

   Hi @Hisoka-X and maintainers,
   
   I'm interested in contributing to this RAG documentation task. I have experience with data integration and RAG applications, and I'd like to write a comprehensive guide to using SeaTunnel in RAG data pipelines.
   
   ## Proposed Documentation Outline
   
   I plan to create a detailed Chinese document (`docs/zh/introduction/rag-data-processing.md`) covering:
   
   ### 1. **RAG Fundamentals**
   - What is RAG and why it matters
   - Complete data processing pipeline overview
   - SeaTunnel's role in the RAG workflow
   
   ### 2. **5 Real-World Scenarios with Full Configurations**
   - **Scenario 1**: Enterprise document knowledge base (PDF/DOCX/MD → Milvus)
   - **Scenario 2**: Database vectorization (MySQL → Qdrant)
   - **Scenario 3**: Real-time document sync (CDC → Vector DB)
   - **Scenario 4**: Multi-source data fusion (Web + Docs + DB)
   - **Scenario 5**: Large-scale optimization with Lance format
   
   ### 3. **Advanced Topics**
   - Hybrid indexing (dense + sparse vectors)
   - Metadata filtering acceleration
   - Document chunking best practices
   - Embedding model selection guide
   
   ### 4. **Performance Optimization**
   - Batch processing strategies
   - Parallelism tuning
   - Connection pooling
   - Monitoring with Prometheus metrics
   
   ### 5. **Complete Example Project**
   - Docker Compose setup (SeaTunnel + Milvus + Qdrant + Embedding service)
   - Ready-to-run configurations
   - Testing and validation scripts
   
   ### 6. **FAQ and Troubleshooting**
   - Handling large files
   - API rate limiting
   - Vector updates (upsert mode)
   - Query optimization
   
   ## Why This Documentation Is Valuable
   
   1. **Practical**: 5 complete configurations that users can run immediately
   2. **Comprehensive**: Covers the entire pipeline, from data ingestion to vector storage
   3. **Modern**: Focuses on RAG and LLM applications, a fast-growing use case
   4. **Bilingual Ready**: Starts in Chinese and can be translated into English later
   
   ## Timeline
   
   I can complete this documentation within **2 weeks** (by February 11, 2026) and submit a PR.
   
   Could you please assign this issue to me? I'm excited to contribute to the SeaTunnel community!
   
   **GitHub**: @yuebaiv
   
   Looking forward to your feedback!