Hi folks, I have updated our Roadmap wiki page with new items that could add value to the whole project and help ManifoldCF become an AI-Ready Data Ingestion Hub: https://cwiki.apache.org/confluence/display/CONNECTORS/Roadmap
The discussion is open: please share your feedback, and feel free to volunteer to take ownership of any of the tasks summarized below.

Core Performance & Modernization (The Java 21 Leap)
The transition to OpenJDK 21 is the foundation for a more scalable and responsive architecture:
- Virtual Threads Integration (Project Loom)
- REST API v2 & OpenAPI Specification
- Enable "Configuration-as-Code" to support modern DevOps workflows
- Observability with OpenTelemetry

AI & Vector Ecosystem Integration (RAG-Readiness)
Positioning ManifoldCF as the primary "ingestion engine" for Retrieval-Augmented Generation (RAG) and LLM applications:
- Universal Embedding Transformation Connector (High Priority)
- Enable in-flight embedding generation (converting text to vectors) directly within the MCF pipeline at no cost using local open-source models (e.g., BGE-M3, Nomic).
- Native Vector Store Output Connectors: Solr Dense Vector, Milvus, Qdrant, and Weaviate
- Develop a specialized pgvector connector for users who rely on PostgreSQL as a unified metadata and vector store (High Priority)

Advanced Metadata & ACL Mapping for AI
- Ensure that security permissions (ACLs) are passed through to vector stores as "payload" filters, so that document security is preserved in AI search interfaces.

Cloud-Native & Ecosystem Synergy
Expanding the reach of ManifoldCF through deeper integration with the Apache ecosystem and containerized environments:
- Apache Solr Dense Vector Output Connector (High Priority)
- Apache Airflow & NiFi Integration
- Kubernetes Operator
- Next-Gen Administrative UI

-- Piergiorgio
