Re: Discussion : Propose CarbonData project to consider as AI-native data storage

Liang Chen Thu, 21 Aug 2025 04:24:08 -0700

AI requires highly efficient storage for many types of data.

On Thu, 21 Aug 2025 at 10:41, Liang Chen <chenliang...@apache.org> wrote:


> Dear Dev
>
> I propose that the CarbonData project consider positioning itself as
> AI-native data storage. This new direction is very well suited to CarbonData.
>
> What is AI-native data storage?
>
> AI-native data storage is a data storage and management system designed
> and built specifically for the needs of artificial intelligence (AI)
> workloads, particularly machine learning and deep learning. Its core
> concept is to transform data storage from a passive, isolated component of
> the AI process into an active, intelligent, and deeply integrated
> infrastructure.
>
> Why AI-native data storage fits CarbonData's new scope
>
> In AI projects, data scientists and engineers spend 80% of their time on
> data preparation. Traditional storage presents numerous bottlenecks in this
> process:
>
> Data silos: Training data may be scattered across data lakes, data
> warehouses, file systems, object storage, and other locations, making
> integration difficult.
>
> Performance bottlenecks:
>
> Training phase: High-throughput, low-latency data delivery is required to
> keep GPUs fed, so that expensive GPU resources do not sit idle.
>
> Inference phase: High-concurrency, low-latency vector similarity search
> capabilities are required.
>
> Complex data formats: AI workloads involve data types far beyond tables, including
> unstructured data (images, videos, text, audio) and semi-structured data
> (JSON, XML). Traditional databases have limited capabilities for processing
> and querying such data.
>
> Lack of metadata management: Without effective management of rich
> metadata such as data versions, lineage, annotations, and experiment
> parameters, experiments are difficult to reproduce.
>
> Vectorization requirements: Modern AI models (such as large language
> models) convert data such as text and images into vector embeddings.
> Traditional storage cannot efficiently store and retrieve high-dimensional
> vectors.
>
>
> Regards
>
> Liang
>
