GitHub user SYaoJun edited a discussion: Proposal: Introduce Vortex Columnar 
Format Support in GraphAr

## Background
Several emerging columnar file formats, such as 
[Vortex](https://github.com/vortex-data/vortex), 
[Lance](https://github.com/lance-format/lance), 
[F3](https://github.com/future-file-format/F3), BtrBlocks, Nimble, and Parquet 
variants, demonstrate strong performance advantages in specific scenarios.

I wonder whether supporting these formats in GraphAr could significantly reduce 
storage overhead and improve query performance at scale.

## Benefits
1. Introducing the Vortex columnar format can improve storage efficiency and 
query performance through better compression and vectorized execution.
2. It enables more flexible column-level encoding strategies, which can better 
align with analytical graph workloads.
3. Vortex is designed to be GPU-friendly, which is particularly relevant for 
AI and analytics scenarios.


## Effects of Modifications
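1. The storage layer needs a Vortex implementation and format adapters.
2. All language bindings need to adopt the new format.

For example, the existing `FileType` enum would need a new entry for Vortex:

```cpp
enum class FileType : int32_t { CSV = 0, PARQUET = 1, ORC = 2, JSON = 3 };
```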
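As a rough illustration of the change to the enum above, here is a minimal sketch. The `VORTEX = 4` enumerator and the helpers `FileTypeName` and `ParseFileType` are hypothetical names chosen for this example; they are not existing GraphAr APIs, and the real change would also have to cover readers, writers, and every language binding.

```cpp
#include <cstdint>
#include <string>

// Existing values are copied from the current enum.
// VORTEX = 4 is a hypothetical addition for this proposal.
enum class FileType : int32_t {
  CSV = 0,
  PARQUET = 1,
  ORC = 2,
  JSON = 3,
  VORTEX = 4
};

// Hypothetical helper: map a file type to the name stored in metadata.
inline const char* FileTypeName(FileType type) {
  switch (type) {
    case FileType::CSV:     return "csv";
    case FileType::PARQUET: return "parquet";
    case FileType::ORC:     return "orc";
    case FileType::JSON:    return "json";
    case FileType::VORTEX:  return "vortex";
  }
  return "unknown";
}

// Hypothetical helper: parse a metadata name back into a file type.
// Returns false and leaves *out untouched when the name is not recognized.
inline bool ParseFileType(const std::string& name, FileType* out) {
  const FileType all[] = {FileType::CSV, FileType::PARQUET, FileType::ORC,
                          FileType::JSON, FileType::VORTEX};
  for (FileType candidate : all) {
    if (name == FileTypeName(candidate)) {
      *out = candidate;
      return true;
    }
  }
  return false;
}
```

Appending the new value at the end keeps the existing integer values stable, so metadata written with the current four formats would still map to the same types.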
## Evidence from DuckDB
Vortex has already been integrated into DuckDB, where it demonstrates 
substantial performance improvements on analytical workloads such as TPC-H. 
Reported results show significant gains in scan efficiency and query execution 
time compared to traditional columnar formats. Details are in this 
[blog post](https://duckdb.org/2026/01/23/duckdb-vortex-extension).


What do others think about this idea?

GitHub link: https://github.com/apache/incubator-graphar/discussions/887

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
