SYaoJun opened a new issue, #860:
URL: https://github.com/apache/incubator-graphar/issues/860

   ### Describe the enhancement requested
   
   ## Background
   
   <img width="750" height="335" alt="Image" src="https://github.com/user-attachments/assets/51fdcba8-f3de-47ae-bf6f-0ab0ec7a8e6f" />
   
   Currently, all four Arrow chunk readers (`VertexPropertyArrowChunkReader`, `AdjListArrowChunkReader`, `AdjListOffsetArrowChunkReader`, `AdjListPropertyArrowChunkReader`) discard the loaded `chunk_table_` every time the chunk position changes via `seek()`, `next_chunk()`, or `seek_chunk_index()`. As a result, when a user seeks back to a previously loaded chunk, the entire Parquet file must be re-opened, its metadata re-parsed, and its data decoded again, even though the underlying data has not changed.
   
   This is particularly costly in graph traversal workloads (BFS, PageRank, label filtering), where vertex/edge access patterns exhibit strong locality and cause the same chunks to be read repeatedly.
   
   ## Proposal
   Introduce a generic `LruCache<Key, Value>` and integrate it into all four chunk reader classes. When a chunk is loaded from disk, it is stored in the cache. On subsequent seeks to the same chunk, the cached `arrow::Table` is returned directly, avoiding file I/O entirely.
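   A minimal sketch of what such a generic cache could look like, using the classic `std::list` + `std::unordered_map` layout. The `Get`/`Put` method names and the `Hash` template parameter are illustrative assumptions, not taken from the GraphAr codebase:

```cpp
#include <cassert>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Sketch of a generic LRU cache: a doubly-linked list keeps entries in
// recency order (front = most recently used), and a hash map gives O(1)
// lookup of each entry's list node.
template <typename Key, typename Value, typename Hash = std::hash<Key>>
class LruCache {
 public:
  explicit LruCache(size_t capacity) : capacity_(capacity) {}

  // Returns the cached value if present and marks it most-recently used.
  std::optional<Value> Get(const Key& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return std::nullopt;
    // splice moves the node to the front without invalidating iterators.
    entries_.splice(entries_.begin(), entries_, it->second);
    return it->second->second;
  }

  // Inserts or refreshes an entry, evicting the least-recently used
  // entry once capacity is exceeded.
  void Put(const Key& key, Value value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = std::move(value);
      entries_.splice(entries_.begin(), entries_, it->second);
      return;
    }
    entries_.emplace_front(key, std::move(value));
    index_[key] = entries_.begin();
    if (entries_.size() > capacity_) {
      index_.erase(entries_.back().first);
      entries_.pop_back();
    }
  }

 private:
  size_t capacity_;
  std::list<std::pair<Key, Value>> entries_;  // front = most recent
  std::unordered_map<Key,
                     typename std::list<std::pair<Key, Value>>::iterator,
                     Hash>
      index_;
};
```

   In a reader, the `Value` would be a `std::shared_ptr<arrow::Table>`, so a cache hit only copies a shared pointer rather than re-decoding the chunk.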
   
   ## Benchmark Results
   With a capacity-4 LRU cache on the LDBC sample dataset (Release build, macOS 
ARM):
   
   <img width="838" height="167" alt="Image" src="https://github.com/user-attachments/assets/50014456-edc3-4b4d-a763-0b5927f0b400" />
   
   ## TODO
   - Integrate `LruCache<IdType, std::shared_ptr<arrow::Table>>` into `VertexPropertyArrowChunkReader`
   - Integrate `LruCache<std::pair<IdType, IdType>, std::shared_ptr<arrow::Table>, PairHash>` into `AdjListArrowChunkReader`
   - ...
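   Since the standard library provides no default hash for `std::pair`, the AdjList readers (keyed by vertex-chunk and edge-chunk index) need a helper like the `PairHash` named above. A possible sketch, assuming `IdType` is a 64-bit integer; this is not the definition from the GraphAr codebase:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

using IdType = int64_t;  // assumption: GraphAr's IdType is a 64-bit integer

// Combines the hashes of both pair members so that (a, b) and (b, a)
// map to different values; the constant and shifts follow the common
// boost::hash_combine-style mixing pattern.
struct PairHash {
  size_t operator()(const std::pair<IdType, IdType>& p) const {
    size_t h1 = std::hash<IdType>{}(p.first);
    size_t h2 = std::hash<IdType>{}(p.second);
    return h1 ^ (h2 + 0x9e3779b97f4a7c15ULL + (h1 << 6) + (h1 >> 2));
  }
};
```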
   
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
