Re: [I] [DISCUSSION][java] Architecture Discussion for Java Repository [incubator-graphar]

via GitHub Tue, 16 Sep 2025 03:03:24 -0700


SemyonSinchenko commented on issue #756:
URL: 
https://github.com/apache/incubator-graphar/issues/756#issuecomment-3297328060


   > I have some questions:
   > Q1: Is the I/O granularity at the chunk level?
   > For example, in `java-io-parquet`, is each I/O operation reading one
   > entire chunk, and then filtering and computation within the chunk are done in
   > memory?
   
   In that case we lose all the benefits of GAR (except better compression).
Because GAR data is sorted by IDs and we also have an index table, we can benefit
from pushdowns. For example, if a user wants to read only the 2-hop
neighborhood of a small subset of IDs in the graph, we should first check the index
table and the per-column min-max statistics stored in the metadata of the Parquet
files, so that we can skip most of the chunks. Reading whole chunks and filtering
in memory sounds crazy to me. Why would we do that?
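
   To make the pruning idea concrete, here is a minimal sketch, not GraphAr or Parquet code. It assumes the per-chunk min-max statistics for the ID column have already been read from metadata, and that chunks hold a fixed number of consecutive IDs. All names (`ChunkPruning`, `ChunkStats`, `chunksToRead`, `chunkIndexOf`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from the GraphAr or Parquet APIs.
final class ChunkPruning {

    /** Min-max statistics of the ID column for one chunk (assumed to come from metadata). */
    static final class ChunkStats {
        final int chunkIndex;
        final long minId;
        final long maxId;

        ChunkStats(int chunkIndex, long minId, long maxId) {
            this.chunkIndex = chunkIndex;
            this.minId = minId;
            this.maxId = maxId;
        }
    }

    /**
     * Returns the indices of chunks whose [minId, maxId] range contains at least
     * one requested ID. The requested IDs must be sorted in ascending order.
     */
    static List<Integer> chunksToRead(List<ChunkStats> stats, long[] sortedIds) {
        List<Integer> selected = new ArrayList<>();
        for (ChunkStats s : stats) {
            // Locate the first requested ID that is >= minId.
            int pos = Arrays.binarySearch(sortedIds, s.minId);
            if (pos < 0) {
                pos = -pos - 1; // binarySearch encodes the insertion point as -(point) - 1
            }
            // The chunk is needed only if that ID also falls at or below maxId.
            if (pos < sortedIds.length && sortedIds[pos] <= s.maxId) {
                selected.add(s.chunkIndex);
            }
        }
        return selected;
    }

    /** With fixed-size chunks over sorted IDs, the chunk index follows directly from the ID. */
    static long chunkIndexOf(long id, long chunkSize) {
        return id / chunkSize;
    }
}
```

   With four chunks of 100 IDs each and requested IDs 5, 7 and 310, `chunksToRead` selects only chunks 0 and 3, so the other two chunks are never read from storage.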
   

