[GitHub] [arrow-datafusion] alippai commented on issue #451: Add Linked data benchmarks

GitBox Mon, 31 May 2021 13:29:16 -0700


alippai commented on issue #451:
URL: https://github.com/apache/arrow-datafusion/issues/451#issuecomment-851672062



   In this case LSQB sounds like a better first target. 👍 
   
   > So also into what vectorized engines (can) do here.
   
   I have had bad experiences with dedicated "graph engines": usually a 
PostgreSQL or SQL Server based solution beats any dedicated solution out there, 
so I wouldn't be afraid that DataFusion's architecture would not be fully 
exploited. Similarly, Differential Dataflow/Materialize or a naive Rust/C++ 
implementation traversing the data is ridiculously faster, so there is a chance 
that Arrow's memory model and parallel joins help. Still, I acknowledge that 
adding benchmarks measuring recursive CTEs might side-track the main DataFusion 
development. My gut feeling is that DataFusion would perform these queries 
relatively well, since they would work as "repeated high selectivity, high 
cardinality joins", and as far as I remember we are not particularly bad at 
those. 
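
   To illustrate the "repeated joins" intuition, here is a minimal sketch in 
plain Python, not DataFusion code. It assumes a recursive CTE for graph 
reachability is evaluated semi-naively: each round joins the newly found nodes 
with the edge table and drops rows already seen. The `reachable` function and 
the sample edge list are hypothetical, made up only for this illustration.

```python
from collections import defaultdict


def reachable(edges, start):
    """Return (nodes reachable from start, number of join rounds)."""
    # Build side of a hash join: index the edge table by source node.
    by_src = defaultdict(list)
    for src, dst in edges:
        by_src[src].append(dst)

    seen = {start}
    frontier = {start}
    rounds = 0
    while frontier:
        # One round = join the current frontier with the edge table ...
        joined = {dst for node in frontier for dst in by_src.get(node, ())}
        # ... then anti-join against everything produced so far.
        frontier = joined - seen
        seen |= frontier
        rounds += 1
    return seen, rounds


# Hypothetical edge list; nodes 5 and 6 are not reachable from node 1.
edges = [(1, 2), (2, 3), (3, 4), (2, 4), (5, 6)]
nodes, rounds = reachable(edges, 1)
print(sorted(nodes), rounds)
```

   The loop stops once a round produces no new rows, which is the fixpoint 
condition a recursive CTE relies on. A benchmark of this shape would mostly 
stress the per-round join, which is why join performance matters here.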
   


