[I] Always use hash joins when joining VALUES blocks [jena]

via GitHub Wed, 12 Jun 2024 05:49:22 -0700


Aklakan opened a new issue, #2535:
URL: https://github.com/apache/jena/issues/2535


   ### Version
   
   5.1.0-SNAPSHOT
   
   ### Feature
   
   This is a further follow-up towards support for multi-variable join keys, 
triggered by the mail thread at 
https://www.mail-archive.com/[email protected]/msg20755.html
   
   The mail gives example SPARQL queries, such as the one below, that use 
VALUES blocks to illustrate the problem with multi-variable joins.
   
   ```sparql
   # test.rq
   select (count(*) as ?C) where {
     {
       select ?X ?Y (struuid() as ?UUID) where {
         values ?X_i { 0 1 2 3 4 5 6 7 8 9 }
         values ?X_j { 0 1 2 3 4 5 6 7 8 9 }
         bind ( ?X_i + 10 * ?X_j as ?X)
         values ?Y_i { 0 1 2 3 4 5 6 7 8 9 }
         values ?Y_j { 0 1 2 3 4 5 6 7 8 9 }
         bind ( ?Y_i + 10 * ?Y_j as ?Y) 
       } 
     } {
       select ?X ?Y where {
         {
           select ?X ?Y (rand() as ?RAND) where {
             values ?X_i { 0 1 2 3 4 5 6 7 8 9 }
             values ?X_j { 0 1 2 3 4 5 6 7 8 9 }
             bind ( ?X_i + 10 * ?X_j as ?X)
             values ?Y_i { 0 1 2 3 4 5 6 7 8 9 }
             values ?Y_j { 0 1 2 3 4 5 6 7 8 9 }
             bind ( ?Y_i + 10 * ?Y_j as ?Y) 
           } 
         } filter (?RAND < 0.95) 
       } 
     } 
   }
   ```
   Jena's JoinClassifier currently linearizes joins between VALUES blocks, as 
in the SPARQL example above, which becomes very slow for larger VALUES blocks.
   For now, an extra flag is needed to force hash joins:
   
   ```bash
   arq --explain --time --set arq:optIndexJoinStrategy=false --query test.rq
   ```
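
To illustrate why the strategy matters, here is a minimal Python sketch (not Jena code) that joins two tables on the composite key (?X, ?Y). It assumes that a linearized join over tables behaves roughly like a nested loop, and that all key variables are bound in every row. The function names, the `n` parameter, and the deterministic row filter standing in for `rand() < 0.95` are made up for this illustration.

```python
from collections import defaultdict
from itertools import product


def make_rows(n):
    """All (X, Y) pairs with X, Y in 0..n-1, similar in shape to the VALUES/BIND pattern."""
    return [{"X": x, "Y": y} for x, y in product(range(n), repeat=2)]


def nested_loop_join(left, right, keys):
    """Compare every left row with every right row: len(left) * len(right) comparisons."""
    out, comparisons = [], 0
    for l in left:
        for r in right:
            comparisons += 1
            if all(l[k] == r[k] for k in keys):
                out.append({**l, **r})
    return out, comparisons


def hash_join(left, right, keys):
    """Index the right side on the composite key, then probe once per left row."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[k] for k in keys)].append(r)
    out, probes = [], 0
    for l in left:
        probes += 1
        for r in index.get(tuple(l[k] for k in keys), ()):
            out.append({**l, **r})
    return out, probes


if __name__ == "__main__":
    n = 20
    left = make_rows(n)
    # Assumption: drop every 20th row deterministically instead of using rand().
    right = [r for i, r in enumerate(make_rows(n)) if i % 20 != 0]
    nl_rows, comparisons = nested_loop_join(left, right, ("X", "Y"))
    hj_rows, probes = hash_join(left, right, ("X", "Y"))
    print(len(nl_rows), len(hj_rows), comparisons, probes)
```

Both functions return the same rows; the difference is that the nested loop's work grows with the product of the two table sizes, while the hash join's work grows with their sum plus the number of results.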
   
   The goal of this issue is to modify JoinClassifier such that joins between 
table-based operands are processed as hash joins rather than linear joins.
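
One way to picture the proposed rule is the toy model below. It is a sketch only and does not mirror Jena's actual JoinClassifier: the `Op` class, the `kind` strings, and the reading of "table-based" as "every leaf of the operand is a table" are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical algebra node: a kind label plus child operators."""
    kind: str  # e.g. "table", "bgp", "extend", "filter", "project", "join"
    children: list = field(default_factory=list)


def is_table_based(op):
    """Assumed definition: an operand is table-based if all of its leaves are tables."""
    if not op.children:
        return op.kind == "table"
    return all(is_table_based(child) for child in op.children)


def choose_join(left, right):
    """Pick a hash join when both operands are table-based.

    Any other case is left to the existing classification ("defer").
    """
    if is_table_based(left) and is_table_based(right):
        return "hash"
    return "defer"


if __name__ == "__main__":
    table = Op("table")
    # Roughly the shape of each sub-select above: project over extend over joined tables.
    sub_select = Op("project", [Op("extend", [Op("join", [table, table])])])
    print(choose_join(sub_select, Op("filter", [sub_select])))
    print(choose_join(sub_select, Op("bgp")))
```

Under this assumed definition, both sub-selects of the test query count as table-based, so their join would be classified as a hash join without the extra command-line flag.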
   
   
   ### Are you interested in contributing a solution yourself?
   
   Yes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

