metesynnada commented on PR #7366:
URL: 
https://github.com/apache/arrow-datafusion/pull/7366#issuecomment-1689863030

   I created a small streaming benchmark using TPC-H data.
   
   ```
   Benchmark streaming.json
   --------------------
   ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query        ┃ apache_main ┃ upstream_prunable-hash-join ┃        Change ┃
   ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1     │   1483.13ms │                   1483.27ms │     no change │
   │ QQuery 2     │  11033.15ms │                   6903.41ms │ +1.60x faster │
   └──────────────┴─────────────┴─────────────────────────────┴───────────────┘
   ```
   
   The first query is:
   ```sql
   SELECT
       o_orderkey
   FROM
       orders,
       lineitem
   WHERE
     o_orderdate = l_shipdate
     AND l_orderkey >= o_orderkey - 10
     AND l_orderkey < o_orderkey + 10
     AND l_returnflag = 'R'
   ```
   and the second one is:
   ```sql
   SELECT
       o_orderkey
   FROM
       orders,
       lineitem
   WHERE
     o_orderstatus = l_linestatus
     AND l_orderkey >= o_orderkey - 10
     AND l_orderkey < o_orderkey + 10
     AND l_returnflag = 'R'
   LIMIT 10000;
   ```
   The second query joins on a low-cardinality key pair. `smallvec` was 
efficient when allocating entries for new keys, but deleting from it was slow. 
Removing the `smallvec` mechanism in this PR makes this query about 1.60x 
faster (11033.15ms down to 6903.41ms).
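
To illustrate why deletions can dominate when the join key has few distinct values, here is a minimal Rust sketch. It is not the code from this PR: `PerKeyLists` and `Chained` are hypothetical names, a plain `Vec` stands in for `smallvec`, and the chained layout is just one possible alternative that assumes rows are inserted in increasing index order and pruned from the oldest end.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical layout A: every key owns its own list of row indices
/// (`Vec` stands in for `smallvec` here).
#[derive(Default)]
struct PerKeyLists {
    map: HashMap<u64, Vec<u64>>,
}

impl PerKeyLists {
    fn insert(&mut self, key: u64, row: u64) {
        self.map.entry(key).or_default().push(row);
    }

    /// Drop all rows with index < cutoff. Returns how many stored row
    /// indices had to be visited: with few distinct keys the lists are
    /// long, so every prune scans and compacts them.
    fn prune(&mut self, cutoff: u64) -> usize {
        let mut visited = 0;
        self.map.retain(|_, rows| {
            visited += rows.len();
            rows.retain(|r| *r >= cutoff);
            !rows.is_empty()
        });
        visited
    }
}

/// Hypothetical layout B: one shared chain. `head` maps a key to
/// (latest row + 1); `next[row - offset]` holds (previous row with the
/// same key + 1), with 0 marking the end of the chain.
#[derive(Default)]
struct Chained {
    head: HashMap<u64, u64>,
    next: VecDeque<u64>,
    offset: u64, // number of rows already pruned from the front
}

impl Chained {
    /// Assumes `row == offset + next.len()`, i.e. rows arrive in order.
    fn insert(&mut self, key: u64, row: u64) {
        let prev = self.head.insert(key, row + 1).unwrap_or(0);
        self.next.push_back(prev);
    }

    /// Drop all rows with index < cutoff by popping the front of the
    /// chain; only the (small) key map is scanned.
    fn prune(&mut self, cutoff: u64) {
        let n = (cutoff.saturating_sub(self.offset) as usize).min(self.next.len());
        self.next.drain(..n);
        self.offset += n as u64;
        let offset = self.offset;
        self.head.retain(|_, latest| *latest > offset);
    }

    /// Live rows for `key`, newest first. Links that point at pruned
    /// rows (value <= offset) terminate the walk.
    fn rows(&self, key: u64) -> Vec<u64> {
        let mut out = Vec::new();
        let mut cur = self.head.get(&key).copied().unwrap_or(0);
        while cur > self.offset {
            let row = cur - 1;
            out.push(row);
            cur = self.next[(row - self.offset) as usize];
        }
        out
    }
}
```

In layout A the pruning cost grows with the number of buffered rows, while in layout B it grows with the number of pruned rows plus the number of distinct keys, which is the small quantity in a query like the second one.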


