kaxil commented on PR #42376: URL: https://github.com/apache/airflow/pull/42376#issuecomment-2448602088
This sentence is the key one: "**_For queries involving only x, the multicolumn index could be used, though it would be larger and hence slower than an index on x alone._**" How much slower? That depends on the dataset and access pattern.

In the example above, a multi-column index on (x, y) can technically serve queries on x alone, but the additional storage and scanning costs can hurt performance, especially for high-traffic columns. A single-column index on x is smaller: its entries are narrower, so a lookup on x reads fewer index pages, which often improves I/O and cache usage and can make a measurable difference on large datasets. Again, how much of a difference depends on the dataset and query patterns.
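To make the size argument concrete, here is a minimal sketch that builds the same table twice, once with an index on (x, y) and once with an index on x alone, and compares how many pages each index adds. It uses SQLite through Python's `sqlite3` module only because that runs anywhere; the table `t`, its columns, the index names `idx_xy` and `idx_x`, and the row counts are all made up for illustration, and the numbers will differ on whichever database Airflow actually runs against.

```python
import os
import sqlite3
import tempfile


def build_db(path, index_sql):
    """Create a 20,000-row table, add one index, and return
    (pages added by the index, query plan for a lookup on x alone)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (x INTEGER, y TEXT, z INTEGER)")
    # Hypothetical data: x repeats every 1000 rows, y is a 46-character string.
    rows = ((i % 1000, f"value-{i:040d}", i) for i in range(20_000))
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    conn.commit()

    pages_before = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.execute(index_sql)
    conn.commit()
    pages_after = conn.execute("PRAGMA page_count").fetchone()[0]

    # The human-readable plan text is the last column of each plan row.
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT z FROM t WHERE x = 42"
    ).fetchall()
    conn.close()
    return pages_after - pages_before, " ".join(row[-1] for row in plan)


with tempfile.TemporaryDirectory() as tmp:
    xy_pages, xy_plan = build_db(
        os.path.join(tmp, "xy.db"), "CREATE INDEX idx_xy ON t (x, y)"
    )
    x_pages, x_plan = build_db(
        os.path.join(tmp, "x.db"), "CREATE INDEX idx_x ON t (x)"
    )

print("index on (x, y):", xy_pages, "pages;", xy_plan)
print("index on (x):   ", x_pages, "pages;", x_plan)
```

Both plans should show the query on x being answered through the available index, which is the "could be used" half of the quoted sentence. The page counts show the "larger" half: with a wide `y` column, the (x, y) index takes several times as many pages as the index on x. Whether that extra size turns into noticeably slower queries is exactly the part that depends on the dataset and access pattern.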
