karlovnv commented on issue #12454: URL: https://github.com/apache/datafusion/issues/12454#issuecomment-2355728856
We had a discussion about join of huge table with small table here: https://github.com/apache/datafusion/issues/7000#issuecomment-2094813305 There are several approaches discussed: 1. Use Dictionaries (like in Clickhouse, [details](https://clickhouse.com/docs/en/sql-reference/dictionaries)) with dictionary api (UDF like `get_dict('users', user_id)`) > ClickHouse supports special functions for working with dictionaries that can be used in queries. It is easier and more efficient to use dictionaries with functions than a JOIN with reference tables. 2. Use row-based layout for small table with btree/hash index for getting data by a key while joining (in poll method) This may lead us to create a special type of tables like RowBasedTable with special RowBasedTableProvider, which can give an access to data using hash-based indices or iterator for btree ones. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
