xinzweb commented on issue #217: URL: https://github.com/apache/cloudberry-site/issues/217#issuecomment-2551540521
Few ideas here, based on experience:

* Use ORCA, and encourage hash joins and hash aggregates for OLAP queries.
* Use a "star" schema rather than a "snowflake" schema to reduce the number of joins; denormalize your data to turn joins into filters.
* Configuration: this deserves its own book.
* Partition your data so that you don't delete rows; you just drop the entire partition.
* Outer join of 10+ tables? Well, that is answered by the first point (rely on ORCA) or by the second (use a star schema to reduce the number of joins).
* Unless you do singleton lookups, don't index.

Views are only for management purposes, not for performance. Materialized views would be a great idea if you have a relatively fixed workload. For partitioning, make your partitions medium-sized, and leverage static and dynamic partition elimination.

Again, these are very high-level ideas for optimizing "append-only, truncate-only, mostly read-only" OLAP workloads. If you are thinking of OLTP or any HTAP workloads, the above recommendations will not work.

I agree with @tuhaihe that we need extended documentation with general guidance as well as real-world cases to demonstrate the tradeoffs of the guidance above.

Those are great questions, and please keep asking. Thanks.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
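To make the "denormalize your data to turn joins into filters" point concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in so that it runs anywhere; it is not Cloudberry, and it says nothing about ORCA or about actual performance. The table names (`dim_region`, `fact_sales`, `fact_sales_flat`) and the helper functions are hypothetical, chosen only for illustration. It shows the same question answered once with a join against a dimension table and once with a plain filter on a flattened fact table.

```python
import sqlite3


def build_demo():
    """Create a tiny in-memory database with a normalized and a flattened layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Normalized layout: the fact table references a dimension table.
        CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT);
        CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, region_id INTEGER, amount INTEGER);
        -- Denormalized layout: the region name is copied into the fact table.
        CREATE TABLE fact_sales_flat (sale_id INTEGER PRIMARY KEY, region_name TEXT, amount INTEGER);
        """
    )
    conn.executemany("INSERT INTO dim_region VALUES (?, ?)", [(1, "EMEA"), (2, "APAC")])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 100), (2, 2, 250), (3, 1, 50)],
    )
    # Pay for the join once, at load time, instead of on every query.
    conn.execute(
        """
        INSERT INTO fact_sales_flat
        SELECT f.sale_id, d.region_name, f.amount
        FROM fact_sales f JOIN dim_region d USING (region_id)
        """
    )
    return conn


def total_with_join(conn, region):
    """Total sales for a region, resolved through a join on the dimension table."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(f.amount), 0)
        FROM fact_sales f JOIN dim_region d ON d.region_id = f.region_id
        WHERE d.region_name = ?
        """,
        (region,),
    ).fetchone()
    return row[0]


def total_with_filter(conn, region):
    """Same total, but the join has become a simple filter on the flat table."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM fact_sales_flat WHERE region_name = ?",
        (region,),
    ).fetchone()
    return row[0]
```

Both helpers return the same totals; the tradeoff is that the flattened table duplicates the region name on every row and has to be rebuilt or appended to when the dimension changes, which fits the "append-only, mostly read-only" workloads described above much better than OLTP.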
