Re: [I] Doc: new docs demands from slack [cloudberry-site]

via GitHub Wed, 18 Dec 2024 06:56:49 -0800


xinzweb commented on issue #217:
URL: https://github.com/apache/cloudberry-site/issues/217#issuecomment-2551540521


   A few ideas, based on experience:
   
   * Use ORCA, and favor hash joins and hash aggregates for OLAP queries.
   * Use a "star" schema, not a "snowflake" one, to reduce the number of joins;
denormalize your data to turn joins into filters.
   * Configuration: this deserves its own book.
   * Partition your data so that, instead of deleting rows, you simply drop an
entire partition.
   * Outer joins across 10+ tables? As the first point says, rely on ORCA; or,
as the second says, use a star schema to reduce the number of joins.
   * Unless you do singleton lookups, don't index. Views are for manageability
only, not for performance. Materialized views are a great idea if your workload
is relatively fixed. When partitioning, keep partitions medium-sized, and take
advantage of static and dynamic partition elimination.
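   To make the first two points concrete, here is a small Python sketch of what
a hash join and a hash aggregate do. It is not Cloudberry or ORCA code; the
tables, column names, and helper functions are hypothetical. It assumes a star
schema in which `category` has been denormalized into the product dimension, so
selecting a category is a plain filter on that dimension rather than one more
join to a separate category table, as a snowflake schema would require.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join: build a hash table on one input, then probe it once per row of the other."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    for row in probe_rows:
        # .get() does not insert missing keys into the defaultdict
        for match in table.get(row[probe_key], ()):
            yield {**match, **row}


def hash_aggregate(rows, group_key, value_key):
    """GROUP BY group_key with SUM(value_key), kept as running totals in a hash table."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


# Hypothetical star schema: 'category' lives directly on the product dimension.
products = [
    {"product_id": 1, "name": "pen", "category": "office"},
    {"product_id": 2, "name": "desk", "category": "furniture"},
    {"product_id": 3, "name": "stapler", "category": "office"},
]
# Hypothetical fact table.
sales = [
    {"product_id": 1, "amount": 5},
    {"product_id": 3, "amount": 12},
    {"product_id": 1, "amount": 7},
    {"product_id": 2, "amount": 300},
]

office = [p for p in products if p["category"] == "office"]  # a filter, not a join
joined = hash_join(office, sales, "product_id", "product_id")
print(hash_aggregate(joined, "name", "amount"))  # {'pen': 12, 'stapler': 12}
```

   The filtered dimension is used as the build side because it is typically much
smaller than the fact table; the fact rows are then streamed through once as the
probe side, and the aggregate needs only one pass over the joined rows.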
   
   Again, these are very high-level ideas for optimizing "append-only,
truncate-only, mostly read-only" OLAP workloads. If you are thinking of OLTP or
any HTAP workload, the recommendations above will not work.
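   The partitioning advice fits the "truncate-only" part of this workload. The
toy Python model that follows (hypothetical class and method names; it is not
how Cloudberry stores partitions) contrasts dropping a whole monthly partition
with deleting rows one by one, and shows static partition elimination: a range
query reads only the partitions whose bounds overlap the requested range.
Dynamic partition elimination, where the partitions to skip are decided at run
time from values produced elsewhere in the plan (for example by a join), is not
modeled here.

```python
from datetime import date


class MonthPartitionedTable:
    """Toy table partitioned by the month of its 'day' column (illustration only)."""

    def __init__(self):
        self.partitions = {}  # (year, month) -> list of rows

    def insert(self, row):
        key = (row["day"].year, row["day"].month)
        self.partitions.setdefault(key, []).append(row)

    def drop_partition(self, year, month):
        # Retiring old data removes one whole partition; no per-row work.
        self.partitions.pop((year, month), None)

    def delete_where(self, predicate):
        # The alternative: visit every row and keep only the survivors.
        for key, rows in self.partitions.items():
            self.partitions[key] = [r for r in rows if not predicate(r)]

    def scan(self, start, end):
        """Return (rows with start <= day < end, list of partitions actually read)."""
        result, scanned = [], []
        for (year, month), rows in sorted(self.partitions.items()):
            part_start = date(year, month, 1)
            part_end = date(year + (month == 12), month % 12 + 1, 1)
            if part_end <= start or part_start >= end:
                continue  # pruned: this partition cannot hold matching rows
            scanned.append((year, month))
            result.extend(r for r in rows if start <= r["day"] < end)
        return result, scanned


t = MonthPartitionedTable()
for d in [date(2024, 10, 5), date(2024, 11, 20), date(2024, 12, 1), date(2024, 12, 18)]:
    t.insert({"day": d, "amount": 1})

rows, scanned = t.scan(date(2024, 12, 1), date(2025, 1, 1))
print(len(rows), scanned)  # 2 [(2024, 12)]

t.drop_partition(2024, 10)
print(sorted(t.partitions))  # [(2024, 11), (2024, 12)]
```

   Medium-sized partitions keep both effects useful: each drop still retires a
meaningful amount of data, and a range query can skip most of the table without
the planner having to consider an excessive number of tiny partitions.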
   
   I agree with @tuhaihe that we need extended documentation with general
guidance, as well as real-world cases that demonstrate the trade-offs behind the
guidance above. These are great questions, so please keep asking. Thanks.


