lmugnano4537 commented on issue #217: URL: https://github.com/apache/cloudberry-site/issues/217#issuecomment-2557636183
Here is a deck I share with my clients who are building a data warehouse (the deck is generic and part of a larger demo I do). It shows generic code I built and provide to customers, but it also shows my conceptual best practices.

You can see I lean towards a Kimball-based model (star or snowflake), because most of the time I'm pushing my customers towards a self-service BI model and I feel dimensional modeling is the easiest for business users to understand. For me, physical design on this generally leads to dimensions being append-optimized (AO) columnar compressed and facts being AO row compressed, usually partitioned on one of the primary dates.

I just started working with a new data modeling tool called DbSchema (https://dbschema.com/). I gave them code for Greenplum that they recently incorporated into the product, so it's my tool of choice.

One warning though: A LOT depends on the BI tool being used. You need to make sure the tool is pushing down the aggregation and only pulling back the columns the user is "slicing and dicing" on, as shown by the Tableau examples in the deck (look at the queries from the DB side). Other tools don't generate good queries and just want to extract everything (horrible), in which case columnar wouldn't work well. Honestly, nothing works well with those tools; you might as well just feed them flat files.

[reporting_db_design_example.pdf](https://github.com/user-attachments/files/18214797/reporting_db_design_example.pdf)
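To make the BI-tool warning concrete, here is a minimal sketch of the two query shapes. It is not taken from the deck: the table and column names (`fact_sales`, `dim_product`, and so on) are hypothetical, and it uses Python's built-in `sqlite3` as a stand-in database, so it only illustrates how much data comes back to the client, not the columnar I/O savings you would see on Greenplum or Cloudberry.

```python
import sqlite3

# Hypothetical star schema: one fact table and one dimension.
# SQLite is only a stand-in here; names and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        sale_date TEXT, product_key INTEGER, quantity INTEGER, amount REAL
    );
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "books"), (2, "games")])
# 30 fact rows: even days map to product 1, odd days to product 2.
rows = [(f"2024-01-{d:02d}", 1 + d % 2, d, float(d)) for d in range(1, 31)]
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)

# Shape 1: the aggregation is pushed down to the database, and only the
# columns being sliced on (category) and measured (amount) are referenced.
pushed_down = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

# Shape 2: the tool extracts every row and every column, then aggregates
# on the client side.
extracted = conn.execute("""
    SELECT f.*, d.*
    FROM fact_sales f
    JOIN dim_product d ON d.product_key = f.product_key
""").fetchall()
client_side = {}
for _date, _fk, _qty, amount, _pk, category in extracted:
    client_side[category] = client_side.get(category, 0.0) + amount

print(len(pushed_down), "rows returned with pushdown")
print(len(extracted), "rows returned with full extract")
```

Both shapes arrive at the same totals, but the second one drags the whole joined result across the wire and touches every column, which is exactly the access pattern that defeats a columnar layout.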
