Thanks for sharing. I'm not that familiar with the DV approach, but I'm
going to try to read up on it, as it sounds interesting.

I'd love to see an example using, say, sample Google Analytics data, where
it's just a flat dump of denormalized click data and you then need to think
about what the hubs and links would be (not sure if what I'm saying makes
sense here).
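
To make the idea a bit more concrete, here is a rough sketch of the kind of
split I have in mind. Everything in it is an assumption for illustration:
the field names (visitor_id, page_path, hit_time) are made up and are not
the real GA export schema, and hash_key and split_into_vault are
hypothetical helpers, not anything taken from the datavault2 example.

```python
import hashlib


def hash_key(*parts):
    """Hash one or more business keys into a single surrogate key."""
    joined = "|".join(str(p).strip().lower() for p in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Hypothetical flat, denormalized click rows (NOT the real GA schema).
clicks = [
    {"visitor_id": "v1", "page_path": "/home", "hit_time": "2018-03-01T07:00:00"},
    {"visitor_id": "v1", "page_path": "/docs", "hit_time": "2018-03-01T07:01:00"},
    {"visitor_id": "v2", "page_path": "/home", "hit_time": "2018-03-01T07:02:00"},
]


def split_into_vault(rows):
    """Split flat rows into two hubs and one link, keyed by hashed business keys."""
    hub_visitor, hub_page, link_visit = {}, {}, {}
    for row in rows:
        v_key = hash_key(row["visitor_id"])
        p_key = hash_key(row["page_path"])
        # Hubs: one entry per distinct business key.
        hub_visitor.setdefault(v_key, {"visitor_id": row["visitor_id"]})
        hub_page.setdefault(p_key, {"page_path": row["page_path"]})
        # Link: one entry per distinct (visitor, page) relationship.
        l_key = hash_key(row["visitor_id"], row["page_path"])
        link_visit.setdefault(l_key, {"visitor_key": v_key, "page_key": p_key})
    return hub_visitor, hub_page, link_visit


hub_visitor, hub_page, link_visit = split_into_vault(clicks)
```

Descriptive attributes such as hit_time are left out on purpose; as far as I
understand the approach, they would go into satellites hanging off the hubs
or the link.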

I think there are public BigQuery tables with raw GA data from Google's
sample account. It could be another nice type of dataset to have examples
for.

Cheers
Andy

On Thu, 1 Mar 2018, 07:26 Gerard Toonstra, <[email protected]> wrote:

> Yesterday I finished the draft of a new example on the "ETL with airflow"
> site. This example explores the "Data vault" methodology on top of Hive,
> 100% orchestrated by Airflow:
>
> https://gtoonstra.github.io/etl-with-airflow/datavault2.html
>
> The theory of the data vault is that you can change the business rules for
> how data gets transformed, applied and calculated over time. This is
> helpful because you don't need agreements up front when designing a DWH,
> and you have more flexibility to work out what's needed over time (i.e.,
> you don't get pinned by design choices made months or years earlier). It
> therefore reduces the need for consensus and meetings before you even get
> started with coding.
>
> As always, looking for input and suggestions on the example and code
> provided.
>
> Best regards,
>
> Gerard
>
