[ 
https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032320#comment-17032320
 ] 

Pierre Smits edited comment on OFBIZ-10953 at 3/17/20, 8:47 AM:
----------------------------------------------------------------

The reasoning we applied to having a different primary key for the Uom table in 
the transaction db (an exception to the primary key definition for majority of 
the tables) applies also to the DWH.

For those not having access to the Data Warehouse Toolkit book (which is the 
starting point for the project's approach to Bi through the same named 
component) the first line states:

??Surrogate keys are used to implement the primary keys of *almost all* 
dimension tables.??

So, which dimension tables can the project consider to be the exceptions?
 These are the generic dimension tables that are uniform in meaning across 
businesses, e.g. those data sets defined by standardisation bodies. 
 Measurement dimension tables, like date, time, but also country, currency and 
all those other groups of records which are relating to records in the Uom 
table in the transaction db are such generic tables. These tables don't need a 
'self defined' primary keys added, because the natural key is self-explanatory.

That is why Kimball in his book suggests for the Date dimension table to use 
the ISO 8601 (specifically YYMMDD) for the date as the definition for the 
primary key. And the same is suggested for the Time dimension table (HHMM in 
that case).

The validation for these exceptions are:
 # the natural key is self-explanatory,
 # data abstraction negatively impacts query performance and legibility.

Compare following examples:
{code:java}
<SalesInvoiceItemFact billToCustomerDimId="_NA_" contributionAmount="2250.000" 
createdStamp="2019-05-29 10:05:01.0" createdTxStamp="2019-05-29 10:03:24.0" 
customerDimId="12001" extDiscountAmount="0.000" extDistributionCost="0.000" 
extGrossAmount="2250.000" extManFixedCost="0.000" extManVarCost="0.000" 
extNetAmount="2250.000" extStorageCost="0.000" extTaxAmount="0.000" 
*invoiceDateDimId="20151008"* invoiceId="95" invoiceItemSeqId="00001" 
lastUpdatedStamp="2019-05-29 10:05:01.0" lastUpdatedTxStamp="2019-05-29 
10:03:24.0" orderId="_NA_" organisationDimId="10010" *origCurrencyDimId="EUR"* 
productDimId="10087" quantity="1.000000" quantityUomDimId="10"/>
{code}
vs
{code:java}
<SalesInvoiceItemFact billToCustomerDimId="_NA_" contributionAmount="2250.000" 
createdStamp="2019-05-29 10:05:01.0" createdTxStamp="2019-05-29 10:03:24.0" 
customerDimId="12001_" extDiscountAmount="0.000" extDistributionCost="0.000" 
extGrossAmount="2250.000" extManFixedCost="0.000" extManVarCost="0.000" 
extNetAmount="2250.000" extStorageCost="0.000" extTaxAmount="0.000" 
*invoiceDateDimId="10000"* invoiceId="95" invoiceItemSeqId="00001" 
lastUpdatedStamp="2019-05-29 10:05:01.0" lastUpdatedTxStamp="2019-05-29 
10:03:24.0" orderId="_NA_" organisationDimId="10010" 
*origCurrencyDimId="10000"* productDimId="10087" quantity="1.000000" 
quantityUomDimId="LEN_m"/>
{code}
The latter means that in a production infrastructure the using company is 
penalised (performance and cost-wise) with an additional query (for the 
*origCurrencyDimId*) to the currency dimension to retrieve the underlying 
explanation/meaning (EUR). And similarly for the *quantityUomDimId* and other 
generically defined measurements (e.g. in the examples the  *invoiceDateDimId*).

This may not mean much in DHW implementations with a low number of records in 
fact tables. But when there a multitude of such fact tables with each having 
millions/billions of records in those tables using such keys of generic tables 
(these measurement tables), then it adds up significantly. 


was (Author: pfm.smits):
The reasoning we applied to having a different primary key for the Uom table in 
the transaction db (an exception to the primary key definition for majority of 
the tables) applies also to the DWH.

For those not having access to the Data Warehouse Toolkit book (which is the 
starting point for the project's approach to Bi through the same named 
component) the first line states:

??Surrogate keys are used to implement the primary keys of *almost all* 
dimension tables.??

So, which dimension tables can the project consider to be the exceptions?
 These are the generic dimension tables that are uniform in meaning across 
businesses, e.g. those data sets defined by standardisation bodies. 
 Measurement dimension tables, like date, time, but also country, currency and 
all those other groups of records which are relating to records in the Uom 
table in the transaction db are such generic tables. These tables don't need a 
'self defined' primary keys added, because the natural key is self-explanatory.

That is why Kimball in his book suggests for the Date dimension table to use 
the ISO 8601 (specifically YYMMDD) for the date as the definition for the 
primary key. And the same is suggested for the Time dimension table (HHMM in 
that case).

The validation for these exceptions are:
 # the natural key is self-explanatory,
 # data abstraction negatively impacts query performance and legibility.

Compare following examples:
{code:java}
<SalesInvoiceItemFact billToCustomerDimId="_NA_" contributionAmount="2250.000" 
createdStamp="2019-05-29 10:05:01.0" createdTxStamp="2019-05-29 10:03:24.0" 
customerDimId="12001" extDiscountAmount="0.000" extDistributionCost="0.000" 
extGrossAmount="2250.000" extManFixedCost="0.000" extManVarCost="0.000" 
extNetAmount="2250.000" extStorageCost="0.000" extTaxAmount="0.000" 
invoiceDateDimId="20151008" invoiceId="95" invoiceItemSeqId="00001" 
lastUpdatedStamp="2019-05-29 10:05:01.0" lastUpdatedTxStamp="2019-05-29 
10:03:24.0" orderId="_NA_" organisationDimId="10010" origCurrencyDimId="EUR" 
productDimId="10087" quantity="1.000000" quantityUomDimId="10"/>
{code}
vs
{code:java}
<SalesInvoiceItemFact billToCustomerDimId="_NA_" contributionAmount="2250.000" 
createdStamp="2019-05-29 10:05:01.0" createdTxStamp="2019-05-29 10:03:24.0" 
customerDimId="12001_" extDiscountAmount="0.000" extDistributionCost="0.000" 
extGrossAmount="2250.000" extManFixedCost="0.000" extManVarCost="0.000" 
extNetAmount="2250.000" extStorageCost="0.000" extTaxAmount="0.000" 
invoiceDateDimId="20151008" invoiceId="95" invoiceItemSeqId="00001" 
lastUpdatedStamp="2019-05-29 10:05:01.0" lastUpdatedTxStamp="2019-05-29 
10:03:24.0" orderId="_NA_" organisationDimId="10010" origCurrencyDimId="10000" 
productDimId="10087" quantity="1.000000" quantityUomDimId="LEN_m"/>
{code}
The latter means that in a production infrastructure the using company is 
penalised (performance and cost-wise) with an additional query (for the 
*origCurrencyDimId*) to the currency dimension to retrieve the underlying 
explanation/meaning (EUR). And similarly for the *quantityUomDimId* and other 
generically defined measurements (e.g. in the examples the  *invoiceDateDimId*).

This may not mean much in DHW implementations with a low number of records in 
fact tables. But when there a multitude of such fact tables with each having 
millions/billions of records in those tables using such keys of generic tables 
(these measurement tables), then it adds up significantly. 

> have CurrencyDimension have a dimensionId that is based on the natural key
> --------------------------------------------------------------------------
>
>                 Key: OFBIZ-10953
>                 URL: https://issues.apache.org/jira/browse/OFBIZ-10953
>             Project: OFBiz
>          Issue Type: Sub-task
>          Components: bi
>    Affects Versions: Release Branch 18.12, Release Branch 17.12, Trunk
>            Reporter: Pierre Smits
>            Assignee: Pierre Smits
>            Priority: Major
>              Labels: CurrencyDimension, birt, currency, dimension, dwh
>         Attachments: 20200208_094840.jpg
>
>
> Currently the record sequencer (delegator.getNextSeqId) is used to determine 
> the dimensionId for the CurrencyDimension. This is unnecessary as the uomId 
> from the UOM table can be used for currency.
> It also makes it easier to set the foreign-key in fact tables by generating 
> it based on the date provided, than by retrieving the dimensionId based on a 
> retrieval through the getDimensionIdFromNaturalKey service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to