Re: Data model storage optimization

Rahul Singh Sun, 29 Jul 2018 16:04:48 -0700

How many rows on average per partition?

Let me get this straight: you are bifurcating your partitions on either email 
or username, potentially doubling the data, because you don’t have a way to 
manage a central system of record for users?


I would do this (my opinion):
Migrate to a single sign-on system that uses one or the other. Map and migrate 
your data to use a single record as the “identity”.
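
A minimal sketch of what a single "identity" key could look like, assuming 
each distinct (username, email) pair is treated as one identity. That is only 
one possible mapping, and the namespace value and the normalization rules are 
hypothetical. It uses a name-based UUID (version 5), which is deterministic, so 
old and new records resolve to the same ID without tracking which ones have 
already been assigned one:

```python
import uuid

# Hypothetical namespace for this illustration. Any fixed UUID works, as long
# as it never changes once identity IDs have been written to storage.
IDENTITY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "identity.example.com")


def identity_id(username: str, email: str) -> uuid.UUID:
    """Derive a deterministic identity ID from a (username, email) pair."""
    # Assumption: usernames and emails compare case-insensitively and ignore
    # surrounding whitespace. Adjust to match the real system's rules.
    normalized_username = username.strip().lower()
    normalized_email = email.strip().lower()
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    key = f"{normalized_username}\x00{normalized_email}"
    return uuid.uuid5(IDENTITY_NAMESPACE, key)


if __name__ == "__main__":
    # The same pair always maps to the same ID, for old and new records alike.
    print(identity_id("alice", "alice@example.com"))
    print(identity_id(" Alice ", "ALICE@example.com"))
```

With something like this, the request tables could carry the 16-byte ID 
instead of repeating both text columns, and a small lookup table would map 
username and email to the ID.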

I know that seems painful, but I _hate_ perpetuating bad design because 
someone, in the past, present, or future, chooses not to solve the problem but 
to work around it.

This is not a storage optimization problem - it’s a data architecture problem.

Rahul
On Jul 28, 2018, 3:11 AM -0400, onmstester onmstester <onmstes...@zoho.com>, 
wrote:
> The current data model, described in the form table_name: 
> ((partition_key),cluster_key),other_column1,other_column2,...
>
> user_by_name: ((time_bucket, username)),ts,request,email
> user_by_mail: ((time_bucket, email)),ts,request,username
>
> The reason both keys (username, email) are repeated in all tables is that 
> there may be different usernames with the same email or different emails 
> with the same username, and the queries for the data model are:
> 1. username = X
> 2. email = Y
> 3. username = X and email = Y (we query one of the tables and, because the 
> result contains only a small number of records, we filter on the other column)
>
> This data model wastes a lot of storage.
> I thought of using a UUID, hash code, or sequence to handle this, but I 
> can't keep track of the old vs. new records (the ones that already have a UUID).
> Any recommendations on optimizing the data model to save storage?
