How many rows on average per partition? Let me get this straight: you are bifurcating your partitions on either email or username, essentially doubling the data, because you don't have a central system of record for users?
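To put a rough number on that, here is a minimal Python sketch, not a Cassandra implementation. In-memory dicts stand in for the two tables from the quoted model (`user_by_name`, `user_by_mail`). Everything else is an assumption made up for illustration: the `write_identity` helper, the integer identity ids, the two lookup indexes, and the sample events. It compares how many request rows get stored when every request is written once per lookup key versus once per identity, which is the direction suggested below.

```python
from collections import defaultdict
from itertools import count

# Current model: each request is written twice, once per lookup key.
user_by_name = defaultdict(list)  # (time_bucket, username) -> [(ts, request, email)]
user_by_mail = defaultdict(list)  # (time_bucket, email)    -> [(ts, request, username)]

def write_current(time_bucket, username, email, ts, request):
    user_by_name[(time_bucket, username)].append((ts, request, email))
    user_by_mail[(time_bucket, email)].append((ts, request, username))

# Hypothetical alternative: one identity per (username, email) pair.
# Requests are stored once; two small indexes map each key to identities.
_next_id = count(1)
identity_of = {}                          # (username, email) -> identity id
ids_by_name = defaultdict(set)            # username -> {identity id}
ids_by_mail = defaultdict(set)            # email    -> {identity id}
requests_by_identity = defaultdict(list)  # (time_bucket, identity id) -> [(ts, request)]

def write_identity(time_bucket, username, email, ts, request):
    pair = (username, email)
    if pair not in identity_of:
        identity_of[pair] = next(_next_id)
        ids_by_name[username].add(identity_of[pair])
        ids_by_mail[email].add(identity_of[pair])
    requests_by_identity[(time_bucket, identity_of[pair])].append((ts, request))

def _collect(time_bucket, identities):
    rows = []
    for identity in identities:
        rows.extend(requests_by_identity.get((time_bucket, identity), []))
    return sorted(rows)

def requests_for_name(time_bucket, username):
    return _collect(time_bucket, ids_by_name.get(username, set()))

def requests_for_mail(time_bucket, email):
    return _collect(time_bucket, ids_by_mail.get(email, set()))

def stored_rows(*tables):
    """Count request rows held across the given tables."""
    return sum(len(rows) for table in tables for rows in table.values())

# Made-up sample data: two usernames sharing one email.
events = [
    ("2018-07", "alice", "a@example.com", 1, "GET /x"),
    ("2018-07", "alice", "a@example.com", 2, "GET /y"),
    ("2018-07", "alice2", "a@example.com", 3, "GET /z"),
]
for event in events:
    write_current(*event)
    write_identity(*event)

print(stored_rows(user_by_name, user_by_mail))  # 6 request rows
print(stored_rows(requests_by_identity))        # 3 request rows
```

This counts rows, not bytes, and the lookup indexes take space of their own, so treat it as an illustration of the shape of the problem rather than a storage estimate.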
Here is what I would do (my opinion): migrate to a single sign-on system that uses one or the other. Map and migrate your data to use a single record as the "identity". I know that seems painful, but I _hate_ perpetuating bad design because someone, in the past, present, or future, chooses to work around the problem instead of solving it. This is not a storage optimization problem; it's a data architecture problem.

Rahul

On Jul 28, 2018, 3:11 AM -0400, onmstester onmstester <onmstes...@zoho.com>, wrote:
> The current data model described as table name:
> ((partition_key),cluster_key),other_column1,other_column2,...
>
> user_by_name: ((time_bucket, username)),ts,request,email
> user_by_mail: ((time_bucket, email)),ts,request,username
>
> The reason that all 2 keys (username, email) repeated in all tables is that
> there may be different username with the same email or different email with
> same username, and the query for data model is:
> 1. username = X
> 2. mail=Y
> 3. username = X and mail= Y (we query one of tables and because there is
> small number of records in result, we filter the other column)
>
> This data model results in wasting lots of storage.
> I thought using UUID or hash code or sequence to handle this but i can't keep
> track of the old vs new records (the ones that already have UUID).
> Any recommendation on optimizing data model to save storage?
>
> Sent using Zoho Mail