The design is a bit hard to follow. Perhaps you could include your proposed 
schema (annotated with your size predictions) to spur more discussion. To me, 
it sounds a bit convoluted. Why is a "batch" so big (up to 100 million rows)? 
Is a row in the primary table associated with only one batch?


Sean Durity - Cassandra Admin, Big Data Team

From: Anuj Wadehra [mailto:anujw_2...@yahoo.co.in]
Sent: Friday, July 24, 2015 3:57 AM
To: user@cassandra.apache.org
Subject: Re: Manual Indexing With Buckets

Can anyone take this one?

Thanks
Anuj


________________________________
From: "Anuj Wadehra" <anujw_2...@yahoo.co.in>
Date: Thu, 23 Jul, 2015 at 10:57 pm
Subject: Manual Indexing With Buckets
We have a primary table and need to search it by the batchid column, so we are 
building a manual index keyed by batch id. We use buckets to keep each row in 
the batch id index table under 50 MB. Batch sizes vary drastically (one batch 
id may be associated with 100k row keys in the primary table, another with 100 
million), so we are creating a metadata table that tracks the approximate 
amount of data inserted for each batch in the primary table. This lets the 
batch id index table have a dynamic number of buckets (rows) per batch. As more 
data is inserted for a batch in the primary table, a new set of 10 buckets is 
added. At any point in time, clients write to the latest 10 buckets created for 
a batch in the batch id index, in round-robin order, to avoid hotspots.
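
To make the bucket scheme above concrete, here is a minimal in-memory sketch 
in Python. It is not the actual implementation: the class name BatchMetadata, 
the rows_per_group threshold, and the idea of counting rows rather than bytes 
are assumptions made for illustration. Only the group size of 10 and the 
round-robin rotation over the latest group come from the description. In a 
real table the returned bucket number would presumably be part of the index 
partition key, alongside the batch id.

```python
from dataclasses import dataclass

BUCKET_GROUP_SIZE = 10  # from the description: buckets are added 10 at a time


@dataclass
class BatchMetadata:
    """Hypothetical stand-in for one batch's row in the metadata table."""
    rows_per_group: int = 1_000_000  # assumed threshold, not from the post
    approx_rows: int = 0             # approximate rows written for this batch
    bucket_count: int = BUCKET_GROUP_SIZE
    _next: int = 0                   # round-robin cursor within the latest group

    def next_bucket(self) -> int:
        """Return the bucket number that the next index write should use."""
        groups = self.bucket_count // BUCKET_GROUP_SIZE
        # Once the existing groups are considered full, add 10 more buckets.
        if self.approx_rows >= groups * self.rows_per_group:
            self.bucket_count += BUCKET_GROUP_SIZE
            self._next = 0
        # Rotate over the latest group of 10 buckets only.
        first = self.bucket_count - BUCKET_GROUP_SIZE
        bucket = first + self._next
        self._next = (self._next + 1) % BUCKET_GROUP_SIZE
        self.approx_rows += 1
        return bucket
```

With rows_per_group set to 20, the first 20 writes cycle through buckets 0 to 
9 twice, and the 21st write opens a new group and lands in bucket 10.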

Comments required on the following:
1. Are there any suggestions on the above design?

2. What is the best approach for updating or deleting entries in the index 
table? When a row is manually purged from the primary table, we do not know 
which of the x buckets created for its batch id holds that row key.
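
To illustrate the problem in question 2, the sketch below models the index as 
a plain Python dictionary and removes a row key by probing every bucket of its 
batch. The Index type and the remove_from_index function are hypothetical 
names used only for illustration; this shows the cost of not knowing the 
bucket, not a recommended solution.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical in-memory stand-in for the index table:
# (batch_id, bucket) -> row keys stored in that bucket.
Index = Dict[Tuple[str, int], Set[str]]


def remove_from_index(index: Index, batch_id: str,
                      bucket_count: int, row_key: str) -> Optional[int]:
    """Probe each bucket of the batch until the row key is found and removed.

    Returns the bucket that held the key, or None if no bucket had it.
    """
    for bucket in range(bucket_count):
        keys = index.get((batch_id, bucket), set())
        if row_key in keys:
            keys.remove(row_key)
            return bucket
    return None
```

In the worst case this touches all x buckets of the batch for a single purge.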

Thanks
Anuj
