I disagree. Create date as a raw integer is an excellent surrogate for controlling time series "buckets" as it gives you complete control over the granularity. You can even have multiple granularities in the same table - remember that partition key "misses" in Cassandra are pretty lightweight as they won't make it past the bloom filter on the read path.
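
To make the bucketing idea concrete, here is a rough sketch (Python, standard library only) of how a client could build the integer bucket keys for a date range and then issue one single-partition query per bucket instead of one long IN clause. Everything named here is an assumption for illustration: `day_buckets`, `month_buckets`, `fetch_range`, and especially `fetch_partition`, which is a hypothetical stand-in that returns a fake row rather than calling a real driver. With a real driver you would replace it with an asynchronous single-partition query, as suggested further down the thread.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def day_buckets(start: date, end: date) -> list:
    """Integer day buckets (YYYYMMDD) from start to end, inclusive."""
    days = (end - start).days
    return [int((start + timedelta(days=i)).strftime("%Y%m%d"))
            for i in range(days + 1)]


def month_buckets(start: date, end: date) -> list:
    """Integer month buckets (YYYYMM) covering start to end, inclusive."""
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(year * 100 + month)
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets


def fetch_partition(vin: str, bucket: int) -> list:
    """Hypothetical stand-in for one single-partition query, e.g.
    SELECT * FROM test WHERE vin = ? AND create_date = ?
    It returns a fake row so the sketch runs without a cluster."""
    return [(vin, bucket)]


def fetch_range(vin: str, buckets: list, max_workers: int = 8) -> list:
    """Run one query per bucket concurrently; merge rows in bucket order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_bucket = pool.map(lambda b: fetch_partition(vin, b), buckets)
        return [row for rows in per_bucket for row in rows]
```

Day keys (YYYYMMDD) and month keys (YYYYMM) are distinct integers, which is one way to read "multiple granularities in the same table": both kinds of key can live in the same partition key column without colliding. A bucket that was never written is simply a partition miss, which is the cheap case described above.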

On Wed, Apr 18, 2018 at 10:00 AM, Javier Pareja <pareja.jav...@gmail.com> wrote:

> Hi David,
>
> Could you describe why you chose to include the create date in the
> partition key? If the vin is enough "partitioning", meaning that the size
> (number of rows x size of row) of each partition is less than 100MB, then
> remove the date and just use the create_time, because the date is already
> included in that column anyways.
>
> For example if columns "a" and "b" (from your table) are of max 256 UTF8
> characters, then you can have approx 100MB / (2*256*2Bytes) = 100,000 rows
> per partition. You can actually have many more but you don't want to go
> much higher for performance reasons.
>
> If this is not enough you could use create_month instead of create_date,
> for example, to reduce the partition size while not being too granular.
>
>
> On Tue, 17 Apr 2018, 22:17 Nate McCall, <n...@thelastpickle.com> wrote:
>
>> Your table design will work fine as you have appropriately bucketed by an
>> integer-based 'create_date' field.
>>
>> Your goal for this refactor should be to remove the "IN" clause from your
>> code. This will move the rollup of multiple partition keys being retrieved
>> into the client instead of relying on the coordinator assembling the
>> results. You have to do more work and add some complexity, but the trade
>> off will be much higher performance as you are removing the single
>> coordinator as the bottleneck.
>>
>> On Tue, Apr 17, 2018 at 10:05 PM, Xiangfei Ni <xiangfei...@cm-dt.com>
>> wrote:
>>
>>> Hi Nate,
>>>
>>> Thanks for your reply!
>>>
>>> Is there other way to design this table to meet this requirement?
>>>
>>> Best Regards,
>>>
>>> 倪项菲 / David Ni
>>>
>>> 中移德电网络科技有限公司
>>>
>>> Virtue Intelligent Network Ltd, co.
>>>
>>> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
>>>
>>> Mob: +86 13797007811|Tel: + 86 27 5024 2516
>>>
>>> From: Nate McCall <n...@thelastpickle.com>
>>> Sent: April 17, 2018 7:12
>>> To: Cassandra Users <user@cassandra.apache.org>
>>> Subject: Re: Time serial column family design
>>>
>>> Select * from test where vin =“ZD41578123DSAFWE12313” and create_date in
>>> (20180416, 20180415, 20180414, 20180413, 20180412………………………………….);
>>>
>>> But this cause the cql query is very long,and I don’t know whether there
>>> is limitation for the length of the cql.
>>>
>>> Please give me some advice,thanks in advance.
>>>
>>> Using the SELECT ... IN syntax means that:
>>>
>>> - the driver will not be able to route the queries to the nodes which
>>> have the partition
>>>
>>> - a single coordinator must scatter-gather the query and results
>>>
>>> Break this up into a series of single statements using the executeAsync
>>> method and gather the results via something like Futures in Guava or
>>> similar.
>>>
>>
>> --
>> -----------------
>> Nate McCall
>> Wellington, NZ
>> @zznate
>>
>> CTO
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>

--
-----------------
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com