[ https://issues.apache.org/jira/browse/CASSANDRA-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056261#comment-14056261 ]
T Jake Luciani edited comment on CASSANDRA-7520 at 7/9/14 2:06 PM:
-------------------------------------------------------------------
So you are arguing that with vnodes you would end up with similar partition
keys, like "AAPL-2014-07-07-0001" and "AAPL-2014-07-07-0002", on the same
physical node, and that if this happened to be the case, sorting by key would
give you locality in the sstable?
> Permit sorting sstables by raw partition key, as opposed to token
> -----------------------------------------------------------------
>
> Key: CASSANDRA-7520
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7520
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
>
> At the moment we have some counter-intuitive behaviour: with a hashed
> partitioner (the recommended configuration), the more compacted the data is,
> the more randomly it is distributed within the file. This makes data access
> locality about as bad as possible, and we rely on the OS page cache to
> compensate.
>
> [~jasobrown] mentioned this at the NGCC, but thinking on it some more, it
> seems that many use cases may benefit from dropping the token at the storage
> level and sorting on the raw key data. For workloads where nearness of key
> implies likelihood of being co-referenced, this could improve data locality
> and cache hit rate dramatically. Timeseries workloads spring to mind, but I
> doubt this is constrained to them; most likely any non-random access pattern
> could benefit. A random access pattern would most likely suffer under this
> scheme, as we can index more efficiently into the hashed data. However,
> there's no reason we could not support both schemes.
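
The locality argument in the description can be illustrated with a small sketch. It is not Cassandra code: the `token` function uses MD5 purely as a stand-in for a hashed partitioner, and the symbol list and the `span` helper are hypothetical names made up for the illustration. It builds keys of the form used in the comment above, lays them out once in raw-key order and once in token order, and measures how far apart the "AAPL" keys end up in each layout.

```python
import hashlib


def token(key: str) -> int:
    """Stand-in for a hashed partitioner's token (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def span(layout: list[str], prefix: str) -> int:
    """Slots between the first and last key starting with `prefix`, inclusive."""
    positions = [i for i, k in enumerate(layout) if k.startswith(prefix)]
    return positions[-1] - positions[0] + 1


# Hypothetical workload: 4 symbols, 50 keys each, all on the same node.
SYMBOLS = ["AAPL", "GOOG", "MSFT", "ORCL"]
keys = [f"{s}-2014-07-07-{n:04d}" for s in SYMBOLS for n in range(1, 51)]

by_raw_key = sorted(keys)           # layout if the file were sorted by raw key
by_token = sorted(keys, key=token)  # layout if the file is sorted by token

raw_span = span(by_raw_key, "AAPL-")
token_span = span(by_token, "AAPL-")
print("AAPL span, raw-key order:", raw_span)
print("AAPL span, token order:  ", token_span)
```

In raw-key order the 50 "AAPL" keys occupy 50 adjacent slots, whereas in token order they are interleaved with the other symbols' keys, so a scan over one symbol's day touches a much wider region of the file. The sketch ignores vnodes, the index, and the page cache, so it only shows the ordering effect, not actual read performance.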