[
https://issues.apache.org/jira/browse/CASSANDRA-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643479#comment-13643479
]
T Jake Luciani commented on CASSANDRA-5417:
-------------------------------------------
The best thing to do is just give you the uncompacted sstables...
https://docs.google.com/file/d/0B4FSNkh7LrJCc040UTRKZFdtTVk/edit?usp=sharing
You should keep the uncompacted sstables around and reset to them after each test.
The two scenarios I tested were:
1. Time it takes to perform a major compaction (with and without patch)
2. Latency of reads for reading across all uncompacted tables (with and
without patch)
Here is the schema:
{code}
CREATE KEYSPACE mjff WITH replication = {'class': 'SimpleStrategy',
'replication_factor': 1};
use mjff;
CREATE TABLE data (
name text,
type text,
date timestamp,
value double,
PRIMARY KEY(name,type,date)
) WITH COMPACT STORAGE;
{code}
The reader code is simple:
{code}
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Compression;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.CqlResult;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class StressReads {
    static int threadCount = 2;

    // Partition keys (name) and first clustering column (type) present in the data set.
    public static String[] names = new String[]{
        "APPLE","VIOLET","SUNFLOWER","ROSE","PEONY","ORCHID","ORANGE",
        "MAPLE","LILLY","FLOX","DAISY","DAFODIL","CROCUS","CHERRY"};
    public static String[] types = new String[]{"diffSecs","N.samples",
        "x.mean","x.absolue.deviation","x.standard.deviation",
        "y.mean","y.absolue.deviation","y.standard.deviation",
        "z.mean","z.absolue.deviation","z.standard.deviation"};

    // One Thrift connection per worker thread.
    static ThreadLocal<Cassandra.Client> client = new ThreadLocal<Cassandra.Client>() {
        public Cassandra.Client initialValue() {
            try {
                TTransport trans = new TFramedTransport(new TSocket("localhost", 9160));
                trans.open();
                TProtocol prot = new TBinaryProtocol(trans);
                Cassandra.Client c = new Cassandra.Client(prot);
                c.set_keyspace("mjff");
                return c;
            } catch (Exception e) {
                throw new RuntimeException("err", e);
            }
        }
    };

    static ExecutorService threadPool = Executors.newFixedThreadPool(threadCount);
    static AtomicLong totalReads = new AtomicLong(0);
    static long allReads = 0;
    static int countSeconds = 0;
    static Random rand = new Random();

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < threadCount; i++) {
            threadPool.submit(new Runnable() {
                @Override
                public void run() {
                    while (true) {
                        // Slice query on a random (name, type) pair.
                        StringBuilder sb = new StringBuilder();
                        sb.append("Select value from data where name='");
                        sb.append(names[rand.nextInt(names.length)]);
                        sb.append("' and type='");
                        sb.append(types[rand.nextInt(types.length)]);
                        sb.append("' and date > '2012-03-01 00:00:00' LIMIT 100");
                        try {
                            CqlResult result = client.get().execute_cql3_query(
                                ByteBufferUtil.bytes(sb.toString()),
                                Compression.NONE, ConsistencyLevel.ONE);
                            // Counts rows returned, not queries issued.
                            totalReads.addAndGet(result.getRows().size());
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }
            });
        }

        // Report the per-second count and the running average once a second.
        while (true) {
            Thread.sleep(1000);
            long reads = totalReads.getAndSet(0);
            allReads += reads;
            System.err.println("Read " + reads + " per/sec, avg " + allReads / ++countSeconds);
        }
    }
}
{code}
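The reader above reports rows returned per second rather than a per-query latency. If a latency figure is wanted for scenario 2, something along the lines of the sketch below could wrap each query call. This is only an illustration: LatencyRecorder and its methods are hypothetical names, not part of the test code or of Cassandra, and the percentile uses the simple nearest-rank method.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of the original test): collects per-call latencies. */
final class LatencyRecorder {
    private long[] samples = new long[1024];
    private int size = 0;

    /** Stores one latency sample, in nanoseconds. */
    synchronized void record(long nanos) {
        if (size == samples.length) {
            samples = Arrays.copyOf(samples, size * 2);
        }
        samples[size++] = nanos;
    }

    /** Runs the call, records how long it took, and returns its result. */
    <T> T time(Callable<T> call) throws Exception {
        long start = System.nanoTime();
        try {
            return call.call();
        } finally {
            record(System.nanoTime() - start);
        }
    }

    synchronized int count() {
        return size;
    }

    /** Nearest-rank percentile over the recorded samples; p is in [0, 100]. */
    synchronized long percentile(double p) {
        if (size == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        long[] sorted = Arrays.copyOf(samples, size);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * size);
        return sorted[Math.max(0, rank - 1)];
    }
}
```

In the reader loop, the execute_cql3_query call would go inside recorder.time(...), and the reporting loop could print recorder.percentile(99) next to the per-second count.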
> Push composites support in the storage engine
> ---------------------------------------------
>
> Key: CASSANDRA-5417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5417
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Fix For: 2.0
>
>
> CompositeType happens to be very useful and is now widely used: CQL3 relies
> heavily on it, and super columns now use it internally too. Besides,
> CompositeType has been advised as a replacement for super columns on the
> thrift side for a while, so it's safe to assume that it's generally used
> there too.
> CompositeType was initially introduced as just another AbstractType,
> meaning that the storage engine has no knowledge whatsoever of composites
> being, well, composite. This has the following drawbacks:
> * Because internally a composite value is handled as just a ByteBuffer, we
> end up doing a lot of extra work. Typically, each time we compare 2 composite
> values, we end up "deserializing" the components (which, while it doesn't copy
> data per se because we just slice the global ByteBuffer, still wastes some CPU
> cycles and allocates a bunch of ByteBuffer objects). And since compare can be
> called *a lot*, this is likely not negligible.
> * This makes CQL3 code uglier than necessary. Basically, CQL3 makes extensive
> use of composites, and since it gets back ByteBuffers from the internal
> columns, it always has to check whether it's actually a CompositeType or not,
> and then split it and pick the different parts it needs. It's only an API
> problem, but having things exposed as composites directly would definitely
> make things cleaner. In particular, in most cases, CQL3 doesn't care whether it
> has a composite with only one component or a non-really-composite value, but
> we still always distinguish both cases. Also, if we do expose composites
> more directly internally, it's not a lot more work to "internalize" better
> the different parts of the cell name that CQL3 uses (what's the clustering
> key, what's the actual CQL3 column name, what's the collection element),
> making things cleaner. Last but not least, there are currently a bunch of
> places where methods take a ByteBuffer as argument and it's hard to know
> whether they expect a cell name or a CQL3 column name. This is pretty
> error-prone.
> * It makes it hard (or impossible) to do a number of performance
> improvements. Consider CASSANDRA-4175: I'm not really sure how you can do it
> properly (in memory) if cell names are just ByteBuffers (since CQL3 column
> names are just one of the components in general). But we also miss
> opportunities for sharing prefixes. If we were able to share prefixes of
> composite names in memory we would 1) lower the memory footprint and 2)
> potentially speed up comparison (of the prefixes) by checking reference
> equality first (also, doing prefix sharing on-disk, which is a separate
> concern btw, might be easier to do if we do prefix sharing in memory).
> So I suggest pushing CompositeType support inside the storage engine. What I
> mean by that concretely would be to change the internal {{Column.name}} from
> ByteBuffer to some CellName type. A CellName would API-wise just be a list of
> ByteBuffers. But in practice, we'd have a specific CellName implementation for
> not-really-composite names, and the truly composite implementation would allow
> some prefix sharing. From an external API however, nothing would change: we
> would pack the composite as usual before sending it back to the client, but
> at least internally, comparison wouldn't have to deserialize the components
> every time, and CQL3 code would be cleaner.