[
https://issues.apache.org/jira/browse/CASSANDRA-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260204#comment-14260204
]
Aleksey Yeschenko commented on CASSANDRA-8543:
----------------------------------------------
Use native protocol batching with separate prepared inserts - but make sure
that you only batch columns/rows with the same partition key.
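That constraint can be sketched as a grouping step before building the batches (illustrative Python; the `writes` list and the names are assumptions, not driver API):

```python
from collections import defaultdict

def group_by_partition(writes):
    """Group pending (partition_key, clustering_key, value) writes so each
    native-protocol batch only touches rows with one partition key."""
    batches = defaultdict(list)
    for pk, ck, value in writes:
        batches[pk].append((ck, value))
    return dict(batches)

# Each resulting list would become one batch of prepared inserts.
writes = [("sensor-1", 0, 10), ("sensor-2", 0, 20), ("sensor-1", 1, 11)]
group_by_partition(writes)
# {'sensor-1': [(0, 10), (1, 11)], 'sensor-2': [(0, 20)]}
```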
Use DateTieredCompactionStrategy
(https://labs.spotify.com/2014/12/18/date-tiered-compaction/).
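As a sketch, assuming an illustrative time-series table (the table and column names here are made up), DTCS is enabled per table through the compaction option:

```sql
CREATE TABLE metrics (
    sensor_id text,
    ts        timestamp,
    value     blob,
    PRIMARY KEY (sensor_id, ts)
) WITH compaction = {'class': 'DateTieredCompactionStrategy'};
```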
And, more importantly, don't try to optimize before you actually need it.
In any case, CASSANDRA-6412 is very unlikely to make it into Cassandra until
3.1 or 3.2, if at all, so any wins that you could get from your blob-packing
will be negated by the need to do a read before write.
You also lose convenient querying with limits smaller than 1024, and the
ability to reuse 3.0 aggregate functions on your values. It also complicates
MR/Spark jobs and loses the ability to use some of those pre-defined methods.
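For reference, the merge-on-read/compaction semantics proposed in the issue below can be sketched in a few lines (illustrative Python; `BLANK` stands in for whatever placeholder - NULL, max int - the blobs would use):

```python
BLANK = None  # stand-in for the proposed "blank" placeholder value

def merge_blobs(older, newer):
    """Merge a sparse 'blank' blob into an older blob: positions that are
    BLANK in the newer record keep the older value; non-blank positions
    win on a last-write-wins basis."""
    return [old if new is BLANK else new for old, new in zip(older, newer)]

base   = [1, 2, 3, 4]
sparse = [BLANK, 9, BLANK, 8]
merge_blobs(base, sparse)  # [1, 9, 3, 8]
```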
> Allow custom code to control behavior of reading and compaction
> ---------------------------------------------------------------
>
> Key: CASSANDRA-8543
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8543
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Pavol Slamka
> Priority: Minor
>
> When storing series data in blob objects for speed, it is sometimes
> necessary to change only a few values of a single blob (say, a few
> integers out of 1024). Right now one could rewrite these using
> compare-and-set with versioning - read the blob and its version, change a
> few values, write the whole updated blob with an incremented version if
> the version did not change, and repeat the whole process otherwise
> (optimistic approach). However, compare-and-set brings some overhead.
> Let's try to leave out compare-and-set: instead of reading and updating,
> let's write only a "blank" blob with only a few values set. The blank
> blob contains special blank placeholder data such as NULL, the max value
> of int, or similar. Since this write in fact only appends a new SSTable
> record, we did not overwrite the old data yet. That happens during read
> or compaction. But if we provided a custom read and a custom compaction
> which would not replace the blob with the new "sparse blank" blob, but
> rather would replace values in the first blob (first SSTable record)
> with only the "non-blank" values from the second blob (second SSTable
> record), we would achieve a fast partial blob update without
> compare-and-set, on a last-write-wins basis. Is such an approach
> feasible? Would it be possible to customize Cassandra so that custom code
> for compaction and data reading could be provided for a column (blob)?
> There may be other better solutions, but speed-wise, this seems best to
> me.
> Sorry for any mistakes, I am new to Cassandra.
> Thanks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)