Large object support
--------------------
Key: CASSANDRA-265
URL: https://issues.apache.org/jira/browse/CASSANDRA-265
Project: Cassandra
Issue Type: New Feature
Reporter: Jonathan Ellis
The standard answer since forever has been "Cassandra is a bad fit for large
objects."
But I think it doesn't have to be that way. With a few simplifying assumptions
we can make this doable.
First, screw Thrift. There is no way to specify a stream of bytes
cross-platform. You can't mix raw sockets into Thrift very easily (?) so screw
it. Make it an internal-only API to start with, like the much-vaunted and
much-feared BinaryVerbHandler.
Second, forget about writing multiple lobs at once. You insert one lob at a
time, to a specific column.
With Thrift out of the equation we are not out of the woods. MessagingService
also assumes that Messages will be memory resident and not streamed. One
approach to fix this would be to have a StreamingMessage class that consists of
a message id (paired with the origination endpoint to make it unique)
and a size. The VerbHandler would keep a Map of incomplete StreamingMessages
around until the full size had been read, at which point they could be disposed of.
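A minimal sketch of that bookkeeping, to make the idea concrete. StreamKey,
StreamTracker, and their methods are hypothetical names, not existing Cassandra
classes; the endpoint is shown as a plain String for simplicity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical key: message id paired with the originating endpoint. */
final class StreamKey {
    final String endpoint;
    final long messageId;

    StreamKey(String endpoint, long messageId) {
        this.endpoint = endpoint;
        this.messageId = messageId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof StreamKey)) return false;
        StreamKey k = (StreamKey) o;
        return messageId == k.messageId && endpoint.equals(k.endpoint);
    }

    @Override
    public int hashCode() {
        return Objects.hash(endpoint, messageId);
    }
}

/** Hypothetical tracker a VerbHandler could hold for in-flight streams. */
final class StreamTracker {
    // Each value holds {declared size, bytes received so far}.
    private final Map<StreamKey, long[]> incomplete = new HashMap<>();

    /** Registers a new stream with its declared total size. */
    synchronized void begin(String endpoint, long messageId, long size) {
        incomplete.put(new StreamKey(endpoint, messageId), new long[] {size, 0L});
    }

    /** Records a chunk; returns true once the full size has been read. */
    synchronized boolean onChunk(String endpoint, long messageId, int length) {
        StreamKey key = new StreamKey(endpoint, messageId);
        long[] state = incomplete.get(key);
        if (state == null) {
            throw new IllegalStateException("unknown stream");
        }
        state[1] += length;
        if (state[1] > state[0]) {
            throw new IllegalStateException("stream overran its declared size");
        }
        if (state[1] == state[0]) {
            incomplete.remove(key); // complete: dispose of the entry
            return true;
        }
        return false;
    }

    /** Number of streams still waiting for more bytes. */
    synchronized int pending() {
        return incomplete.size();
    }
}
```

A real version would also need some way to expire entries whose sender died
mid-stream, which this sketch leaves out.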
So a LargeObjectCommand would be basically just the command id and the payload,
the streamed lob. And we would handle it by streaming it directly to a file.
When the stream was complete, we would do a write to the standard
commitlog/memtable with a pointer to that lob file. That would then be flushed
normally to the sstable. (This would require adding another boolean to Column
serialization, indicating whether the value is really a lob pointer. We could
combine this with the existing bool into a single flags byte and have room for
more flags, without taking extra space.)
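A rough sketch of the flags idea. A boolean written with Java's
DataOutput.writeBoolean already occupies a full byte, so packing flags into
that byte costs nothing. ColumnFlags and its constants are hypothetical, and
treating the existing boolean as a deletion marker is an assumption made for
illustration.

```java
/** Hypothetical bit flags packed into the one byte a boolean already uses. */
final class ColumnFlags {
    static final int DELETED = 0x01;     // assumption: meaning of the existing boolean
    static final int LOB_POINTER = 0x02; // proposed: value is a pointer to a lob file

    private ColumnFlags() {}

    /** Packs both booleans into a single byte. */
    static byte pack(boolean deleted, boolean lobPointer) {
        int flags = 0;
        if (deleted) flags |= DELETED;
        if (lobPointer) flags |= LOB_POINTER;
        return (byte) flags;
    }

    static boolean isDeleted(byte flags) {
        return (flags & DELETED) != 0;
    }

    static boolean isLobPointer(byte flags) {
        return (flags & LOB_POINTER) != 0;
    }
}
```

With DELETED at bit 0, a column written by the old code (0 or 1) still reads
back correctly as "not a lob pointer".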
So lobs would never appear directly in the commitlog, and we would never have
to rewrite them multiple times during compaction; just the pointers would get
merged, but the lob files themselves would not have to be touched. (Except to
remove them when a compaction shows that an older version is no longer needed.)
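To illustrate the compaction side, here is a sketch that merges several
versions of a lob pointer and reports which lob files became obsolete.
LobPointer and LobCompaction are hypothetical names, and "highest timestamp
wins" is an assumption about how the versions would be reconciled.

```java
import java.util.List;

/** Hypothetical column value: a pointer to a lob file plus a write timestamp. */
final class LobPointer {
    final String file;
    final long timestamp;

    LobPointer(String file, long timestamp) {
        this.file = file;
        this.timestamp = timestamp;
    }
}

final class LobCompaction {
    private LobCompaction() {}

    /**
     * Keeps the newest pointer and appends the files of all older versions
     * to obsoleteFiles. Only pointers are compared; no lob file is read.
     */
    static LobPointer merge(List<LobPointer> versions, List<String> obsoleteFiles) {
        if (versions.isEmpty()) {
            throw new IllegalArgumentException("no versions to merge");
        }
        LobPointer newest = versions.get(0);
        for (LobPointer p : versions) {
            if (p.timestamp > newest.timestamp) {
                newest = p;
            }
        }
        for (LobPointer p : versions) {
            if (p != newest) {
                obsoleteFiles.add(p.file); // candidate for deletion after compaction
            }
        }
        return newest;
    }
}
```

The caller would delete the obsolete files only after the compacted sstable
has been safely written.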
Then of course we'd need a corresponding ReadLargeObject command. So the
basics are straightforward.
Read Repair and Hinted Handoff would add a few more wrinkles but nothing
fundamentally challenging.
Thoughts?