[ https://issues.apache.org/jira/browse/CASSANDRA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725654#action_12725654 ]
Jonathan Ellis commented on CASSANDRA-265:
------------------------------------------
After some more thought I came up with a straightforward (if clunky) way to
support this in Thrift.
(In my defense, I note that it's already de rigueur to wrap the Thrift "client"
in something more idiomatic, and that [b]lob APIs for traditional databases
bear more than a little resemblance.)
You would add these methods (actual names subject to bikeshedding):
    begin_lob(key, columnPath, size, ts) returns thrift_lob_id
    repeat until sum(byte.length) == size:
        stream_lob(thrift_lob_id, byte[])
    commit_lob(thrift_lob_id) throws Bad Stuff
These would map fairly directly to the StreamingMessage/LargeObjectCommand
structures described above.
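A minimal Java sketch of the "wrap the Thrift client in something more idiomatic" idea, assuming the three proposed methods existed. The method names come from the comment above, but the LobClient interface, its parameter types (String key and column path, long lob id and timestamp), and the LobUploader and FakeLobServer classes are hypothetical; real Thrift-generated signatures would differ. The wrapper calls begin_lob, repeats stream_lob until the chunk lengths sum to the declared size, and finishes with commit_lob.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface mirroring the proposed Thrift methods. The method
// names come from the comment; the parameter and return types are assumptions.
interface LobClient {
    long begin_lob(String key, String columnPath, long size, long ts);
    void stream_lob(long lobId, byte[] chunk);
    void commit_lob(long lobId);
}

// Idiomatic wrapper: the caller passes a whole byte[] and the wrapper
// drives the begin / stream... / commit sequence.
final class LobUploader {
    private LobUploader() {}

    static void upload(LobClient client, String key, String columnPath,
                       byte[] data, int chunkSize, long ts) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long lobId = client.begin_lob(key, columnPath, data.length, ts);
        int sent = 0;
        // Repeat until the chunk lengths sum to the size declared in begin_lob.
        while (sent < data.length) {
            int end = Math.min(sent + chunkSize, data.length);
            client.stream_lob(lobId, Arrays.copyOfRange(data, sent, end));
            sent = end;
        }
        client.commit_lob(lobId);
    }
}

// In-memory stand-in for a server, used only to exercise the wrapper.
final class FakeLobServer implements LobClient {
    private long nextId = 1;
    private final Map<Long, Long> declaredSizes = new HashMap<>();
    private final Map<Long, String> targets = new HashMap<>();
    private final Map<Long, ByteArrayOutputStream> buffers = new HashMap<>();
    final Map<String, byte[]> committed = new HashMap<>();
    int chunksReceived = 0;

    @Override
    public long begin_lob(String key, String columnPath, long size, long ts) {
        long id = nextId++;
        declaredSizes.put(id, size);
        targets.put(id, key + "/" + columnPath);
        buffers.put(id, new ByteArrayOutputStream());
        return id;
    }

    @Override
    public void stream_lob(long lobId, byte[] chunk) {
        buffers.get(lobId).write(chunk, 0, chunk.length);
        chunksReceived++;
    }

    // Rejects the commit if the streamed byte count differs from the declared size.
    @Override
    public void commit_lob(long lobId) {
        ByteArrayOutputStream buf = buffers.remove(lobId);
        long declared = declaredSizes.remove(lobId);
        if (buf.size() != declared) {
            throw new IllegalStateException("size mismatch for lob " + lobId);
        }
        committed.put(targets.remove(lobId), buf.toByteArray());
    }
}
```

Uploading 10 bytes with a chunk size of 4 produces three stream_lob calls (4, 4 and 2 bytes). Because commit_lob is a distinct call, the size check lives in one obvious place instead of inside whichever stream_lob call happens to be last.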
Some opinions that I am not married to:
- we don't need block_for, since streaming already adds enough latency that
we may as well just assume block_for=N
- having a separate commit_lob is less magic than having the final stream_lob
behave a little differently from the others
> Large object support
> --------------------
>
> Key: CASSANDRA-265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-265
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Jonathan Ellis
>
> The standard answer since forever has been "cassandra is a bad fit for large
> objects."
> But I think it doesn't have to be that way. With a few simplifying
> assumptions we can make this doable.
> First, screw Thrift. There is no cross-platform way to specify a stream of
> bytes in Thrift. You can't mix raw sockets into Thrift very easily (?), so
> screw it. Make it an internal-only API to start with, like the much-vaunted
> and much-feared BinaryVerbHandler.
> Second, forget about writing multiple lobs at once. You insert one lob at a
> time, to a specific column.
> Even with Thrift out of the equation, we are not out of the woods.
> MessagingService also assumes that Messages will be memory-resident, not
> streamed. One approach to fixing this would be a StreamingMessage class
> consisting of a message id (paired with the originating endpoint
> to make it unique) and a size. The VerbHandler would keep a Map of
> incomplete StreamingMessages around until the full size was read, after
> which they could be disposed of.
> So a LargeObjectCommand would basically be just the command id and the
> payload, the streamed lob. We would handle it by streaming it directly
> to a file. When the stream was complete, we would do a write to the standard
> commitlog/memtable with a pointer to that lob file. That would then be
> flushed normally to the sstable. (This would require adding another boolean
> to Column serialization, indicating whether the value is really a lob
> pointer. We could combine this with the existing bool into a single byte and
> have room for a couple more flags, without taking extra space.)
> So lobs would never appear directly in the commitlog, and we would never have
> to rewrite them multiple times during compaction; just the pointers would get
> merged, but the lob files themselves would not have to be touched. (Except
> to remove them when a compaction shows that an older version is no longer
> needed.)
> Then of course we'd need a corresponding ReadLargeObject command. So the
> basics are straightforward.
> Read Repair and Hinted Handoff would add a few more wrinkles but nothing
> fundamentally challenging.
> Thoughts?
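The StreamingMessage bookkeeping in the quoted description can be sketched in Java as follows, assuming each chunk arrives tagged with its message id and originating endpoint. StreamKey, SinkFactory and StreamingMessageHandler are hypothetical names, the endpoint is simplified to a String, and the OutputStream sink stands in for the lob file the proposal streams to; none of this is Cassandra's actual MessagingService API. Chunks are written straight through to the sink rather than held in memory, and an entry leaves the map as soon as its announced size has been read.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

// A message id is only unique per sender, so it is paired with the
// originating endpoint (simplified here to a "host:port" String).
record StreamKey(long messageId, String origin) {}

// Opens the destination for a stream; in the proposal this is the lob file.
interface SinkFactory {
    OutputStream open(StreamKey key) throws IOException;
}

final class StreamingMessageHandler {
    private static final class Partial {
        final long size;
        final OutputStream sink;
        long received;

        Partial(long size, OutputStream sink) {
            this.size = size;
            this.sink = sink;
        }
    }

    // Incomplete StreamingMessages, kept only until their full size is read.
    private final Map<StreamKey, Partial> incomplete = new HashMap<>();
    private final SinkFactory sinks;

    StreamingMessageHandler(SinkFactory sinks) {
        this.sinks = sinks;
    }

    // Registers a stream whose header announced `size` bytes.
    void begin(StreamKey key, long size) throws IOException {
        if (incomplete.containsKey(key)) {
            throw new IllegalStateException("duplicate stream " + key);
        }
        OutputStream sink = sinks.open(key);
        if (size == 0) {
            sink.close(); // nothing to wait for
            return;
        }
        incomplete.put(key, new Partial(size, sink));
    }

    // Writes a chunk straight to the sink; returns true once the stream is complete.
    boolean onChunk(StreamKey key, byte[] chunk) throws IOException {
        Partial p = incomplete.get(key);
        if (p == null) {
            throw new IllegalStateException("unknown stream " + key);
        }
        if (p.received + chunk.length > p.size) {
            throw new IOException("stream " + key + " overran its declared size");
        }
        p.sink.write(chunk);
        p.received += chunk.length;
        if (p.received < p.size) {
            return false;
        }
        p.sink.close();
        incomplete.remove(key); // complete: dispose of the bookkeeping entry
        return true;
    }

    int pending() {
        return incomplete.size();
    }
}
```

Two senders may both use message id 1 without colliding, because the endpoint is part of the key.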
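The quoted suggestion to fold the new lob-pointer boolean into the existing Column boolean can be illustrated with a single flags byte. This Java sketch assumes the existing boolean is the deletion marker and uses a simplified, hypothetical column layout (name, flags, timestamp, value length, value); SimpleColumn and ColumnFlags are illustrative names, not Cassandra's real Column serializer. Bit 0x01 keeps the old meaning, 0x02 marks a lob pointer, and the remaining six bits stay free.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Simplified, hypothetical column layout used only to illustrate the flags byte.
record SimpleColumn(String name, boolean deleted, boolean lobPointer,
                    long timestamp, byte[] value) {}

final class ColumnFlags {
    static final int DELETED = 0x01;     // assumed meaning of the existing boolean
    static final int LOB_POINTER = 0x02; // new: value is a pointer to a lob file
    // Bits 0x04 through 0x80 remain free for future flags.

    private ColumnFlags() {}

    static byte encode(boolean deleted, boolean lobPointer) {
        int flags = 0;
        if (deleted) flags |= DELETED;
        if (lobPointer) flags |= LOB_POINTER;
        return (byte) flags;
    }

    static void write(SimpleColumn c, DataOutputStream out) throws IOException {
        out.writeUTF(c.name());
        out.writeByte(encode(c.deleted(), c.lobPointer())); // one byte for both flags
        out.writeLong(c.timestamp());
        out.writeInt(c.value().length);
        out.write(c.value());
    }

    static SimpleColumn read(DataInputStream in) throws IOException {
        String name = in.readUTF();
        byte flags = in.readByte();
        long timestamp = in.readLong();
        byte[] value = new byte[in.readInt()];
        in.readFully(value);
        return new SimpleColumn(name, (flags & DELETED) != 0,
                (flags & LOB_POINTER) != 0, timestamp, value);
    }
}
```

In this layout a column named "c" whose value is the 6-byte pointer "lob-42" serializes to 22 bytes: 3 for the name, 1 for the flags, 8 for the timestamp, 4 for the length and 6 for the value. Both flags share the single byte that one boolean written with writeBoolean would have used on its own.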
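The quoted claim that compaction would merge only the pointers, touching lob files only to delete superseded ones, can be sketched in Java as below. LobPointer, MergeResult and LobCompaction are hypothetical names, and the sketch assumes plain last-write-wins by timestamp and ignores deletions. A real implementation would presumably delete the returned files only after the compacted sstable had been written and the old sstables removed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// One version of a lob column as stored in an sstable: just the pointer
// (here, a lob file name) and the write timestamp. Hypothetical type.
record LobPointer(String lobFile, long timestamp) {}

record MergeResult(LobPointer survivor, List<String> obsoleteLobFiles) {}

final class LobCompaction {
    private LobCompaction() {}

    // Merges the versions of one column found in the sstables being compacted.
    // Assumes plain last-write-wins by timestamp and ignores tombstones.
    // Only pointers are compared; no lob file is read or rewritten.
    static MergeResult merge(List<LobPointer> versions) {
        if (versions.isEmpty()) {
            throw new IllegalArgumentException("no versions to merge");
        }
        LobPointer newest = Collections.max(versions,
                Comparator.comparingLong(LobPointer::timestamp));
        List<String> obsolete = new ArrayList<>();
        for (LobPointer p : versions) {
            if (!p.lobFile().equals(newest.lobFile())) {
                // Superseded: delete once the compacted sstable is durable.
                obsolete.add(p.lobFile());
            }
        }
        return new MergeResult(newest, obsolete);
    }
}
```

Merging versions at timestamps 10, 30 and 20 keeps the pointer written at 30 and reports the other two lob files as obsolete; the surviving lob file itself is never copied.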