[
https://issues.apache.org/jira/browse/CASSANDRA-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133671#comment-13133671
]
Brandon Williams commented on CASSANDRA-3045:
---------------------------------------------
This isn't as easy as it seems. Bulk loading this way requires becoming a fat
client. Since Hadoop is colocated with Cassandra, this means we would have to
divorce the "ip == node" marriage. That in turn means rewriting most of how
gossip works, adding the port for the storage protocol (and thus allowing port
divergence, an idea we have not been fond of in the past), and modifying
MessagingService, Incoming/OutgoingTcpConnection, and probably other classes
that are notoriously hairy.
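To make the "ip == node" problem concrete, here is a minimal sketch, not Cassandra's actual gossip code. It assumes peer state is kept in a map keyed by address. The class name, the role strings, and the second port (17000) are hypothetical; 7000 is the default storage port. Keyed by IP alone, a colocated fat client overwrites the node's entry; keyed by IP and port, both survive, which is exactly the port divergence mentioned above.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: none of these names exist in Cassandra.
final class EndpointIdentitySketch {

    // Identity is the IP alone, so two processes on one host collide.
    static int peersKeyedByIp() {
        InetAddress host = InetAddress.getLoopbackAddress();
        Map<InetAddress, String> peers = new HashMap<>();
        peers.put(host, "cassandra-node");
        peers.put(host, "hadoop-task-fat-client"); // replaces the node's entry
        return peers.size();
    }

    // Identity is IP plus port, so colocated processes stay distinct.
    static int peersKeyedByIpAndPort(int nodePort, int clientPort) {
        InetAddress host = InetAddress.getLoopbackAddress();
        Map<InetSocketAddress, String> peers = new HashMap<>();
        peers.put(new InetSocketAddress(host, nodePort), "cassandra-node");
        peers.put(new InetSocketAddress(host, clientPort), "hadoop-task-fat-client");
        return peers.size();
    }

    public static void main(String[] args) {
        System.out.println("keyed by ip:      " + peersKeyedByIp());
        System.out.println("keyed by ip+port: " + peersKeyedByIpAndPort(7000, 17000));
    }
}
```

Every structure that assumes one entry per IP would need the second form, which is why the change touches so much of the messaging and gossip code.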
That is a lot of work, very difficult to make backwards-compatible, and we
really don't know what gains, if any, we would see from this method
afterwards. I'm personally very strongly -1 on making these changes to gossip
since I feel like it is finally fairly stable.
Even in a non-colocated setup, the task JVMs would still need to respect
RING_DELAY, which might be enough to erode any gains that this could provide in
many scenarios.
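As a rough illustration of the RING_DELAY cost, the sketch below computes what fraction of a task's wall-clock time would be spent waiting. It assumes a 30-second delay (which I believe is the default) and assumes each task JVM pays it once before doing any work. The class, method, and task durations are hypothetical, not measurements.

```java
// Hypothetical arithmetic sketch; the durations are made up for illustration.
final class RingDelayOverheadSketch {

    // Fraction of total task time spent waiting out the ring delay.
    static double overheadFraction(long ringDelayMs, long usefulWorkMs) {
        return (double) ringDelayMs / (ringDelayMs + usefulWorkMs);
    }

    public static void main(String[] args) {
        long ringDelayMs = 30_000L; // assumed delay, paid once per task JVM
        long[] workMs = {30_000L, 90_000L, 600_000L};
        for (long work : workMs) {
            System.out.printf("work=%ds overhead=%.2f%n",
                    work / 1000, overheadFraction(ringDelayMs, work));
        }
    }
}
```

Under these assumptions, a task that streams for only 30 seconds spends half its time waiting, so short tasks are the ones most likely to lose whatever the bulk path gains.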
One option might be to speak the storage protocol directly to the local C*
instance, but add some kind of logic that says "this is neither a node nor a
fat client, just accept writes/reads from it and nothing else."
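A minimal sketch of what that gating logic could look like, purely as a thought experiment: every name below (the peer kinds, the verb names, the methods) is hypothetical and does not correspond to existing Cassandra classes or to the real verb enum. The idea is that a connection declares itself a local I/O client, the node serves only its reads and writes, and the peer never enters gossip state.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the "neither node nor fat client" idea.
final class LocalClientGateSketch {

    enum PeerKind { NODE, FAT_CLIENT, LOCAL_IO_CLIENT }

    // Illustrative verb names, not Cassandra's actual message verbs.
    enum Verb { READ, MUTATION, GOSSIP_SYN, STREAM_REQUEST }

    // Verbs the local instance would be willing to process for each kind of peer.
    static Set<Verb> allowedVerbs(PeerKind kind) {
        switch (kind) {
            case LOCAL_IO_CLIENT:
                return EnumSet.of(Verb.READ, Verb.MUTATION);
            default:
                return EnumSet.allOf(Verb.class);
        }
    }

    static boolean accept(PeerKind kind, Verb verb) {
        return allowedVerbs(kind).contains(verb);
    }

    // A local I/O client is never tracked in gossip, so "ip == node" can stay.
    static boolean joinsGossip(PeerKind kind) {
        return kind != PeerKind.LOCAL_IO_CLIENT;
    }

    public static void main(String[] args) {
        for (Verb verb : Verb.values()) {
            System.out.println(verb + " from LOCAL_IO_CLIENT accepted: "
                    + accept(PeerKind.LOCAL_IO_CLIENT, verb));
        }
    }
}
```

Because such a peer never appears in gossip, it would presumably not need to wait out RING_DELAY either, though that depends on details this sketch does not model.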
> Update ColumnFamilyOutputFormat to use new bulkload API
> -------------------------------------------------------
>
> Key: CASSANDRA-3045
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3045
> Project: Cassandra
> Issue Type: Improvement
> Components: Hadoop
> Reporter: Jonathan Ellis
> Assignee: Brandon Williams
> Priority: Minor
> Fix For: 1.1
>
>
> The bulk loading interface added in CASSANDRA-1278 is a great fit for Hadoop
> jobs.