[ https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-1278:
--------------------------------------
Component/s: Tools
Fix Version/s: 0.7.1 (was: 0.8)
Assignee: Matthew F. Dennis
All a "bulk load" API needs to do is pretend it's a streaming source, and send
data rows (in sorted order) to the target. Since Hadoop sorts as part of the
reduce stage, we should be able to do this directly in CFOF/CFRW.
The tricky part is that StreamOutSession.begin assumes that it has a list of
physical files to stream from (via addFilesToStream).
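
As a rough illustration of the "pretend it's a streaming source" idea, here is a
minimal sketch of a sender that forwards rows to a target and refuses any row
that arrives out of order. Everything in it is hypothetical: SortedRowStreamer,
its send method, and the BiConsumer standing in for the target node are not part
of Cassandra's streaming code, and plain String comparison stands in for
whatever ordering the target actually expects (in Cassandra that would depend on
the partitioner).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical illustration only: this is not Cassandra's streaming API.
class SortedRowStreamer {
    // Stand-in for the node that receives the stream.
    private final BiConsumer<String, String> target;
    // Last key forwarded so far; null until the first row is sent.
    private String lastKey = null;

    SortedRowStreamer(BiConsumer<String, String> target) {
        this.target = target;
    }

    // Forward one row, rejecting any key that does not sort strictly
    // after the previously sent key.
    void send(String key, String row) {
        if (lastKey != null && key.compareTo(lastKey) <= 0) {
            throw new IllegalStateException(
                "out-of-order key: " + key + " after " + lastKey);
        }
        lastKey = key;
        target.accept(key, row);
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        SortedRowStreamer streamer =
            new SortedRowStreamer((k, v) -> received.add(k + "=" + v));
        streamer.send("apple", "1");
        streamer.send("banana", "2");
        System.out.println(received);
    }
}
```

In a Hadoop job, a reducer's output already arrives sorted, so a record writer
built along these lines would only need the ordering check as a safety net. The
sketch does not address the StreamOutSession problem: it has no notion of
physical files at all.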
> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
> Key: CASSANDRA-1278
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Jeremy Hanna
> Assignee: Matthew F. Dennis
> Fix For: 0.7.1
>
>
> Currently bulk loading into Cassandra is a black art. People are either
> directed to just do it responsibly with thrift or a higher level client, or
> they have to explore the contrib/bmt example -
> http://wiki.apache.org/cassandra/BinaryMemtable That contrib module requires
> delving into the code to find out how it works and then applying it to the
> given problem. Using either method, the user also needs to keep in mind that
> overloading the cluster is possible - which will hopefully be addressed in
> CASSANDRA-685.
> This improvement would be to create a contrib module or set of documents
> dealing with bulk loading. Perhaps it could include code in the Core to make
> it more pluggable for external clients of different types.
> Bulk loading their data is something that many people who are new to
> Cassandra need to do.