[ https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-1278:
--------------------------------------
      Component/s: Tools
    Fix Version/s: 0.7.1
                       (was: 0.8)
         Assignee: Matthew F. Dennis

All a "bulk load" API needs to do is pretend it's a streaming source and send data rows (in sorted order) to the target. Since Hadoop sorts as part of the reduce stage, we should be able to do this directly in CFOF/CFRW.

The tricky part is that StreamOutSession.begin assumes that it has a list of physical files to stream from (via addFilesToStream).

> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-1278
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Matthew F. Dennis
>             Fix For: 0.7.1
>
>
> Currently bulk loading into Cassandra is a black art. People are either
> directed to just do it responsibly with Thrift or a higher-level client, or
> they have to explore the contrib/bmt example -
> http://wiki.apache.org/cassandra/BinaryMemtable. That contrib module requires
> delving into the code to find out how it works and then applying it to the
> given problem. Using either method, the user also needs to keep in mind that
> overloading the cluster is possible - which will hopefully be addressed in
> CASSANDRA-685.
> This improvement would be to create a contrib module or set of documents
> dealing with bulk loading. Perhaps it could include code in the core to make
> it more pluggable for external clients of different types.
> It is just that this is something that many who are new to Cassandra need to
> do - bulk load their data into Cassandra.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
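
The idea in the comment above (a loader that behaves like a streaming source and sends rows to the target in sorted order) can be illustrated with a minimal Python sketch. This is not Cassandra's streaming API: `stream_sorted_rows`, `OutOfOrderRow`, the `(key, columns)` row shape, and the `send` callback are all hypothetical names invented for illustration, and plain string ordering stands in for whatever ordering the real target would expect.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row shape: (row key, {column name: value}).
Row = Tuple[str, Dict[str, str]]


class OutOfOrderRow(ValueError):
    """Raised when a row's key is not greater than the previous row's key."""


def stream_sorted_rows(rows: Iterable[Row], send: Callable[[Row], None]) -> int:
    """Push rows to `send` one at a time, enforcing strictly ascending keys.

    `send` is a stand-in for the target of the stream (an assumption, not a
    real Cassandra call). Returns the number of rows sent.
    """
    last_key = None
    count = 0
    for key, columns in rows:
        if last_key is not None and key <= last_key:
            raise OutOfOrderRow(f"{key!r} arrived after {last_key!r}")
        send((key, columns))
        last_key = key
        count += 1
    return count


# Stand-in for reducer output, which arrives already sorted by key.
received: List[Row] = []
reducer_output = sorted({"b": {"c1": "2"}, "a": {"c1": "1"}}.items())
sent = stream_sorted_rows(reducer_output, received.append)
```

Because the source only ever hands over one row at a time, it needs no list of physical files up front, which is the assumption the comment identifies as the tricky part of the existing session setup.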