[
https://issues.apache.org/jira/browse/CASSANDRA-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902273#action_12902273
]
Jonathan Ellis commented on CASSANDRA-1368:
-------------------------------------------
bq. Thrift and Avro serialization exist because JSON is not a nice way to deal
with tons of data (especially binary data).
We've introduced support for annotating data with types, so you can represent a
long as a long and a uuid as a pretty string, instead of everything being
opaque binary.
I worry that the Avro cure is worse than the disease.
bq. I think you are seriously underestimating the can of worms this would be,
and it wouldn't even get you timestamp support.
Maybe. But ColumnOrSuperColumn isn't a whole lot better, and has the drawback
of inflicting Yet Another Serialization Format on people to learn.
> Add output support for Hadoop Streaming
> ---------------------------------------
>
> Key: CASSANDRA-1368
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1368
> Project: Cassandra
> Issue Type: New Feature
> Components: Hadoop
> Reporter: Stu Hood
> Fix For: 0.7 beta 2
>
> Attachments: 0001-Switch-to-Cloudera-s-Distribution-of-Hadoop.patch,
> 0002-Add-an-Avro-OutputReader-and-Resolver-for-Hadoop-Str.patch,
> 0003-Apply-the-deprecated-OutputFormat-interface-to-allow.patch,
> 0004-Add-Streaming-example-shell-scripts.patch
>
>
> Hadoop Streaming is a framework that allows mapreduce jobs to be written in
> languages other than Java, by performing simple IPC on stdin/stdout.
> Adding output support for Hadoop Streaming to Cassandra would mean that users
> could write very simple scripts in dynamic languages to load data into
> Cassandra. Once our Hadoop OutputFormat has stabilized a bit, we might also
> be able to use this code to provide scalable bulk loading.
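
For readers unfamiliar with the stdin/stdout IPC mentioned in the description, here is a minimal sketch of a streaming-style mapper in Python. It only shows the general line-oriented contract (records arrive on stdin, tab-separated key/value lines go to stdout); it is not taken from the attached patches, and the names `map_line` and `run` are hypothetical. How such output would actually be routed into Cassandra is outside what this sketch covers.

```python
import sys


def map_line(line):
    """Turn one input line into zero or more (key, value) pairs.

    Hypothetical example logic: emit each lowercased word with a count of 1.
    """
    for word in line.split():
        yield word.lower(), 1


def run(stdin=sys.stdin, stdout=sys.stdout):
    # A streaming job feeds input records to the script on stdin, one per
    # line, and by default reads tab-separated key/value lines from stdout.
    for line in stdin:
        for key, value in map_line(line.rstrip("\n")):
            stdout.write(f"{key}\t{value}\n")


# Only consume stdin when the script is run with piped input.
if __name__ == "__main__" and not sys.stdin.isatty():
    run()
```

The mapping logic is kept in its own function so it can be exercised without a Hadoop cluster, for example by piping a text file into the script from a shell.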