Radovan Zvoncek created CASSANDRA-8367:
------------------------------------------
Summary: Clash between Cassandra and Crunch mapreduce config
Key: CASSANDRA-8367
URL: https://issues.apache.org/jira/browse/CASSANDRA-8367
Project: Cassandra
Issue Type: Bug
Components: Hadoop
Reporter: Radovan Zvoncek
Priority: Minor
We would like to use Cassandra's (Cql)BulkOutputFormat classes to implement
Resource IOs for Crunch. This would allow Crunch users to write the results of
their jobs directly to Cassandra (thus skipping writing them to the file system).
In the process of doing this, we found a clash in the mapreduce job config.
The affected config key is 'mapreduce.output.basename'. Cassandra uses it [1]
for a different purpose than Crunch does [2]. This results in some obscure
behavior that I personally don't understand, but it causes the jobs to fail.
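To illustrate the kind of clash described above, here is a minimal sketch in
which a plain java.util.Map stands in for the Hadoop job configuration. The
class, the two helper methods, and the values they write are hypothetical
stand-ins, not the actual Cassandra or Crunch code; the only detail taken from
this report is the shared key name.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigKeyClash {
    // The key named in this report; both libraries write to it.
    static final String SHARED_KEY = "mapreduce.output.basename";

    // Hypothetical stand-in for the Cassandra side (assumed here to store
    // an output target name under the shared key).
    static void configureCassandraOutput(Map<String, String> conf, String target) {
        conf.put(SHARED_KEY, target);
    }

    // Hypothetical stand-in for the Crunch side (assumed here to store
    // an output file base name under the same key).
    static void configureCrunchOutput(Map<String, String> conf, String baseName) {
        conf.put(SHARED_KEY, baseName);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        configureCassandraOutput(conf, "my_table");
        configureCrunchOutput(conf, "part-out0");
        // The second write replaced the first, so whichever component reads
        // the key afterwards sees a value meant for the other one.
        System.out.println(SHARED_KEY + " = " + conf.get(SHARED_KEY));
    }
}
```

Whichever side writes last wins, so the other side ends up reading a value it
did not set.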
As a workaround, we re-implemented the output format classes to use a different
config key, but we'd very much like to stop using our re-implementations.
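The workaround can be sketched roughly as follows, again with a plain
java.util.Map standing in for the job configuration. The key
'example.cassandra.output.target' and all method names are made up for
illustration; they are not the key or code we actually used, and not part of
Cassandra or Crunch.

```java
import java.util.HashMap;
import java.util.Map;

public class SeparateConfigKeys {
    // The key named in this report, left for Crunch to use.
    static final String BASENAME_KEY = "mapreduce.output.basename";
    // Hypothetical dedicated key for the Cassandra output format.
    static final String CASSANDRA_TARGET_KEY = "example.cassandra.output.target";

    // Hypothetical setter used by a re-implemented output format.
    static void setCassandraTarget(Map<String, String> conf, String target) {
        conf.put(CASSANDRA_TARGET_KEY, target);
    }

    // Hypothetical stand-in for the Crunch side writing its own value.
    static void setCrunchBaseName(Map<String, String> conf, String baseName) {
        conf.put(BASENAME_KEY, baseName);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setCassandraTarget(conf, "my_table");
        setCrunchBaseName(conf, "part-out0");
        // Each component reads back its own value; nothing is overwritten.
        System.out.println(conf.get(CASSANDRA_TARGET_KEY));
        System.out.println(conf.get(BASENAME_KEY));
    }
}
```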
[1]
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/ConfigHelper.java#L54
[2]
https://github.com/apache/crunch/blob/3f13ee65c9debcf6bd7366607f58beae6c73ffe2/crunch-core/src/main/java/org/apache/crunch/io/CrunchOutputs.java#L99