[
https://issues.apache.org/jira/browse/CASSANDRA-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882742#action_12882742
]
Karthick Sankarachary commented on CASSANDRA-1227:
--------------------------------------------------
Hi Bryan,
It makes complete sense to keep the input and output configuration properties
separate. It was an oversight on my part not to split the properties for the
input and output formats to begin with.
To build on your suggestion, how about adding methods to
ColumnFamilyInputFormat and ColumnFamilyOutputFormat that let users set those
properties programmatically, just for the sake of convenience (a la
SequenceFileInputFormat and SequenceFileOutputFormat)? For example, to set the
column family for the input format, one could provide a
ColumnFamilyInputFormat.setColumnFamily(job, columnFamily) method, which would
simply delegate to
job.getConfiguration().set(ConfigHelper.INPUT_COLUMNFAMILY_CONFIG,
columnFamily).
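To make the idea concrete, here is a rough, self-contained sketch of what such
convenience setters might look like. It is only an illustration, not the
patch: Configuration and Job below are tiny stand-ins for the Hadoop classes
so that the snippet compiles on its own, OUTPUT_COLUMNFAMILY_CONFIG is an
assumed constant name mirroring the input one, and the column family names
"RawEvents" and "EventSummaries" are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Demo: configure different column families for input and output on one job.
class SetColumnFamilyDemo {
    public static void main(String[] args) {
        Job job = new Job();
        // Hypothetical column family names, for illustration only.
        ColumnFamilyInputFormat.setColumnFamily(job, "RawEvents");
        ColumnFamilyOutputFormat.setColumnFamily(job, "EventSummaries");

        Configuration conf = job.getConfiguration();
        System.out.println("input:  " + conf.get(ConfigHelper.INPUT_COLUMNFAMILY_CONFIG));
        System.out.println("output: " + conf.get(ConfigHelper.OUTPUT_COLUMNFAMILY_CONFIG));
    }
}

// Stand-in for org.apache.hadoop.conf.Configuration (only get/set are modelled).
class Configuration {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) { values.put(key, value); }

    String get(String key) { return values.get(key); }
}

// Stand-in for org.apache.hadoop.mapreduce.Job.
class Job {
    private final Configuration conf = new Configuration();

    Configuration getConfiguration() { return conf; }
}

class ConfigHelper {
    // Property keys as quoted in the issue description.
    static final String INPUT_COLUMNFAMILY_CONFIG = "cassandra.input.columnfamily";
    // Assumed constant name; the issue only mentions the key string.
    static final String OUTPUT_COLUMNFAMILY_CONFIG = "cassandra.output.columnfamily";
}

class ColumnFamilyInputFormat {
    // Convenience setter: writes the input column family into the job configuration.
    static void setColumnFamily(Job job, String columnFamily) {
        job.getConfiguration().set(ConfigHelper.INPUT_COLUMNFAMILY_CONFIG, columnFamily);
    }
}

class ColumnFamilyOutputFormat {
    // Mirror of the input setter, writing to the separate output key.
    static void setColumnFamily(Job job, String columnFamily) {
        job.getConfiguration().set(ConfigHelper.OUTPUT_COLUMNFAMILY_CONFIG, columnFamily);
    }
}
```

Because each setter writes to its own key, setting the output column family
leaves the input one untouched, which is the independence the issue asks for.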
In addition, if you don't mind, could you add a test case for the use case
described in the issue, provided it doesn't involve too much configuration?
Regards,
Karthick
> Input and Output column families should be configured independently
> -------------------------------------------------------------------
>
> Key: CASSANDRA-1227
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1227
> Project: Cassandra
> Issue Type: Improvement
> Components: Hadoop
> Affects Versions: 0.7
> Reporter: Bryan Tower
> Fix For: 0.7
>
> Attachments: trunk-1227.txt
>
>
> I would like to use a ColumnFamilyInputFormat and a ColumnFamilyRecordReader
> to map a bunch of data from Cassandra to a job, do some operations on the
> data, and then write out a summary of that work in the Reducer. Both the
> ColumnFamilyInputFormat and the ColumnFamilyOutputFormat read the column
> family from the same property in the job configuration object (they both use
> the ConfigHelper.COLUMNFAMILY_CONFIG property). This means that I cannot read
> from one Cassandra column family and write out to a different one in the same
> job with the existing code.
> I changed the ColumnFamilyOutputFormat to read from
> "cassandra.output.columnfamily" instead of the "cassandra.input.columnfamily"
> that it was using before.
> I changed the COLUMNFAMILY_CONFIG property and related methods to include the
> word input. I also added corresponding Output versions of each of the
> relevant properties that should be configured for the
> ColumnFamilyOutputFormat.