[
https://issues.apache.org/jira/browse/CASSANDRA-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198793#comment-13198793
]
Brandon Williams commented on CASSANDRA-3740:
---------------------------------------------
bq. What is the significance of "INPUT_INITIAL_THRIFT_ADDRESS" for
BulkOutputFormat?
For an output format, this won't be used; it's only for input formats.
bq. Is there any need to provide the listen address of the Hadoop nodes for
BulkOutputFormat? If yes, how do I provide it?
I'm not sure what you mean. Hadoop nodes themselves won't have a listen
address, and BOF will discover the Cassandra nodes' listen addresses via Thrift.
bq. Actually, we are experiencing the problem while loading the data: it
fails to connect if the host the M/R job is running on is dual-stack, i.e. has
both IPv4 and IPv6. Also, it works when cassandra.yaml is provided; maybe it is
reading the listen address or something else from cassandra.yaml.
Hmm, I can't think of any reason that would work with the yaml. Can you give
more details of the setup?
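One detail that would help here is which address families the JVM on the M/R host actually resolves. The following is only a diagnostic sketch that uses nothing beyond the standard `java.net` classes; the class name `AddressFamilies` and its helper methods are hypothetical and are not part of Cassandra or Hadoop. It also prints the standard JVM property `java.net.preferIPv4Stack`, on the assumption that it may be relevant to a dual-stack host; that is not a confirmed cause or fix for this issue.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper, not part of Cassandra or Hadoop.
class AddressFamilies {

    /** Classifies an already-resolved address as "IPv4", "IPv6", or "unknown". */
    static String family(InetAddress address) {
        if (address instanceof Inet4Address) {
            return "IPv4";
        }
        if (address instanceof Inet6Address) {
            return "IPv6";
        }
        return "unknown";
    }

    /** Resolves a host name or address literal and classifies the first result. */
    static String familyOf(String host) {
        try {
            return family(InetAddress.getByName(host));
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot resolve " + host, e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Default to the local host name when no host is given on the command line.
        String host = args.length > 0 ? args[0] : InetAddress.getLocalHost().getHostName();

        // Prints "null" when the property has not been set on this JVM.
        System.out.println("java.net.preferIPv4Stack="
                + System.getProperty("java.net.preferIPv4Stack"));

        // List every address the JVM resolves for the host, with its family.
        for (InetAddress address : InetAddress.getAllByName(host)) {
            System.out.println(family(address) + "  " + address.getHostAddress());
        }
    }
}
```

Running this on the host where the map task fails, once with the Cassandra node's host name as the argument and once with no argument, would show whether both IPv4 and IPv6 addresses are being returned there.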
> While using BulkOutputFormat unnecessarily look for the cassandra.yaml file.
> ------------------------------------------------------------------------------
>
> Key: CASSANDRA-3740
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3740
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Affects Versions: 1.1
> Reporter: Samarth Gahire
> Assignee: Brandon Williams
> Labels: cassandra, hadoop, mapreduce
> Fix For: 1.1
>
> Attachments: 0001-Make-DD-the-canonical-partitioner-source.txt,
> 0002-Prevent-loading-from-yaml.txt, 0003-use-output-partitioner.txt,
> 0004-update-BOF-for-new-dir-layout.txt
>
>
> I am trying to use BulkOutputFormat to stream data from the map phase of a
> Hadoop job. I have set the Cassandra-related configuration using ConfigHelper.
> I have also looked into the Cassandra code, and it seems Cassandra takes care
> not to look for the cassandra.yaml file.
> But when I run the job, I still get the following error:
> {
> 12/01/13 11:30:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing
> the arguments. Applications should implement Tool for the same.
> 12/01/13 11:30:04 INFO input.FileInputFormat: Total input paths to process : 1
> 12/01/13 11:30:04 INFO mapred.JobClient: Running job: job_201201130910_0015
> 12/01/13 11:30:05 INFO mapred.JobClient: map 0% reduce 0%
> 12/01/13 11:30:23 INFO mapred.JobClient: Task Id :
> attempt_201201130910_0015_m_000000_0, Status : FAILED
> java.lang.Throwable: Child Error
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
> attempt_201201130910_0015_m_000000_0: Cannot locate cassandra.yaml
> attempt_201201130910_0015_m_000000_0: Fatal configuration error; unable to
> start server.
> }
> Also, let me know how I can make this cassandra.yaml file available to the
> Hadoop MapReduce job.