If the remote filesystem is visible from the other cluster, then a different HDFS URI, e.g. hdfs://analytics:8000/historical/, can be used for reads and writes, even if your defaultFS (the one where you get maximum performance) is, say, hdfs://processing:8000/.
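In code that looks something like this (a minimal sketch, assuming Spark 2.x and that both namenodes are reachable; the paths and join column are hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("cross-cluster-join").getOrCreate()

    // read from the remote analytics cluster by fully qualifying the URI...
    val historical = spark.read.parquet("hdfs://analytics:8000/historical/events")

    // ...and from the local (defaultFS) processing cluster with an unqualified path
    val recent = spark.read.parquet("/current/events")

    // join, then write the result back to the analytics cluster
    historical.join(recent, "event_id")
      .write.parquet("hdfs://analytics:8000/historical/joined")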
Some caveats:

- Performance will be slower, in both directions.
- If you have a fast pipe between the two clusters, then a job with many executors may unintentionally saturate the network, leading to unhappy people elsewhere.
- You'd better have mutual trust at the Kerberos layer. There's a configuration option (I forget its name) to give spark-submit a list of HDFS namenodes it will need to get tokens from. Unless your Spark cluster is being launched with keytabs, you will need to list upfront all HDFS clusters your job intends to work with (see the sketch after the quoted message below).

On 4 Dec 2016, at 21:45, ayan guha <guha.a...@gmail.com> wrote:

Hi

Is it possible to access Hive tables sitting on multiple clusters in a single Spark application?

We have a data processing cluster and an analytics cluster. I want to join a table from the analytics cluster with another table in the processing cluster and finally write back to the analytics cluster.

Best
Ayan
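P.S. Assuming Spark on YARN, the option mentioned above is likely spark.yarn.access.namenodes, which takes a comma-separated list of secure namenodes the job needs delegation tokens for. A minimal sketch, with a hypothetical class and jar name:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --conf spark.yarn.access.namenodes=hdfs://processing:8000,hdfs://analytics:8000 \
      --class com.example.CrossClusterJob \
      cross-cluster-job.jar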