Hi everyone, I am facing quite a difficult problem: how to connect two separate Hadoop instances, each with its own HDFS, HBase, etc.
My first thought is: what if they share a master node? Surely this would allow any MapReduce or Spark job to be run against both instances? Comments? If the above idea is feasible, how would one ensure that each data collection remains in its respective instance and is not distributed across both?

Kind regards,
Dave