I've searched the docs but couldn't find an answer.

We have some machines that produce data, and each one runs an adapter (agent). These machines are close to each other, physically on the same network. Our HDFS cluster runs on other machines, on a different network. The two networks are connected over the internet.

Which placement is better network-wise: putting the collector on the same network as the adapters, or on the same machine as the HDFS namenode?

Option A (collector close to the adapters) seems better to me, because the adapters send data to the collector ALL THE TIME, while the collector sends data to HDFS only once every 5 minutes, in a single write.
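To make the comparison concrete, here is a rough sketch of how I picture the traffic that crosses between the two networks in each case. It assumes the collector forwards exactly what it receives and treats the HDFS write as a single transfer per interval. The function name `wan_traffic` and all the numbers (10 adapters, 1000 bytes/sec each) are made-up placeholders, not measurements from our setup.

```python
# Back-of-envelope comparison of cross-network traffic for the two placements.
# All inputs are hypothetical placeholders, not measured values.

def wan_traffic(num_adapters, bytes_per_sec_per_adapter, flush_interval_sec):
    """Return, per flush interval, what crosses the inter-network link
    for each option, assuming the collector forwards data unchanged."""
    total = num_adapters * bytes_per_sec_per_adapter * flush_interval_sec
    return {
        # Option A: collector next to the adapters; only the periodic
        # HDFS write crosses the link.
        "A": {"bytes": total, "wan_streams": 1, "pattern": "one burst per interval"},
        # Option B: collector next to the namenode; every adapter
        # streams across the link continuously.
        "B": {"bytes": total, "wan_streams": num_adapters, "pattern": "continuous"},
    }

result = wan_traffic(num_adapters=10, bytes_per_sec_per_adapter=1000,
                     flush_interval_sec=300)
for option, stats in result.items():
    print(option, stats)
```

Under these assumptions the total bytes crossing the link are identical, so the difference between the options comes down to the number of long-lived connections over the internet and whether the traffic is bursty or continuous.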
P.S. Our collector writes exactly what it receives from the adapters, so the data volume is the same on both sides and isn't a factor.

Any recommendations?

Thanks,
-- Oded