Hi all, I have a large file (> 5 GB) that I need to use for lookups. Since each slave needs to search the hashmap built from the file in parallel, I need to broadcast it. I was wondering whether broadcasting such a large file is really a good idea. Are there any benchmarks for broadcast variables? I am on a Standalone cluster, and machine configuration is not a constraint at the moment. Has anyone used broadcast variables at this scale?
Thanks, Purav