I am having partial success chipping away at the shared library dependencies of my Hadoop job by submitting them to the distributed cache with the -files option. Each time I add another library to the -files list, the run no longer fails on that library but fails instead on a different one that I haven't added via -files yet. I can envision completing this process, but...
I am curious whether this is the correct way to run a job that depends on upwards of forty shared libraries. I don't know which ones will actually be touched during a given run, of course. All I know is that an 'ldd' dump on the binary (this is a C++ pipes job) lists that many possible dependencies. Should I really copy forty .so files to my HDFS cluster and then reference them in an enormously long -files option when running the job, or am I approaching this problem incorrectly? Is there a preferable alternative method for doing this?

Thanks.

Keith Wiley
www.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be self-contented is to be vile and ignorant, and that to aspire is better than to be blindly and impotently happy."
    -- Edwin A. Abbott, Flatland
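
For illustration, here is a minimal sketch of how the long -files list could be generated from the 'ldd' dump rather than typed by hand. It is only a sketch under assumptions: the helper names (parse_ldd, files_argument) and the HDFS directory /user/me/libs are hypothetical, the ldd text is assumed to use the usual "name => /path (address)" layout, and it does not answer whether -files is the right mechanism in the first place.

```python
import re


def parse_ldd(ldd_output):
    """Return the resolved library paths found in `ldd` output text."""
    paths = []
    for line in ldd_output.splitlines():
        # Typical line: "libfoo.so.1 => /usr/lib/libfoo.so.1 (0x00007f...)"
        # Lines with no resolved path (vdso, "not found", the loader) are skipped.
        match = re.search(r"=>\s+(/\S+)", line)
        if match:
            paths.append(match.group(1))
    return paths


def files_argument(paths, hdfs_dir):
    """Build a comma-separated value for -files, assuming each library
    has already been copied into hdfs_dir under its own file name."""
    base = hdfs_dir.rstrip("/")
    names = [p.rsplit("/", 1)[-1] for p in paths]
    return ",".join(f"{base}/{name}" for name in names)


if __name__ == "__main__":
    # Hypothetical ldd output, pasted in as text for the example.
    sample = """\
    linux-vdso.so.1 =>  (0x00007fff5a1fe000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f1c2a000000)
    libmylib.so => /opt/mylibs/libmylib.so (0x00007f1c29c00000)
    libmissing.so => not found
    /lib64/ld-linux-x86-64.so.2 (0x00007f1c2a400000)
    """
    print(files_argument(parse_ldd(sample), "/user/me/libs"))
```

Standard system libraries (libc, libstdc++ and the like) are probably already present on the worker nodes, so in practice the list would likely be filtered down to the non-standard ones before uploading anything.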
