I am having partial success chipping away at the shared library dependencies of
my Hadoop job by submitting them to the distributed cache with the -files
option.  Each time I add another library to the -files list, the run no longer
fails on that library but instead fails on another one that I haven't added via
-files yet.  So I can envision completing this process, but...

I am just curious whether this is the correct way to run a job that depends on
upwards of forty shared libraries.  Of course, I don't know in advance which
ones a given run will touch.  All I know is that an 'ldd' dump of the binary
(this is a C++ Pipes job) lists that many potential dependencies.
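If the answer does turn out to be "list them all", the ldd dump can at least be
turned into the comma-separated -files value mechanically instead of by hand.
This is a minimal sketch, assuming the usual glibc ldd line format
"libname => /path/to/lib (0xaddress)".  The function names and the optional
prefix filter are hypothetical helpers for illustration, not part of Hadoop.

```python
import re
import subprocess
import sys

# Matches ldd lines such as:
#   libfoo.so.1 => /opt/app/lib/libfoo.so.1 (0x00007f2b1c000000)
# Lines with no resolved absolute path (linux-vdso, "=> not found",
# the bare dynamic-loader line) do not match and are skipped.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)\s+\(0x[0-9a-fA-F]+\)")


def libs_from_ldd(ldd_output: str) -> list[str]:
    """Return resolved library paths from ldd output, in order, without duplicates."""
    paths: list[str] = []
    seen: set[str] = set()
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if m and m.group(2) not in seen:
            seen.add(m.group(2))
            paths.append(m.group(2))
    return paths


def files_option(paths: list[str], exclude_prefixes=()) -> str:
    """Join paths into one comma-separated string for -files.

    exclude_prefixes is a hypothetical knob: paths starting with any of
    these prefixes are left out of the list.  With the default empty
    tuple nothing is excluded.
    """
    kept = [p for p in paths if not p.startswith(tuple(exclude_prefixes))]
    return ",".join(kept)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ldd_to_files.py /path/to/pipes_binary
    out = subprocess.run(["ldd", sys.argv[1]], capture_output=True,
                         text=True, check=True).stdout
    print(files_option(libs_from_ldd(out)))
```

The prefix filter is only there in case some libraries (for example ones under
/lib/) are already installed identically on every node; whether that holds on a
given cluster is an assumption to check, not something ldd can tell you.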

Should I really copy forty .so files to my HDFS cluster and then reference them
all in an enormously long -files option when running the job, or am I
approaching this problem incorrectly?  Is there a preferable alternative?

Thanks.

________________________________________________________________________________
Keith Wiley               [email protected]               www.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
  -- Edwin A. Abbott, Flatland
________________________________________________________________________________


