>> Are we sure JNI is a real problem? It really seems like the right tool
>> for the job. Greg seems to remember them asking who would maintain the
>> (non-java) JNI bits, but even if that's us and not them (which is probably
>> the way to go anyway), I don't see that that's a problem.
>
> Yeh, it's sort of a wash. A nice goal would be to have a patch that allowed
> Hadoop to not require any additional components (i.e. JNI packages) from the
> Ceph repository. Given that the Ceph infrastructure will be installed anyway
> in the case of Hadoop, it's a bit of a toss up.

The JNI isn't very _fun_ to develop, but it does the job just fine and
follows the expected pattern of using a stable interface, with nothing
extravagant needed for either Hadoop or Ceph. Hadoop already has JNI
pieces, so adding more shouldn't be a problem (though I do wish the
automake part weren't so awkward to approach). I suppose there will need
to be some automated check for Ceph as part of the ant build process.

>
> -n
>
>> Let's start with just providing the primary replica, at least until we
>> find out whether hadoop takes advantage of additional ones (does HDFS read
>> from the local non-primary replica?).
>
> I believe that Hadoop will schedule a map job at a local replica for load
> balancing, or to duplicate the work when a map is running slowly. Joe, can
> you confirm this?
>

When I ran my basic evaluation, Hadoop reported its locality results as
about 75% of jobs being run on the same node as the data. This seemed to
be a result of overloaded nodes. Someone will need to run a proper
evaluation, as my experiment was small and blew up when I expanded my
test cluster. The blow-up was probably caused by a misconfigured kernel
upgrade or something else uninteresting that's irrelevant here.

--Alex--
