Deepak,

On Sun, Apr 8, 2012 at 9:46 PM, Deepak Nettem <deepaknet...@gmail.com> wrote:
> Hi,
>
> Is it possible to get the 'id' of the currently executing split or block
> from within the mapper? Using this block Id / split id, I want to be able
> to query the namenode to get the names of hosts having that block / split,
> and the actual path to the data.

You can get the list of host locations for the current Mapper's split via
https://gist.github.com/2339170 (or, more generally, from a FileSystem object
via https://gist.github.com/2339181).

You can't get block IDs through any publicly supported API, so you should
treat getting the local block file path as unavailable too.

> I need this for some analytics that I'm doing. Is there a client API that
> allows doing this? If not, what's the best way to do this?

There are some ways to go about it (I certainly wouldn't call it impossible),
but I'm curious what your 'analytics' are and why they need block IDs and
actual block file paths, because your problem may also be solvable by other,
already available means.

--
Harsh J
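In case the gists linked above are unavailable, here is a minimal sketch of the same idea using the new (`org.apache.hadoop.mapreduce`) API. It is not the gists' actual contents. It assumes a file-based input format, so the split is a `FileSplit`, and the class name `SplitLocationMapper` is hypothetical. In `setup()` it reads the split's host list, then asks the `FileSystem` (the NameNode, for HDFS) for the `BlockLocation`s overlapping the split's byte range. `BlockLocation` exposes offsets, lengths and hosts but no block ID, consistent with the point about block IDs above. It needs the Hadoop client jars on the classpath.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical mapper: logs where its input split lives, then relies on the
// default identity map() to pass records through unchanged.
public class SplitLocationMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        InputSplit split = context.getInputSplit();

        // Hosts reported for this split (the ones the scheduler treats as "local").
        System.err.println("Split hosts: " + Arrays.toString(split.getLocations()));

        // Assumption: a file-based InputFormat, so the split carries a path and byte range.
        if (split instanceof FileSplit) {
            FileSplit fileSplit = (FileSplit) split;
            Path path = fileSplit.getPath();
            FileSystem fs = path.getFileSystem(context.getConfiguration());
            FileStatus status = fs.getFileStatus(path);

            // Blocks overlapping [start, start + length) of the file, with their hosts.
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, fileSplit.getStart(), fileSplit.getLength());
            for (BlockLocation block : blocks) {
                System.err.println(path
                    + " offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + Arrays.toString(block.getHosts()));
            }
        }
    }
}
```

The output lands in each task's stderr log, so it can be inspected per task attempt without changing the job's actual output.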