Keith,

On Sat, May 22, 2010 at 5:01 AM, Keith Wiley <kwi...@keithwiley.com> wrote:
> On May 21, 2010, at 16:07, Mikhail Yakshin wrote:
>
>> On Fri, May 21, 2010 at 11:09 PM, Keith Wiley wrote:
>>> My Java mapper hands its processing off to C++ through JNI. On the C++
>>> side I need to access a file. I have already implemented a version of
>>> this interface in which the file is read entirely into RAM on the Java
>>> side and is handed through JNI as a byte[] (received as a char[] of
>>> course). However, it would simplify things if on the C++ side my code
>>> had access to a conventional FILE* or file path, not a char[] in memory.
>>> The reason for this is that I will be relying on an existing set of C
>>> and C++ code which assumes it will be handed a filename (or perhaps a
>>> FILE*). Handing it a char[] is not ideal for my use.
>>>
>>> ...so, can I take a file from HDFS and reference it via a conventional
>>> path for fopen() or ifstream() usage?
>>>
>>> If I can't do this directly because HDFS is too unconventional (what
>>> with the distributed blocks and all), can I at least do this from the
>>> distributed cache perhaps? Could I load the file into the distributed
>>> cache on the Java side and then tell the C/C++ side where it is in the
>>> distributed cache? What would that look like? It would have to be a
>>> "path" of some sort. I admit, I'm a bit vague on the details.
>>
>> Try using the distributed cache: this way you'll get your HDFS file
>> pre-distributed to the local file system of all nodes that will be
>> executing your job. You can then get the full local file name using the
>> DistributedCache java object and open it normally with fopen().
>
> Ah, excellent. The only question that remains is how to get a local path
> to a file in the distributed cache.
You can use DistributedCache.getLocalCacheFiles, or JobContext#getLocalCacheFiles
in newer versions. Also, would libhdfs help in reading directly from DFS?
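Something along these lines might work on the mapper side (an untested sketch
using the old API; the class name, the cached file name and the processNative()
JNI call are placeholders for illustration, and it assumes the driver has
already called DistributedCache.addCacheFile() with your HDFS file):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class NativeFileMapper extends Mapper<LongWritable, Text, Text, Text> {

      // Plain local-filesystem path that can be handed across JNI and opened
      // with fopen()/ifstream on the C++ side.
      private String localPath;

      @Override
      protected void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        // Pre-0.21 API; newer releases expose context.getLocalCacheFiles() instead.
        Path[] cached = DistributedCache.getLocalCacheFiles(conf);
        if (cached != null && cached.length > 0) {
          // Something like /tmp/hadoop-.../local/archive/.../yourfile on the
          // task tracker's local disk.
          localPath = cached[0].toUri().getPath();
        }
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // Hypothetical native method: pass the local path over JNI and let the
        // existing C/C++ code open the file itself.
        // processNative(localPath, value.toString());
      }
    }

Since the returned path points at an ordinary file on the node's local disk,
passing that string through JNI and opening it with fopen() on the C++ side
should be all you need.

Thanks
Hemanth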