Keith,

On Sat, May 22, 2010 at 5:01 AM, Keith Wiley <kwi...@keithwiley.com> wrote:
> On May 21, 2010, at 16:07 , Mikhail Yakshin wrote:
>
>> On Fri, May 21, 2010 at 11:09 PM, Keith Wiley wrote:
>>> My Java mapper hands its processing off to C++ through JNI.  On the C++ 
>>> side I need to access a file.  I have already implemented a version of this 
>>> interface in which the file is read entirely into RAM on the Java side and 
>>> is handed through JNI as a byte[] (received as a char[] of course).  
>>> However, it would simplify things if on the C++ side my code had access to 
>>> a conventional FILE* or file path, not a char[] in memory.  The reason for 
>>> this is that I will be relying on an existing set of C and C++ code which 
>>> assumes it will be handed a filename (or perhaps a FILE*).  Handing it a 
>>> char[] is not ideal for my use.
>>>
>>> ...so, can I take a file from HDFS and reference it via a conventional path 
>>> for fopen() or ifstream() usage?
>>>
>>> If I can't do this directly because HDFS is too unconventional (what with 
>>> the distributed blocks and all), can I at least do this from the distributed 
>>> cache perhaps?  Could I load the file into the distributed cache on the 
>>> Java side and then tell the C/C++ side where it is in the distributed 
>>> cache?  What would that look like?  It would have to be a "path" of some 
>>> sort.  I admit, I'm a bit vague on the details.
>>
>> Try using the distributed cache: that way your HDFS file gets
>> pre-distributed to the local file system of every node that will
>> execute your job. You can then get the full local file name from the
>> DistributedCache Java object and open the file with a normal
>> fopen().
>
>
> Ah, excellent.  The only question that remains is how to get a local path to 
> a file in the distributed cache.

You can use DistributedCache.getLocalCacheFiles, or
JobContext#getLocalCacheFiles in newer versions. Alternatively, would
libhdfs help, by reading directly from HDFS?
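
For concreteness, here is a rough sketch of the staging side Mikhail
describes above, assuming the classic org.apache.hadoop.mapred API
(the HDFS path and class names are placeholders):

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class Driver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(Driver.class);
        // Ask the framework to copy this HDFS file onto the local
        // disk of every node that runs a task of this job.
        DistributedCache.addCacheFile(new URI("/user/keith/input.dat"),
                                      conf);
        // ... set the mapper class and input/output paths here ...
        JobClient.runJob(conf);
    }
}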
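
Then the mapper can recover the local path in configure() and hand it
straight through JNI. NativeMapper and its processFile() native method
are hypothetical names standing in for your own JNI entry point:

import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class NativeMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    // Hypothetical native method implemented in your C++ library; it
    // receives an ordinary file path that fopen() can use directly.
    private native void processFile(String localPath, String record);

    private String localPath;

    @Override
    public void configure(JobConf conf) {
        try {
            // One local-disk Path per file added to the cache.
            Path[] local = DistributedCache.getLocalCacheFiles(conf);
            localPath = local[0].toString();
        } catch (IOException e) {
            throw new RuntimeException("Could not locate cache file", e);
        }
    }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Hand the plain local path through JNI instead of a byte[].
        processFile(localPath, value.toString());
    }
}

Since the cache file lands on the task node's local disk before the
task starts, the existing C/C++ code that expects a filename should
need no changes.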

Thanks
Hemanth
