Thanks Amareshwari for your reply!

The file Config.zip is present in HDFS; if it were not, the JobTracker
itself would have reported an error while executing this statement:

DistributedCache.addCacheArchive(new URI("/home/akhil1988/Config.zip"), conf);

But I get the error in the map function when I try to access the Config
directory.
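
To narrow this down, here is a minimal debugging sketch for the task side
(old mapred API; MyMapper is just a placeholder class and the map method is
omitted) that prints what the DistributedCache actually localized on the
node:

import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class MyMapper extends MapReduceBase {
  @Override
  public void configure(JobConf job) {
    try {
      // Each entry is the local directory an archive was unpacked into;
      // the output lands in the task's stdout log.
      for (Path p : DistributedCache.getLocalCacheArchives(job)) {
        System.out.println("Localized archive: " + p);
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}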

Now I am using the following statement but still getting the same error: 
DistributedCache.addCacheArchive(new
URI("/home/akhil1988/Config.zip#Config"), conf);
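
For completeness, here is a driver-side sketch of the full setup I intend
to try next, with the fully qualified hdfs:// URI you suggested (the
namenode host/port and the Driver class name are placeholders for my
cluster and job):

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class Driver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(Driver.class);
    // Fully qualified HDFS URI; the #fragment names the symlink that
    // should appear in the task's working directory.
    DistributedCache.addCacheArchive(
        new URI("hdfs://namenode:9000/home/akhil1988/Config.zip#Config"),
        conf);
    // Required so the framework actually creates the "Config" symlink.
    DistributedCache.createSymlink(conf);
    // ... set mapper, input/output paths, then JobClient.runJob(conf);
  }
}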

Do you think there could be any problem with distributing a zipped
directory and having Hadoop unzip it recursively?

Thanks!
Akhil



Amareshwari Sriramadasu wrote:
> 
> Hi Akhil,
> 
> DistributedCache.addCacheArchive takes a path on HDFS. From your code, it
> looks like you are passing a local path.
> Also, if you want to create a symlink, you should pass the URI as
> hdfs://<path>#<linkname>, besides calling
> DistributedCache.createSymlink(conf);
> 
> Thanks
> Amareshwari
> 
> 
> akhil1988 wrote:
>> Please ask any questions if my description of the problem above is not
>> clear.
>>
>> Thanks,
>> Akhil
>>
>> akhil1988 wrote:
>>   
>>> Hi All!
>>>
>>> I want a directory to be present in the local working directory of the
>>> task, for which I am using the following statements:
>>>
>>> DistributedCache.addCacheArchive(new URI("/home/akhil1988/Config.zip"),
>>> conf);
>>> DistributedCache.createSymlink(conf);
>>>
>>>
>>> Here Config is a directory which I have zipped and put at the given
>>> location in HDFS.
>>>
>>> I have zipped the directory because the API doc of DistributedCache
>>> (http://hadoop.apache.org/core/docs/r0.20.0/api/index.html) says that
>>> the archive files are unzipped in the local cache directory:
>>>
>>> DistributedCache can be used to distribute simple, read-only data/text
>>> files and/or more complex types such as archives, jars etc. Archives
>>> (zip,
>>> tar and tgz/tar.gz files) are un-archived at the slave nodes.
>>>
>>> So, from my understanding of the API docs, I expect that the Config.zip
>>> file will be unzipped into a Config directory, and since I have
>>> symlinked them I can access the directory in the following manner from
>>> my map function:
>>>
>>> FileInputStream fin = new FileInputStream("Config/file1.config");
>>>
>>> But I get a FileNotFoundException when this statement executes.
>>> Please let me know where I am going wrong.
>>>
>>> Thanks,
>>> Akhil
>>>
>>>     
>>
>>   
> 
> 
> 

