Thank you for the reply!
In each map() call, I need to open, read, and close these files (more than 2
in the general case, and possibly 20 or more) in order to make some checks.
Given the huge amount of input data, performing all these file operations on
HDFS would kill performance. So I think it would be better to store these
files in the distributed cache, so that the whole process is more efficient.
I guess this is the point of using the distributed cache in the first place!

My question is whether I can store sequence files in the distributed cache and
handle them using e.g. the SequenceFile.Reader class, or whether I should keep
only regular text files in the distributed cache and handle them using the
usual Java API.

Thank you very much
Sofia

PS: The files are small, from a few KB to a few MB at most.
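
Since the files are this small, one option for the text-file route is to read
each locally cached file once per task and keep its contents in memory, so
that map() only does a hash lookup. The following is a minimal sketch of that
idea using only the standard Java API. The class name SideDataCache and the
tab-separated "key<TAB>value" line format are assumptions made for
illustration; they are not from the thread or from Hadoop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: loads small side files into memory once per task. */
final class SideDataCache {
    private final Map<String, String> lookup = new HashMap<>();

    /**
     * Reads every file once. Assumes each line is "key<TAB>value"
     * (an illustrative format, not something the thread specifies).
     */
    public void load(List<Path> localFiles) throws IOException {
        for (Path file : localFiles) {
            try (BufferedReader reader =
                     Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    int tab = line.indexOf('\t');
                    if (tab < 0) {
                        continue; // skip lines without a key/value separator
                    }
                    lookup.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        }
    }

    /** In-memory check intended to be called from map(); null if absent. */
    public String get(String key) {
        return lookup.get(key);
    }

    /** Number of distinct keys loaded so far. */
    public int size() {
        return lookup.size();
    }
}
```

In a Hadoop job, load() would presumably be called once from the mapper's
setup() (or configure() in the old API) with the local paths of the cached
files, rather than from map() itself. As far as I know, the same load-once
pattern also applies if the files stay in sequence-file format, since a
SequenceFile.Reader can be opened on the local file system.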



________________________________
From: Dino Kečo <[email protected]>
To: [email protected]; Sofia Georgiakaki <[email protected]>
Sent: Friday, August 12, 2011 11:30 AM
Subject: Re: Hadoop--store a sequence file in distributed cache?

Hi Sofia,

I assume that the output of the first job is stored on HDFS. In that case I
would read the file directly from the mappers without using the distributed
cache. Putting the file into the distributed cache would add one more copy
operation to your process.

Thanks,
dino


On Fri, Aug 12, 2011 at 9:53 AM, Sofia Georgiakaki
<[email protected]> wrote:

> Good morning,
>
> I would like to store some files in the distributed cache, so that they can
> be opened and read by the mappers.
> The files are produced by another job and are sequence files.
> I am not sure whether that format is suitable for the distributed cache, as
> the files in the distributed cache are stored and read locally. Should I
> change the format of the files in the previous job, perhaps to text files,
> and read them from the distributed cache using the plain Java API?
> Or can I still handle them in the usual way we use sequence files, even
> if they reside in a local directory? Performance is extremely important
> for my project, so I don't know what the best solution would be.
>
> Thank you in advance,
> Sofia Georgiakaki
