Florin,

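Based on how SequenceFileInputFormat splits its input, you should see
just one task reading the record. The file will still span 4 blocks
(three full 64MB blocks plus a partial one), and with default settings
you will typically get 4 map tasks, one per block. However,
SequenceFiles place sync markers (similar to what newlines are in text
files) only between records, every so often. Every task except the
first begins reading at the first sync marker at or after the start of
its split, so your single record is read whole by the first task, while
the other three find no record in their range and finish without
processing anything.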
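To make the split arithmetic concrete, here is a rough Python model (not Hadoop code) of how FileInputFormat carves a file into splits and which split ends up reading each record. It assumes the split size equals the block size, mirrors FileInputFormat's 1.1 slack factor for the last split, and uses a simplified ownership rule: a record is read by the split holding the last sync marker at or before it. The helper names, the ~1KB of header/key overhead, and the offsets (sync marker at byte 80, record at byte 100) are hypothetical, and edge cases where a marker straddles a split boundary are ignored.

```python
import bisect
import math

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # FileInputFormat lets the last split be up to 10% larger


def compute_splits(file_len, split_size):
    """Return (start, length) pairs, carving the file the way FileInputFormat does."""
    splits = []
    remaining = file_len
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_len - remaining, split_size))
        remaining -= split_size
    if remaining > 0:
        splits.append((file_len - remaining, remaining))
    return splits


def records_per_split(splits, record_offsets, sync_offsets):
    """Simplified model: a record is read by the split holding the last sync
    marker at or before it (offset 0 if there is none)."""
    split_starts = [start for start, _ in splits]
    syncs = sorted(sync_offsets)
    counts = [0] * len(splits)
    for rec in record_offsets:
        i = bisect.bisect_right(syncs, rec) - 1
        anchor = syncs[i] if i >= 0 else 0
        owner = bisect.bisect_right(split_starts, anchor) - 1
        counts[owner] += 1
    return counts


if __name__ == "__main__":
    block = 64 * MB
    file_len = 200 * MB + 1024  # hypothetical: 200MB value plus ~1KB header/key
    print("blocks:", math.ceil(file_len / block))  # 4
    splits = compute_splits(file_len, block)  # split size assumed == block size
    print("map tasks:", len(splits))  # 4
    # hypothetical offsets: header sync marker at byte 80, single record at byte 100
    print("records per task:", records_per_split(splits, [100], [80]))  # [1, 0, 0, 0]
```

Only the first of the four modeled tasks gets the record; the rest read nothing.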

It is also worth considering a larger block size for these files, so
that each file fits in a single block.
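A back-of-the-envelope Python sketch of that suggestion: count how many blocks a file needs, and pick a block size large enough to hold it whole. `block_size_to_fit` is a hypothetical helper that simply rounds the file length up to a multiple of 64MB; that granularity is an arbitrary choice here. In HDFS the block size is chosen per file when it is written, e.g. through the `dfs.block.size` property (`dfs.blocksize` in later releases) or the `blockSize` argument of `FileSystem.create`.

```python
import math

MB = 1024 * 1024


def blocks_needed(file_len, block_size):
    """Number of HDFS blocks a file of file_len bytes occupies."""
    return math.ceil(file_len / block_size)


def block_size_to_fit(file_len, granularity=64 * MB):
    """Hypothetical helper: smallest multiple of `granularity` that holds the whole file."""
    return math.ceil(file_len / granularity) * granularity


if __name__ == "__main__":
    file_len = 200 * MB
    print(blocks_needed(file_len, 64 * MB))  # 4
    bigger = block_size_to_fit(file_len)
    print(bigger // MB)  # 256
    print(blocks_needed(file_len, bigger))  # 1
```

With a 256MB block the whole record sits in a single block, so the task that reads it can be scheduled on a node holding all of its data.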

On Thu, Oct 27, 2011 at 9:31 PM, Florin P <florinp...@yahoo.com> wrote:
> Hello!
>  Suppose this scenario:
> 1. The DFS block size is 64MB.
> 2. We populate a SequenceFile with a single 200MB binary value (a PDF
> file).
> Given the above scenario:
> 1. How many blocks will be created on HDFS?
> 2. Will the number of blocks be 200MB/64MB, i.e. approximately 4?
> 3. How many map tasks will be created? Is it the same as the number
> of blocks?
> 4. If 4 mappers are created, will one mapper process the single value
> of the file while the other three are just started and then stopped?
>
> I look forward to your answers.
> Thank you.
> Regards,
>  Florin
>
>



-- 
Harsh J
