Thanks Denny! So that means each map task may have to read from another DataNode in order to finish reading the last line of its block?
Cheers,
Donal

2011/11/11 Denny Ye <denny...@gmail.com>

> hi
>    Structured data is often split across different blocks, e.g. a word
> or a line.
>    A MapReduce task reads HDFS data in units of *lines*: it will read the
> whole line, continuing from the end of the previous block into the start
> of the subsequent one, to obtain the complete line record. So you do not
> need to worry about incomplete structured data. HDFS itself does nothing
> special for this; the mechanism lives in the MapReduce layer.
>
> -Regards
> Denny Ye
>
> On Fri, Nov 11, 2011 at 3:43 PM, 臧冬松 <donal0...@gmail.com> wrote:
>
>> Usually a large file in HDFS is split into blocks and stored on
>> different DataNodes.
>> A map task is assigned to deal with one such block. I wonder what
>> happens if structured data (i.e. a word) is split across two blocks?
>> How do MapReduce and HDFS deal with this?
>>
>> Thanks!
>> Donal
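The behavior Denny describes can be sketched in code. The following is a simplified, self-contained model of the idea behind Hadoop's line-oriented record reading, not actual Hadoop source: a reader whose split does not start at offset 0 skips the first (partial) line, because the previous split's reader will have read it in full by continuing past its own split boundary. Class and method names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitLineReader {

    // Return the lines "owned" by the byte range [start, end) of data.
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        // A split that does not begin at offset 0 skips up to and including
        // the first newline: that partial line belongs to the previous split,
        // whose reader finishes it by reading past its own boundary.
        if (start != 0) {
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // step past the newline
        }
        // Emit every line that STARTS inside the split; the final line may
        // run past `end` into the next block (possibly on another DataNode).
        while (pos < end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') pos++;
            lines.add(new String(data, lineStart, pos - lineStart));
            pos++; // skip the newline
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes();
        // Pretend the block boundary falls at byte 8, in the middle of "bravo".
        System.out.println(readSplit(data, 0, 8));            // [alpha, bravo]
        System.out.println(readSplit(data, 8, data.length));  // [charlie]
    }
}
```

Note that the first reader emits "bravo" whole even though the split ends mid-word, and the second reader skips those leftover bytes; every line is processed exactly once. This is also why the answer to the follow-up question is yes: finishing that last line can require a (usually small) remote read from the DataNode holding the next block.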