Re: CompositeInputFormat scalbility

pmg Wed, 24 Jun 2009 21:09:50 -0700

And what decides part-0000, part-0001....input split, block size? 

So for example for 1G of data on HDFS with 64m block size get 16 blocks
mapped to different map tasks?




jason hadoop wrote:
> 
> The join package does a streaming merge sort between each part-X in your
> input directories,
> part-0000 will be handled a single task,
> part-0001 will be handled in a single task
> and so on
> These jobs are essentially io bound, and hard to beat for performance.
> 
> On Wed, Jun 24, 2009 at 2:09 PM, pmg <parmod.me...@gmail.com> wrote:
> 
>>
>> I have two files FileA (with 600K records) and FileB (With 2million
>> records)
>>
>> FileA has a key which is same of all the records
>>
>> 123    724101722493
>> 123    781676672721
>>
>> FileB has the same key as FileA
>>
>> 123    5026328101569
>> 123    5026328001562
>>
>> Using hadoop join package I can create output file with tuples and cross
>> product of FileA and FileB.
>>
>> 123    [724101722493,5026328101569]
>> 123    [724101722493,5026328001562]
>> 123    [781676672721,5026328101569]
>> 123    [781676672721,5026328001562]
>>
>> How does CompositeInputFormat scale when we want to join 600K with 2
>> millions records. Does it run on the node with single map/reduce?
>>
>> Also how can I not write the result into a file instead input split the
>> result into different nodes where I can compare the tuples e.g. comparing
>> 724101722493 with 5026328101569 using some heuristics.
>>
>> thanks
>> --
>> View this message in context:
>> http://www.nabble.com/CompositeInputFormat-scalbility-tp24192957p24192957.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
> 
> 

-- 
View this message in context: 
http://www.nabble.com/CompositeInputFormat-scalbility-tp24192957p24196664.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

Re: CompositeInputFormat scalbility

Reply via email to