pig-user  

Re: output file splitted

Bennie Schut
Tue, 17 Nov 2009 07:58:47 -0800

Yes, it's normal; once you're used to it, it's not so bad. The same will
happen when you write a custom (non-Pig) MapReduce job.
In Pig you can add a PARALLEL clause (for example "PARALLEL 4") to an
operator that triggers a reduce phase, such as GROUP or ORDER, to specify
the number of reducers and thus the number of output files.
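
If a single file is still needed afterwards, the part files can simply be
concatenated in name order. Here is a minimal Python sketch of that
collection step (the same idea as the cat output/* in the quoted reply). It
assumes the output directory has already been copied to the local
filesystem; the function name merge_part_files and the paths are
hypothetical, chosen only for illustration.

```python
import glob
import os
import shutil


def merge_part_files(output_dir, merged_path):
    """Concatenate part-* files from output_dir into merged_path, in name order.

    Hypothetical helper for illustration; assumes output_dir is a local copy
    of the job's output directory. Returns the number of non-empty part
    files that were merged.
    """
    part_paths = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    merged_count = 0
    with open(merged_path, "wb") as merged:
        for part_path in part_paths:
            # Skip directories and the empty files left by reducers that received no data.
            if not os.path.isfile(part_path) or os.path.getsize(part_path) == 0:
                continue
            with open(part_path, "rb") as part:
                shutil.copyfileobj(part, merged)
            merged_count += 1
    return merged_count
```

Calling merge_part_files("output", "result.txt") would write the non-empty
part files into result.txt. Hadoop's own "hadoop fs -getmerge" command does a
similar concatenation directly from HDFS, so a script like this is mainly
useful when extra formatting is needed along the way.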

zaki rahaman wrote:
> Hi Matteo,
>
> This is completely normal. Someone else can correct me if I'm wrong, but
> from my understanding, the number of part-000* files corresponds to the
> number of reducers your job ends up using on the cluster. Some of these can
> be empty, and the others will contain the data you want. What I usually do is
> run a small script to collect the output data and format it appropriately. You
> could do something as simple as cat output/* ... and I don't think this
> behavior is going to change anytime soon.
>
> On Tue, Nov 17, 2009 at 10:47 AM, Matteo Nasi <matteon...@gmail.com> wrote:
>
>   
>> Hi all,
>>
>> I'm new to Hadoop. I found Pig very quick and easy to learn, and have written
>> my own simple scripts and my first UDF.
>> It's a loader UDF based on the piggybank samples found in the 0.5.0 folders; it
>> basically loads data based on a fixed pattern, specified with a regex.
>> Everything is fine when I run with -x local mode: I can manipulate input and
>> generate output, and the output is a single part-00000 file containing the
>> data in the format I want.
>> When I try to run it on my 5-node Hadoop cluster with the -x mapred option
>> (0.20.1, same for 0.18.3 and the old 0.4.0 Pig) I get strange behaviour in
>> my output folder:
>> there is more than one part-* file; some of them are empty, and others
>> contain the data I saw in the local run, but split across different
>> files.
>>
>> First question: why? Is it normal for the output to be split into
>> different part-* files when a script runs on the cluster?
>>
>> If nothing can be done about it, I can always reassemble them into a single
>> file with a Perl or Bash script after the copyToLocal operation, but that
>> doesn't seem very nice to me :-(
>>
>> Thanks in advance for any suggestions.
>>
>> Matteo
>>