Bennie Schut
Tue, 17 Nov 2009 07:58:47 -0800
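Yes, it's normal; once you're used to it, it's not so bad. The same will happen when you write a custom (non-Pig) MapReduce job. In Pig you can add a PARALLEL clause to a reduce-side operator, for example "grouped = GROUP data BY key PARALLEL 4;" (relation and field names here are only illustrative), to specify the number of reducers and thus the number of output files.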
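
If you still want a single file after copying the output folder to local disk, the "cat output/*" idea from the quoted message can be sketched in Python as below. This is only a minimal sketch: the function name merge_part_files and both path arguments are made up for illustration, and it assumes the part-* files are already on the local filesystem. (hadoop fs -getmerge can also merge them straight from HDFS.)

```python
import glob
import os
import shutil


def merge_part_files(output_dir, merged_path):
    """Concatenate every part-* file in output_dir into merged_path.

    Hypothetical helper: files are appended in sorted name order
    (part-00000, part-00001, ...). Returns the number of part files seen.
    """
    parts = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(merged_path, "wb") as merged:
        for part in parts:
            # Empty part files (reducers that received no data) add nothing.
            with open(part, "rb") as src:
                shutil.copyfileobj(src, merged)
    return len(parts)
```

Calling merge_part_files("output", "merged.txt") would write one file next to the output folder; keep the merged file outside the output folder so it is not picked up on a later run.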

zaki rahaman wrote:
> Hi Matteo,
>
> This is completely normal. Someone else can correct me if I'm wrong, but
> from my understanding, the number of part-000* files corresponds to the
> number of reducers you end up having for your cluster. These can be empty
> and others will indeed contain the data you want. What I usually do is run a
> small script to collect the output data and format it appropriately... you
> could just do something as simple as cat output/* ... and I'm not sure this
> behavior is going to be changed anytime soon.
>
> On Tue, Nov 17, 2009 at 10:47 AM, Matteo Nasi <matteon...@gmail.com> wrote:
>
>> hi all,
>>
>> I'm knew to Hadoop. Found Pig very quick and easy to learn, made my own
>> simple scripts and my first own UDF.
>> it's a loader UDF based on the piggybank samples found on 0.5.0 folders, it
>> basically load data based on a fixed pattern, specified with regex code.
>> everything is fine when run with -x local mode, I can manipulate input and
>> generate output as well, output is only one part-00000 file containing data
>> in the format I desired them.
>> when I try to run it on my 5 nodes hadoop cluster with -x mapred option
>> (0.20.1, same for 0.18.3 and old 0.4.0 pig) I have a strange behaviour for
>> my output folder...
>> there are more than one part-* file, some of them are empty ... some others
>> contain the data I found before on the local run but splitted in different
>> files ...
>>
>> first question is: why ? is it normal to have output splitted into different
>> part-00000 in the cluster execution of a script ?
>>
>> if there's nothing to do with it I can always reassemble them into a unique
>> file with a perl or bash script after the copytoLocal operation but it
>> doesn't seem too nice for me :-(
>>
>> thanks in advance for any suggestion.
>>
>> Matteo