Benjamin Reed
Mon, 24 Mar 2008 15:19:45 -0700
PigStorage uses regex for splitting, as defined in:
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#sum

It looks like you might need to specify PigStorage('[|]').

And yes, Pig does process directories just like Hadoop.

ben

On Monday 24 March 2008 15:07:39 Erik Paulson wrote:
> Hello all -
>
> I'm trying to load data that is separated by '|' characters, using the
> PigStorage layer (using today's SVN)
>
> From following the code in Tuple, I think I'm doing this right, but maybe
> something in the parser is eating my character separators?
>
>
>
> grunt> cat /tmp/pipeseperated
> first|second|third
> grunt> cat /tmp/commaseperated
> first,second,third
> grunt> pipedata = load '/tmp/pipeseperated' using PigStorage('\\|');
> grunt> commadata = load '/tmp/commaseperated' using PigStorage(',');
> grunt> dump pipedata
> (, f, i, r, s, t, |, s, e, c, o, n, d, |, t, h, i, r, d, )
> grunt> dump commadata;
> (first, second, third)
> grunt> trytwo = load '/tmp/pipeseperated' using PigStorage('|');
> grunt> dump trytwo
> (, f, i, r, s, t, |, s, e, c, o, n, d, |, t, h, i, r, d, )
>
>
> And a second question: in Hadoop, it's customary to give a path to a
> directory containing all of the input files - is the same thing doable in
> Pig?
>
> Thanks!
>
> -Erik
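
To illustrate the reply above: in java.util.regex a bare '|' is the alternation operator, so as a pattern it matches the empty string at every position, which is consistent with the one-character-per-field output in the dump. A character class such as '[|]' matches a literal pipe and needs no backslashes that a parser might consume. The sketch below is plain Java, not PigStorage's actual code; the class name PipeSplitDemo and the helper splitLine are hypothetical, and it only assumes that the delimiter string reaches java.util.regex unchanged.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical stand-alone demo; not taken from the Pig source tree.
public class PipeSplitDemo {

    // Split one input line on a delimiter given as a regular expression.
    // The limit of -1 keeps trailing empty fields.
    static String[] splitLine(String line, String delimiterRegex) {
        return Pattern.compile(delimiterRegex).split(line, -1);
    }

    public static void main(String[] args) {
        String line = "first|second|third";

        // Bare '|' is an empty alternation: it matches between every
        // character, so the line falls apart into single characters.
        System.out.println(Arrays.toString(splitLine(line, "|")));

        // A character class matches a literal pipe, no escaping needed.
        // Prints [first, second, third]
        System.out.println(Arrays.toString(splitLine(line, "[|]")));

        // Pattern.quote builds a literal-match pattern for any delimiter.
        System.out.println(Arrays.toString(splitLine(line, Pattern.quote("|"))));
    }
}
```

If the escaped form '\\|' reaches the regex engine as a bare '|' (that is, if the backslashes are stripped somewhere along the way), it behaves like the first case, which would explain why both attempts in the original message produced the same output.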