Rob, I don't know PHP so can't advise you on the command-line flags, but I just tried it with Perl, using both Pig 0.6 and Pig 0.8, and this works:
grunt> cats = load 'tmp/text.txt';
grunt> dump cats;
(Art)
(Arts/Animation)
(Arts/Animation/Anime)
(Arts/Animation/Anime/Characters)
(Arts/Animation/Anime/Clubs_and_Organizations)
(Arts/Animation/Anime/Collectibles)
(Arts/Animation/Anime/Collectibles/Cels)
(Arts/Animation/Anime/Collectibles/Models_and_Figures)
(Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures)
(Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam)
grunt> s = stream cats through `perl -np tmp/categorize.pl`;
grunt> dump s;
(Art)
(Arts/Animation)
(Arts/Animation/Anime)
(Arts/Animation/Anime/Characters)
(Arts/Animation/Anime/Clubs_and_Organizations)
(Arts/Animation/Anime/Collectibles)
(Arts/Animation/Anime/Collectibles/Cels)
(Arts/Animation/Anime/Collectibles/Models_and_Figures)
(Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures)
(Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam)

(my categorize.pl is empty; I am just using the -p flag to echo the input back out)

-D

On Wed, Sep 29, 2010 at 5:15 AM, Rob Wilkerson <rwilker...@lotame.com> wrote:

> I have a Pig script--currently running in local mode--that processes a
> huge file containing a list of categories:
>
> /root/level1/level2/level3
> /root/level1/level2/level3/level4
> ...
>
> I need to insert each of these into an existing database by calling a
> stored procedure. Because I'm new to Pig and the UDF interface is a
> little daunting, I'm trying to get something done by streaming the
> file's content through a PHP script.
>
> I'm finding that the PHP script only processes half of the category
> lines I'm passing through it, though. More precisely, I see a record
> returned for ceil( pig_categories/2 ). A limit of 15 will produce 8
> entries after streaming through the PHP script--the last one will be
> empty. Example output is shown below indicating that only the even
> records are getting processed.
>
> Here's a relevant snippet from my Pig script:
>
> all_categories = LOAD 'categories.txt' USING PigStorage() AS (category:chararray);
> ...Several layers of filtering...
> ordered = ORDER mappable_categories BY category;
> limited = LIMIT ordered 10;
>
> categories = FOREACH limited GENERATE category;
> DUMP categories; -- Displays all 10 categories
>
> streamed = STREAM limited THROUGH `php -nF categorize.php`;
> DUMP streamed; -- Displays 5 categories
>
> And the PHP script receiving the stream:
>
> $category = fgets( STDIN );
> echo $category;
> # Yep, that's all there is right now
>
> Output:
>
> -- From the `DUMP categories` line
> (Arts)
> (Arts/Animation)
> (Arts/Animation/Anime)
> (Arts/Animation/Anime/Characters)
> (Arts/Animation/Anime/Clubs_and_Organizations)
> (Arts/Animation/Anime/Collectibles)
> (Arts/Animation/Anime/Collectibles/Cels)
> (Arts/Animation/Anime/Collectibles/Models_and_Figures)
> (Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures)
> (Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam)
>
> -- From the `DUMP streamed` line
> (Arts/Animation)
> (Arts/Animation/Anime/Characters)
> (Arts/Animation/Anime/Collectibles)
> (Arts/Animation/Anime/Collectibles/Models_and_Figures)
> (Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam)
>
> As you can see, it looks like only the even lines are being handled by
> the PHP script.
>
> I haven't found any information about streaming through a PHP file; in
> fact, I've found very little about streaming through any file. This is
> particularly true for information about the content of the stream
> receiver file. I'm really hoping someone here can help me out because
> I'm kind of out of places to ask this question. Any guidance or
> insight would be much appreciated. It's kind of important that I
> process 100% of the records rather than half of them.
> :-)
>
> I posted a question about this on StackOverflow yesterday
> (http://stackoverflow.com/questions/3815673/pigs-stream-through-php),
> but it doesn't look like there's much Pig visibility on SO at this
> point. I'll update that question with any answer I get from this list.
>
> Thanks for your help.
>
> --
> +rw
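
A possible explanation for the halving, offered as an assumption rather than something verified in this thread: PHP's CLI help describes `-F` as running the given file once for every input line, so the interpreter would consume one line itself before each run, and the script's own `fgets( STDIN )` would then read the following line. The Python sketch below only models that interaction; `run_script_per_line` and `run_script_once` are hypothetical names, and no PHP or Pig is actually invoked.

```python
import io


def run_script_per_line(stdin):
    """Model a runner that executes a script once per input line.

    The runner reads one line itself (assumed to be what `php -F` does),
    then the script body reads *another* line from the same stream and
    echoes it, mirroring the fgets/echo script in the quoted message.
    """
    output = []
    while True:
        runner_line = stdin.readline()   # consumed by the runner, never echoed
        if runner_line == "":
            break                        # end of input: no further script runs
        script_line = stdin.readline()   # what the script's own read sees
        output.append(script_line.rstrip("\n"))
    return output


def run_script_once(stdin):
    """Model a script that is started once and loops over all of its input."""
    return [line.rstrip("\n") for line in stdin]


categories = [
    "Arts",
    "Arts/Animation",
    "Arts/Animation/Anime",
    "Arts/Animation/Anime/Characters",
    "Arts/Animation/Anime/Clubs_and_Organizations",
]
text = "".join(c + "\n" for c in categories)

halved = run_script_per_line(io.StringIO(text))
complete = run_script_once(io.StringIO(text))

print(halved)    # every second category, plus an empty entry for the odd one out
print(complete)  # all five categories
```

If that is what is happening, dropping `-F` and having categorize.php loop over STDIN until end of input (or, with `-F`, echoing the line PHP has already read instead of calling `fgets`) should let every record through. It would also account for the trailing empty entry on an odd-sized input, since the last `fgets` finds nothing left to read.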