I have a Pig script--currently running in local mode--that processes a huge file containing a list of categories:

    /root/level1/level2/level3
    /root/level1/level2/level3/level4
    ...

I need to insert each of these into an existing database by calling a stored procedure. Because I'm new to Pig and the UDF interface is a little daunting, I'm trying to get something done by streaming the file's content through a PHP script.

I'm finding that the PHP script only processes half of the category lines I'm passing through it, though. More precisely, I get back ceil( pig_categories/2 ) records. A limit of 15 will produce 8 entries after streaming through the PHP script--the last one will be empty. The example output below shows that only the even-numbered records are getting processed.

Here's a relevant snippet from my Pig script:

    all_categories = LOAD 'categories.txt' USING PigStorage() AS (category:chararray);
    -- ...Several layers of filtering...
    ordered = ORDER mappable_categories BY category;
    limited = LIMIT ordered 10;
    categories = FOREACH limited GENERATE category;
    DUMP categories; -- Displays all 10 categories
    streamed = STREAM limited THROUGH `php -nF categorize.php`;
    DUMP streamed; -- Displays only 5 categories

And the PHP script receiving the stream:

    $category = fgets( STDIN );
    echo $category; # Yep, that's all there is right now

Output:

    -- From the `DUMP categories` line
    (Arts)
    (Arts/Animation)
    (Arts/Animation/Anime)
    (Arts/Animation/Anime/Characters)
    (Arts/Animation/Anime/Clubs_and_Organizations)
    (Arts/Animation/Anime/Collectibles)
    (Arts/Animation/Anime/Collectibles/Cels)
    (Arts/Animation/Anime/Collectibles/Models_and_Figures)
    (Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures)
    (Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam)

    -- From the `DUMP streamed` line
    (Arts/Animation)
    (Arts/Animation/Anime/Characters)
    (Arts/Animation/Anime/Collectibles)
    (Arts/Animation/Anime/Collectibles/Models_and_Figures)
    (Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam)

As you can see, it looks like only the even lines are being handled by the PHP script.

I haven't found any information about streaming through a PHP file; in fact, I've found very little about streaming through any file, and even less about what the stream receiver itself should contain. I'm really hoping someone here can help me out because I'm kind of out of places to ask this question. Any guidance or insight would be much appreciated. It's kind of important that I process 100% of the records rather than half of them. :-)

I posted a question about this on StackOverflow yesterday (http://stackoverflow.com/questions/3815673/pigs-stream-through-php), but it doesn't look like there's much Pig visibility on SO at this point. I'll update that question with any answer I get from this list.

Thanks for your help.

-- +rw