Rob,
I don't know PHP so can't advise you on the command-line flags, but I just
tried it with Perl, using both Pig 0.6 and Pig 0.8, and this works:

grunt> cats = load 'tmp/text.txt';
grunt> dump cats;
(Art)
(Arts/Animation)
(Arts/Animation/Anime)
(Arts/Animation/Anime/Characters)
(Arts/Animation/Anime/Clubs_and_Organizations)
(Arts/Animation/Anime/Collectibles)
(Arts/Animation/Anime/Collectibles/Cels)
(Arts/Animation/Anime/Collectibles/Models_and_Figures)
(Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures)
(Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam)
grunt> s = stream cats through `perl -np tmp/categorize.pl`;
grunt> dump s;
(Art)
(Arts/Animation)
(Arts/Animation/Anime)
(Arts/Animation/Anime/Characters)
(Arts/Animation/Anime/Clubs_and_Organizations)
(Arts/Animation/Anime/Collectibles)
(Arts/Animation/Anime/Collectibles/Cels)
(Arts/Animation/Anime/Collectibles/Models_and_Figures)
(Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures)
(Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam)


(my categorize.pl is empty, I am just using the -p flag to echo the input
back out)
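
For anyone writing the receiver in another language, here is a minimal Python sketch (not something run in this thread) of the two behaviours in play. echo_all copies every input line to the output, which is what perl -p with an empty script does. echo_with_extra_read is a hypothetical model of one way records can go missing: it assumes the interpreter's per-line mode has already consumed a line before the script body runs, so a further read inside the body takes the following line. The function names and sample data are made up for illustration.

```python
import io


def echo_all(src, dst):
    """Copy every input line to the output unchanged; return the line count."""
    copied = 0
    for line in src:
        dst.write(line)
        copied += 1
    return copied


def echo_with_extra_read(src):
    """Hypothetical failure mode: a per-line runner consumes one line per
    invocation, then the script body reads one more line from the same stream."""
    emitted = []
    while True:
        if src.readline() == "":  # line consumed by the per-line runner
            break
        # Line read by the script body itself; empty string at end of input.
        emitted.append(src.readline().rstrip("\n"))
    return emitted


# 15 made-up records standing in for the category lines.
sample = "".join(f"cat{i}\n" for i in range(1, 16))

out = io.StringIO()
copied = echo_all(io.StringIO(sample), out)
halved = echo_with_extra_read(io.StringIO(sample))
print(copied, len(halved))
```

With 15 input records the second function emits 8, the last of them empty, which is the pattern described in the original message. Whether that is what php -nF is doing here is an assumption worth checking against the PHP command-line documentation. As a real stream receiver, echo_all would be called with sys.stdin and sys.stdout.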

-D


On Wed, Sep 29, 2010 at 5:15 AM, Rob Wilkerson <rwilker...@lotame.com> wrote:

> I have a Pig script--currently running in local mode--that processes a
> huge file containing a list of categories:
>
>    /root/level1/level2/level3
>    /root/level1/level2/level3/level4
>    ...
>
> I need to insert each of these into an existing database by calling a
> stored procedure. Because I'm new to Pig and the UDF interface is a
> little daunting, I'm trying to get something done by streaming the
> file's content through a PHP script.
>
> I'm finding that the PHP script only processes half of the category
> lines I'm passing through it, though. More precisely, I see a record
> returned for ceil( pig_categories/2 ). A limit of 15 will produce 8
> entries after streaming through the PHP script--the last one will be
> empty. Example output is shown below indicating that only the even
> records are getting processed.
>
> Here's a relevant snippet from my Pig script:
>
>    all_categories = LOAD 'categories.txt' USING PigStorage() AS
>        (category:chararray);
>    ...Several layers of filtering...
>    ordered  = ORDER mappable_categories BY category;
>    limited  = LIMIT ordered 10;
>
>    categories = FOREACH limited GENERATE category;
>    DUMP categories; -- Displays all 10 categories
>
>    streamed = STREAM limited THROUGH `php -nF categorize.php`;
>    DUMP streamed; -- Displays 5 categories
>
> And the PHP script receiving the stream:
>
>    $category = fgets( STDIN );
>    echo $category;
>    # Yep, that's all there is right now
>
> Output:
>
>    -- From the `DUMP categories` line
>    (Arts)
>    (Arts/Animation)
>    (Arts/Animation/Anime)
>    (Arts/Animation/Anime/Characters)
>    (Arts/Animation/Anime/Clubs_and_Organizations)
>    (Arts/Animation/Anime/Collectibles)
>    (Arts/Animation/Anime/Collectibles/Cels)
>    (Arts/Animation/Anime/Collectibles/Models_and_Figures)
>    (Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures)
>    (Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam)
>
>    -- From the `DUMP streamed` line
>    (Arts/Animation)
>    (Arts/Animation/Anime/Characters)
>    (Arts/Animation/Anime/Collectibles)
>    (Arts/Animation/Anime/Collectibles/Models_and_Figures)
>    (Arts/Animation/Anime/Collectibles/Models_and_Figures/Action_Figures/Gundam)
>
> As you can see, it looks like only the even lines are being handled by
> the PHP script.
>
> I haven't found any information about streaming through a PHP file; in
> fact, I've found very little info about streaming through any file. This is
> particularly true for information about the content of the stream
> receiver file. I'm really hoping someone here can help me out because
> I'm kind of out of places to ask this question. Any guidance or
> insight would be much appreciated. It's kind of important that I
> process 100% of the records rather than half of them. :-)
>
> I posted a question about this on StackOverflow yesterday
> (http://stackoverflow.com/questions/3815673/pigs-stream-through-php),
> but it doesn't look like there's much Pig visibility on SO at this
> point. I'll update that question with any answer I get from this list.
>
> Thanks for your help.
>
> --
> +rw
>
