pig-user  

Re: Follow Up Questions: PigMix, DataGenerator etc...

Alan Gates
Mon, 09 Nov 2009 13:11:49 -0800


On Nov 8, 2009, at 7:08 AM, Rob Stewart wrote:

<snip>

So, Alan, you're correct: MapReduce on its own does not provide me with loops; I have to wrap a loop around this MapReduce method "getAllChildren()" to get all children of john. When you say that I would have to wrap Java around Pig to simulate Turing completeness, what exactly do you mean? Are there Pig Java classes that I can make use of to implement a Pig version of "getAllChildren()"? Or do you mean to create a UDF?

As Dmitry said, I wasn't thinking of a UDF so much as writing Java code that calls PigServer.registerQuery and openIterator repeatedly until a pass finds no new children.
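
A minimal sketch of that driver loop, written in plain Java so that it runs without a cluster. The class name AllChildren, the helper childrenOf, and every sample name other than john are hypothetical. childrenOf stands in for one pass of the data processing; in a real driver that step would presumably be a PigServer.registerQuery call followed by openIterator on the resulting alias. The point is only that the convergence test (stop when a pass yields no new children) lives in the outer loop, not in the query.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AllChildren {

    // Hypothetical stand-in for ONE data-processing pass: given the current
    // frontier of parents, return their direct children. In a Pig driver this
    // is where registerQuery/openIterator would be called (an assumption, not
    // shown here).
    static Set<String> childrenOf(Map<String, List<String>> parentToChildren,
                                  Set<String> frontier) {
        Set<String> out = new TreeSet<>();
        for (String parent : frontier) {
            out.addAll(parentToChildren.getOrDefault(parent, Collections.<String>emptyList()));
        }
        return out;
    }

    // The outer loop that the data-flow layer does not provide: repeat the
    // pass until it produces nothing new.
    static Set<String> getAllChildren(Map<String, List<String>> parentToChildren,
                                      String root) {
        Set<String> found = new TreeSet<>();
        Set<String> frontier = new TreeSet<>(Collections.singleton(root));
        while (!frontier.isEmpty()) {
            Set<String> next = childrenOf(parentToChildren, frontier);
            next.removeAll(found);   // keep only descendants not seen before
            found.addAll(next);
            frontier = next;         // an empty frontier means convergence
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical family tree of unknown depth.
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("john", Arrays.asList("mary", "paul"));
        tree.put("mary", Arrays.asList("sue"));
        tree.put("sue", Arrays.asList("tom"));

        System.out.println(getAllChildren(tree, "john")); // prints [mary, paul, sue, tom]
    }
}
```

Each trip round the while loop corresponds to one job submission, so the number of passes equals the depth of the tree plus a final pass that finds nothing new.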


Is there any comment to be made on the similarity between SQL and MapReduce, in that both lack the ability to recurse down the above family tree in one pass and give me all responses (where the depth of the tree is not known)?

Just that none of these three approaches (MapReduce, Pig Latin, and SQL) provides the necessary primitives to determine convergence. In all three cases you are forced to write the test and loop functionality outside of the main data processing. MR will never provide the primitives, because it is by definition a predefined operation controlled from the outside. SQL can do it in constructs like Oracle's PL/SQL. In a similar way Pig Latin could be extended to add loops and branches, but it is unclear at this point whether that is what it should do. Adding these constructs to Pig Latin would take it from a data flow language to a data processing language. At least in the short term it is much simpler to depend on outside languages that already provide this functionality.

Alan.


Rob Stewart