Never mind :) I found my answer in the docs for the PipedRDD

/**
 * An RDD that pipes the contents of each parent partition through an
 * external command (printing them one per line) and returns the
 * output as a collection of strings.
 */
private[spark] class PipedRDD[T: ClassTag](

So, this is essentially an implementation of something analogous to Hadoop's
streaming API.
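For reference, a minimal usage sketch (the data and the `tr` command here are just illustrative, and this assumes a SparkContext `sc`, e.g. inside spark-shell): each partition's elements are written to the forked process's stdin one per line, and the lines the process prints to stdout become the elements of the resulting RDD. The command is forked once per partition.

```scala
// Assumes `sc: SparkContext` is in scope (e.g. spark-shell).
val data = sc.parallelize(Seq("apple", "banana", "cherry"))

// Each element is fed to the external process's stdin, one per line;
// the process's stdout lines become the elements of the new RDD.
val upper = data.pipe("tr '[:lower:]' '[:upper:]'")

upper.collect()  // Array(APPLE, BANANA, CHERRY)
```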




On Sun, Jul 20, 2014 at 4:09 PM, jay vyas <jayunit100.apa...@gmail.com>
wrote:

> According to the API docs for the pipe operator,
> def pipe(command: String): RDD[String]
> <http://spark.apache.org/docs/1.0.0/api/scala/org/apache/spark/rdd/RDD.html>
> : Return an RDD created by piping elements to a forked external
> process.
> However, it's not clear to me:
>
> Will the resulting RDD capture the standard output from the process as its
> output (I assume that is the most common implementation)?
>
> Incidentally, I have not been able to use the pipe command to run an
> external process yet, so any hints on that would be appreciated.
>
> --
> jay vyas
>



-- 
jay vyas
