Exactly!
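As a final note, `foreach` is also defined on RDDs. This means that
you don't need to `collect()` the results into an array (which could
give you an OutOfMemoryError if the RDD is very large) before printing
them. Keep in mind, though, that `println` then runs on the executors:
with a local master you see the output in your console, but on a
cluster it ends up in the executors' stdout logs.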
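A minimal sketch of the options discussed in this thread, assuming a standalone Spark application with a local master (in `spark-shell` you would use the existing `sc` and skip the `SparkConf` setup). The object name `SedMatches`, the `take(20)` limit and the `/tmp/sed_matches` output directory are hypothetical; the input glob and the `"sed"` filter are Mich's. Besides `foreach` on the RDD itself, it shows `take` for a bounded sample, `toLocalIterator` for printing everything on the driver without holding the whole result at once, and `saveAsTextFile`, one of the saveAs methods Chandeep mentions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Sketch only: SedMatches, the take(20) limit and /tmp/sed_matches are
// hypothetical; the input glob and the "sed" filter come from the thread.
object SedMatches {

  // Lines containing `word`. This is a lazy transformation: nothing runs
  // until one of the actions below is called.
  def matching(lines: RDD[String], word: String): RDD[String] =
    lines.filter(line => line.contains(word))

  def main(args: Array[String]): Unit = {
    // Local master assumed; in spark-shell, use the existing `sc` instead.
    val sc = new SparkContext(
      new SparkConf().setAppName("sed-matches").setMaster("local[*]"))
    try {
      val sedLines = matching(sc.textFile("/unix_files/*.ksh"), "sed")

      // 1. RDD.foreach, no collect(): println runs inside the tasks, i.e. on
      //    the executors (this JVM with a local master, remote stdout on a cluster).
      sedLines.foreach(println)

      // 2. Bounded peek: only the first 20 matches are sent to the driver.
      sedLines.take(20).foreach(println)

      // 3. All matches printed on the driver, fetched one partition at a time,
      //    so the driver needs room for the largest partition, not the whole RDD.
      sedLines.toLocalIterator.foreach(println)

      // 4. Write the matches out instead: one part-NNNNN file per partition.
      //    This fails if the output directory already exists.
      sedLines.saveAsTextFile("/tmp/sed_matches")
    } finally {
      sc.stop()
    }
  }
}
```

For a quick look at a large RDD, option 2 is the safer choice, since it never ships more than the requested number of elements to the driver.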

Personally, when I learn a new library, I like to look at its Scaladoc
(http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
for Spark) and test it in the REPL or in worksheets (for Spark you
already have `spark-shell`).

best,
--Jakob

On Wed, Feb 10, 2016 at 3:52 PM, Mich Talebzadeh
<mich.talebza...@cloudtechnologypartners.co.uk> wrote:
> Many thanks Jakob.
>
>
>
> So it basically boils down to this demarcation, as suggested, which
> looks clearer:
>
> val errlog = sc.textFile("/unix_files/*.ksh")
> errlog.filter(line => line.contains("sed")).collect().foreach(line => println(line))
>
> Regards,
>
> Mich
>
> On 10/02/2016 23:21, Jakob Odersky wrote:
>
> Hi Mich,
> your assumptions 1 to 3 are all correct (nitpick: they're method
> *calls*, the methods being the part before the parentheses, but I
> assume that's what you meant). The last one is also a method call but
> uses syntactic sugar on top: `foreach(println)` boils down to
> `foreach(line => println(line))`.
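To see the desugaring Jakob describes in plain Scala (no Spark needed), here is a small sketch. The object and method names are hypothetical, and a `ListBuffer` stands in for the console so the two forms can be compared. Passing a method name where a function is expected makes the compiler eta-expand it into a function literal, so both calls visit the same elements in the same order.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical demo: `record` plays the role of println.
object EtaExpansionDemo {

  // Short form: the method name is passed directly and the compiler
  // eta-expands it to `line => record(line)`.
  def shortForm(lines: List[String]): List[String] = {
    val out = ListBuffer.empty[String]
    def record(line: String): Unit = out += line
    lines.foreach(record)
    out.toList
  }

  // Long form: the same call written as an explicit function literal.
  def longForm(lines: List[String]): List[String] = {
    val out = ListBuffer.empty[String]
    def record(line: String): Unit = out += line
    lines.foreach(line => record(line))
    out.toList
  }

  def main(args: Array[String]): Unit = {
    val lines = List("PROGNAME=$(basename $0 | sed -e s/.ksh//)", " exec sp_spaceused")
    println(shortForm(lines) == longForm(lines)) // true
    // With println itself, both forms print every line.
    lines.foreach(println)
    lines.foreach(line => println(line))
  }
}
```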
>
> On an unrelated side-note, I would suggest you add a period between
> every method call: it makes things easier to read and is actually
> required in certain circumstances. Specifically, I would add a period
> before collect() and foreach().
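A small plain-Scala sketch of the side-note, assuming Scala 2; the object name and sample lines are hypothetical (the lines are modelled on Mich's output). The dotted chain and the dot-free (infix) chain below call the same methods and give the same result. The last call shows a case where the dot matters: a trailing parameterless method without a dot would be postfix notation.

```scala
// Hypothetical sample, modelled on lines from the thread.
object DotVsInfix {

  val lines = List(" exec sp_spaceused", "ERROR=0", "PROGNAME=$(basename $0 | sed -e s/.ksh//)")

  // Dotted form: every call hangs off the previous result explicitly.
  def dotted(ls: List[String]): List[String] =
    ls.filter(_.contains("sed")).map(_.trim)

  // Infix form: same calls, no dots. `filter` and `map` have the same
  // precedence and associate to the left, so this reads as
  // (ls filter (...)) map (...).
  def infix(ls: List[String]): List[String] =
    ls filter (_.contains("sed")) map (_.trim)

  def main(args: Array[String]): Unit = {
    // " exec sp_spaceused" matches too: "used" contains "sed".
    println(dotted(lines))
    println(dotted(lines) == infix(lines)) // true

    // Keep the dot on a trailing parameterless call: `xs size` would be postfix
    // notation, which Scala 2 puts behind `import scala.language.postfixOps`
    // and which can be misparsed together with the following line.
    println(dotted(lines).size)
  }
}
```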
>
> best,
> --Jakob
>
> On Wed, Feb 10, 2016 at 2:35 PM, Mich Talebzadeh
> <mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>
> Hi Chandeep,
>
> Many thanks for your help. In the line below
>
> errlog.filter(line => line.contains("sed"))collect()foreach(println)
>
> can you please clarify the components with the correct naming, as I am
> new to Scala?
>
> errlog --> is the RDD?
> filter(line => line.contains("sed")) is a method
> collect() is another method?
> foreach(println)?
>
> Thanks
>
> On 10/02/2016 21:28, Chandeep Singh wrote:
>
> Hi Mich,
> If you would like to print everything to the console, you could do:
>
> errlog.filter(line => line.contains("sed"))collect()foreach(println)
>
> or you could always save to a file using any of the saveAs methods.
>
> Thanks,
> Chandeep
>
> On Wed, Feb 10, 2016 at 8:14 PM,
> <mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>
> Hi,
>
> I have a bunch of files stored in the HDFS /unix_files directory, 319
> files in total.
>
> scala> val errlog = sc.textFile("/unix_files/*.ksh")
> scala> errlog.filter(line => line.contains("sed"))count()
> res104: Long = 1113
>
> So it returns 1113 lines containing the word "sed". If I want to see the
> collection, I can do:
>
> scala> errlog.filter(line => line.contains("sed"))collect()
> res105: Array[String] = Array(" DSQUERY=${1} ; DBNAME=${2} ; ERROR=0 ; PROGNAME=$(basename $0 | sed -e s/.ksh//)", # . in environment based on argument for script., " exec sp_spaceused", " exec sp_spaceused", PROGNAME=$(basename $0 | sed -e s/.ksh//), " BACKUPSERVER=$5 # Server that is used to load the transaction dump", " BACKUPSERVER=$5 # Server that is used to load the transaction dump", " BACKUPSERVER=$5 # Server that is used to load the transaction dump", " cat $TMPDIR/${DBNAME}_trandump.sql | sed s/${DSQUERY}/${REMOTESERVER}/ > $TMPDIR/${DBNAME}_trandump.tmpsql", cat $TMPDIR/${DBNAME}_tran_transfer.sql | sed s/${DSQUERY}/${REMOTESERVER}/ > $TMPDIR/${DBNAME}_tran_transfer.tmpsql, PROGNAME=$(basename $0 | sed -e s/.ksh//), " B...
> scala>
>
> Now, is there any way I can retrieve all these instances, or are they
> all wrapped up so that I only see a few lines?
>
> Thanks,
> Mich
>
> --
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Cloud Technology
> Partners Ltd, its subsidiaries or their employees, unless expressly so
> stated. It is the responsibility of the recipient to ensure that this email
> is virus free, therefore neither Cloud Technology partners Ltd, its
> subsidiaries nor their employees accept any responsibility.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
