Many thanks Jakob.
So it basically boils down to this demarcation as suggested, which looks clearer:

val errlog = sc.textFile("/unix_files/*.ksh")
errlog.filter(line => line.contains("sed")).collect().foreach(line => println(line))

Regards,

Mich

On 10/02/2016 23:21, Jakob Odersky wrote:

> Hi Mich,
> your assumptions 1 to 3 are all correct (nitpick: they're method
> *calls*, the methods being the part before the parentheses, but I
> assume that's what you meant). The last one is also a method call but
> uses syntactic sugar on top: `foreach(println)` boils down to
> `foreach(line => println(line))`.
>
> On an unrelated side-note, I would suggest you add a period between
> every method call, it makes things easier to read and is actually
> required in certain circumstances. Specifically I would add a period
> before collect() and foreach().
>
> best,
> --Jakob
>
> On Wed, Feb 10, 2016 at 2:35 PM, Mich Talebzadeh
> <mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>
> Hi Chandeep
>
> Many thanks for your help
>
> In the line below
>
> errlog.filter(line => line.contains("sed"))collect()foreach(println)
>
> Can you please clarify the components with the correct naming as I am
> new to Scala
>
> errlog --> is the RDD?
> filter(line => line.contains("sed")) is a method
> collect() is another method ?
> foreach (println) ?
>
> Thanks
>
> On 10/02/2016 21:28, Chandeep Singh wrote:
>
> Hi Mich,
>
> If you would like to print everything to the console you could -
>
> errlog.filter(line => line.contains("sed"))collect()foreach(println)
>
> or you could always save to a file using any of the saveAs methods.
> Thanks,
> Chandeep
>
> On Wed, Feb 10, 2016 at 8:14 PM,
> <mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>
> Hi,
>
> I have a bunch of files stored in hdfs /unit_files directory in total
> 319 files
>
> scala> val errlog = sc.textFile("/unix_files/*.ksh")
> scala> errlog.filter(line => line.contains("sed"))count()
> res104: Long = 1113
>
> So it returns 1113 instances of the word "sed"
>
> If I want to see the collection I can do
>
> scala> errlog.filter(line => line.contains("sed"))collect()
> res105: Array[String] = Array(" DSQUERY=${1} ; DBNAME=${2} ; ERROR=0 ; PROGNAME=$(basename $0 | sed -e s/.ksh//)", # . in environment based on argument for script., " exec sp_spaceused", " exec sp_spaceused", PROGNAME=$(basename $0 | sed -e s/.ksh//), " BACKUPSERVER=$5 # Server that is used to load the transaction dump", " BACKUPSERVER=$5 # Server that is used to load the transaction dump", " BACKUPSERVER=$5 # Server that is used to load the transaction dump", " cat $TMPDIR/${DBNAME}_trandump.sql | sed s/${DSQUERY}/${REMOTESERVER}/ > $TMPDIR/${DBNAME}_trandump.tmpsql", cat $TMPDIR/${DBNAME}_tran_transfer.sql | sed s/${DSQUERY}/${REMOTESERVER}/ > $TMPDIR/${DBNAME}_tran_transfer.tmpsql, PROGNAME=$(basename $0 | sed -e s/.ksh//), " B...
> scala>
>
> Now is there any way I can retrieve all these instances, or perhaps they
> are all wrapped up and I only see a few lines?
>
> Thanks,
>
> Mich
>
> --
> Dr Mich Talebzadeh
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw [1]
> http://talebzadehmich.wordpress.com [2]
>
> NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Cloud Technology Partners Ltd, its subsidiaries or their employees, unless expressly so stated.
> It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Cloud Technology partners Ltd, its subsidiaries nor their employees accept any responsibility.

Links:
------
[1] https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
[2] http://talebzadehmich.wordpress.com
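
For anyone reading this thread later, here is a minimal sketch of the two equivalent foreach forms Jakob describes, with a period before every method call. It uses a plain Scala Seq as a stand-in for the RDD so that it runs without a Spark installation. The names SedLines, sample and linesContaining are made up for illustration, and the sample lines are only loosely modelled on the output quoted above.

```scala
// Plain Scala stand-in for the RDD so the sketch runs without Spark.
// SedLines, sample and linesContaining are hypothetical names.
object SedLines {
  // A few illustrative lines standing in for the contents of the .ksh files.
  val sample: Seq[String] = Seq(
    "PROGNAME=$(basename $0 | sed -e s/.ksh//)",
    "exec sp_spaceused",
    "cat $TMPDIR/dump.sql | sed s/a/b/ > $TMPDIR/dump.tmpsql"
  )

  // Same shape as errlog.filter(line => line.contains("sed")) in the thread.
  def linesContaining(lines: Seq[String], word: String): Seq[String] =
    lines.filter(line => line.contains(word))

  def main(args: Array[String]): Unit = {
    val hits = linesContaining(sample, "sed")

    // Explicit function literal, one period per method call.
    hits.foreach(line => println(line))

    // Shorthand: the compiler turns println into the same function.
    hits.foreach(println)
  }
}
```

On a real RDD the corresponding chain would be the one at the top of this message, errlog.filter(...).collect().foreach(println). Since collect() returns the matching lines to the driver as an Array[String], printing element by element should show every line, whereas the REPL's own rendering of the Array appears to be cut off (note the trailing "B..." in the output above).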