Hi Mich,
your assumptions 1 to 3 are all correct (nitpick: they're method
*calls*, the methods being the part before the parentheses, but I
assume that's what you meant). The last one is also a method call but
uses syntactic sugar on top: `foreach(println)` boils down to
`foreach(line => println(line))`.
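
To make the sugar concrete, here is a minimal sketch that uses a plain
Scala `List` as a stand-in for the collected array. The object name and
the sample lines are made up for illustration. Both calls should print
the same lines: the first passes the method name and lets the compiler
turn it into a function, the second writes the function out.

```scala
object ForeachSugar {
  // Hypothetical sample data standing in for the result of collect()
  val lines: List[String] = List("sed -e s/.ksh//", "exec sp_spaceused")

  def main(args: Array[String]): Unit = {
    // Short form: the compiler converts the method println into a function value
    lines.foreach(println)
    // Explicit form: same behaviour, written as a function literal
    lines.foreach(line => println(line))
  }
}
```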

On an unrelated side note, I would suggest you add a period between
consecutive method calls; it makes things easier to read and is actually
required in certain circumstances. Specifically, I would add a period
before `collect()` and `foreach()`.
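
With the periods added, the line from your mail reads
`errlog.filter(line => line.contains("sed")).collect().foreach(println)`.
The sketch below shows the same dotted style on a local `List`, so it
runs without Spark. The object name and sample lines are hypothetical,
and `toArray` only plays the role that `collect()` has on an RDD.

```scala
object DottedCalls {
  // Hypothetical stand-in for the lines read by sc.textFile(...)
  val errlog: List[String] = List(
    "PROGNAME=$(basename $0 | sed -e s/.ksh//)",
    "exec sp_spaceused"
  )

  // Every call is separated by a period; toArray stands in for collect()
  val matches: Array[String] =
    errlog.filter(line => line.contains("sed")).toArray

  def main(args: Array[String]): Unit = {
    // Print each matching line on its own row
    matches.foreach(println)
  }
}
```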

best,
--Jakob

On Wed, Feb 10, 2016 at 2:35 PM, Mich Talebzadeh
<mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>
>
> Hi Chandeep
>
>
>
> Many thanks for your help
>
>
>
> In the line below
>
>
>
> errlog.filter(line => line.contains("sed"))collect()foreach(println)
>
>
>
> Can you please clarify the components with the correct naming as I am new to
> Scala
>
> errlog   --> is the RDD?
> filter(line => line.contains("sed")) is a method
> collect()  is another method ?
> foreach (println) ?
>
>
>
> Thanks
>
>
>
> On 10/02/2016 21:28, Chandeep Singh wrote:
>
> Hi Mich,
>
> If you would like to print everything to the console you could -
> errlog.filter(line => line.contains("sed"))collect()foreach(println)
>
> or you could always save to a file using any of the saveAs methods.
>
> Thanks,
> Chandeep
>
> On Wed, Feb 10, 2016 at 8:14 PM,
> <mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>>
>>
>>
>> Hi,
>>
>> I have a bunch of files stored in the HDFS directory /unix_files, 319
>> files in total
>>
>> scala> val errlog = sc.textFile("/unix_files/*.ksh")
>>
>> scala>  errlog.filter(line => line.contains("sed"))count()
>> res104: Long = 1113
>>
>> So it returns 1113 lines containing the word "sed"
>>
>> If I want to see the collection I can do
>>
>>
>> scala>  errlog.filter(line => line.contains("sed"))collect()
>>
>> res105: Array[String] = Array("                         DSQUERY=${1} ;
>> DBNAME=${2} ; ERROR=0 ; PROGNAME=$(basename $0 | sed -e s/.ksh//)", #    .
>> in environment based on argument for script., "       exec sp_spaceused", "
>> exec sp_spaceused", PROGNAME=$(basename $0 | sed -e s/.ksh//), "
>> BACKUPSERVER=$5        # Server that is used to load the transaction dump",
>> "        BACKUPSERVER=$5         # Server that is used to load the
>> transaction dump", "        BACKUPSERVER=$5         # Server that is used to
>> load the transaction dump", "    cat $TMPDIR/${DBNAME}_trandump.sql | sed
>> s/${DSQUERY}/${REMOTESERVER}/ > $TMPDIR/${DBNAME}_trandump.tmpsql", cat
>> $TMPDIR/${DBNAME}_tran_transfer.sql | sed s/${DSQUERY}/${REMOTESERVER}/ >
>> $TMPDIR/${DBNAME}_tran_transfer.tmpsql, PROGNAME=$(basename $0 | sed -e
>> s/.ksh//), "        B...
>> scala>
>>
>>
>> Now, is there any way I can retrieve all these instances, or are they
>> all wrapped up so that I only see a few lines?
>>
>> Thanks,
>>
>> Mich
>
>
>
>
>
> --
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
>

