Re: What is the interpretation of Cores in Spark doc

Mich Talebzadeh Fri, 17 Jun 2016 08:10:46 -0700

Great replies, everyone.

Just confining ourselves to the subject at hand, Spark and CPU allocation,
we have these spark-submit parameters:


Local mode

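# local[2]: run locally with two worker threads.
# (--num-executors applies to YARN; it has no effect in local mode.)
${SPARK_HOME}/bin/spark-submit \
                --num-executors 1 \
                --master "local[2]" \
                ...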


And the k in --master local[k] on my box comes from:

cat /proc/cpuinfo|grep processor
processor       : 0
processor       : 1
processor       : 2
processor       : 3
processor       : 4
processor       : 5
processor       : 6
processor       : 7
processor       : 8
processor       : 9
processor       : 10
processor       : 11

So there are 12 logical processors, numbered 0-11.

But the 12 'core id' entries contain only 6 distinct values (0, 1, 2, 8, 9, 10), each listed twice:

cat /proc/cpuinfo|grep 'core id'
core id         : 0
core id         : 1
core id         : 2
core id         : 8
core id         : 9
core id         : 10
core id         : 0
core id         : 1
core id         : 2
core id         : 8
core id         : 9
core id         : 10
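As a cross-check of the two listings above, here is a minimal Python sketch that counts logical processors and distinct physical cores from /proc/cpuinfo text. `cpu_topology` and `CpuTopology` are hypothetical names, not part of Spark or of any tool mentioned in this thread. The sketch assumes a physical core is identified by its (physical id, core id) pair, because core id on its own is only unique within one chip; the "physical id" field is not shown in the grep output above, so the sample assumes a single chip, which matches the "1 chip(s)" reported by the license check.

```python
from collections import namedtuple

# Hypothetical result type for this sketch.
CpuTopology = namedtuple("CpuTopology", "logical_processors physical_cores chips")


def parse_stanzas(text):
    """Yield one dict of 'key: value' fields per processor stanza."""
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            yield fields


def cpu_topology(text):
    """Count logical processors, physical cores and chips in /proc/cpuinfo text."""
    logical = 0
    cores = set()
    chips = set()
    for fields in parse_stanzas(text):
        logical += 1
        chip = fields.get("physical id", "0")  # assumption: one chip if the field is absent
        chips.add(chip)
        # A physical core is a (chip, core id) pair; fall back to the processor
        # number when "core id" is missing (assumption: no SMT information).
        cores.add((chip, fields.get("core id", fields["processor"])))
    return CpuTopology(logical, len(cores), len(chips))


# Sample mirroring the listings above: 12 processors, core ids 0,1,2,8,9,10 twice.
core_ids = [0, 1, 2, 8, 9, 10] * 2
sample = "\n\n".join(
    f"processor\t: {n}\nphysical id\t: 0\ncore id\t\t: {c}"
    for n, c in enumerate(core_ids)
)
print(cpu_topology(sample))
# CpuTopology(logical_processors=12, physical_cores=6, chips=1)
```

On a Linux box, `with open("/proc/cpuinfo") as f: print(cpu_topology(f.read()))` would be expected to show the same 12 logical processors and 6 cores that the license check reports; it is the first number that local[*] counts.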

So in spark-submit I can put

# local[12]: one worker thread per logical processor on this box.
${SPARK_HOME}/bin/spark-submit \
                --num-executors 1 \
                --master "local[12]" \
                ...

This is in fact what the Spark documentation
<http://spark.apache.org/docs/latest/submitting-applications.html> says:

 *Run application locally on 8 cores*
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \


That resolves our usage.
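The local master URL forms can be summarised in a small Python sketch. `local_worker_threads` is a hypothetical helper, not Spark's own code: it maps `local`, `local[K]` and `local[*]` to a worker-thread count as the Spark documentation describes them, and uses Python's `os.cpu_count()` (which counts logical processors) as a stand-in for the JVM call Spark makes for `local[*]`. Other forms, such as `local[K,F]`, are deliberately not handled.

```python
import os
import re

# local[K] or local[*]; other master URL forms are out of scope for this sketch.
LOCAL_K = re.compile(r"local\[([0-9]+|\*)\]")


def local_worker_threads(master, logical_processors=None):
    """Number of worker threads a local master URL asks for (hypothetical helper)."""
    if logical_processors is None:
        # Stand-in for the JVM's availableProcessors(): counts logical processors.
        logical_processors = os.cpu_count() or 1
    if master == "local":
        return 1  # one worker thread, no parallelism
    match = LOCAL_K.fullmatch(master)
    if match is None:
        raise ValueError(f"not a local master URL: {master!r}")
    k = match.group(1)
    return logical_processors if k == "*" else int(k)


# On the 12-logical-processor, 6-core box from this thread:
for url in ("local", "local[2]", "local[6]", "local[12]", "local[*]"):
    print(url, "->", local_worker_threads(url, logical_processors=12))
# local -> 1
# local[2] -> 2
# local[6] -> 6
# local[12] -> 12
# local[*] -> 12
```

Nothing in the URL refers to physical cores: local[6] on this box allows at most 6 tasks to run at once, each on a thread that the OS may schedule on any of the 12 logical processors.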

Now, I mentioned the licensing charges earlier. If I run any SAP product,
they will charge us per core on this host for their software. This is what their check reports:

./cpuinfo
License hostid:        00e04c69159a 0050b60fd1e7
*Detected 12 logical processor(s), 6 core(s), in 1 chip(s)*

They charge by core, so we will have to pay for 6 cores, not 12 logical
processors. I am sure that if they thought they could charge for 12 cores, they
would have done so by now :)


Cheers

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 17 June 2016 at 12:01, Robin East <robin.e...@xense.co.uk> wrote:

> Agreed, it’s a worthwhile discussion (and an interesting one, IMO).
>
> This is a section from your original post:
>
> It is about the terminology, or its interpretation, in the Spark doc.
>>>>>
>>>>> This is my understanding of cores and threads.
>>>>>
>>>>>  Cores are physical cores. Threads are virtual cores.
>>>>>
>>>>
> At least as far as the Spark doc is concerned, threads are not synonymous
> with virtual cores, though of course they are closely related concepts. So
> any time we want to have a discussion about architecture, performance,
> tuning, configuration, etc., we do need to be clear about the concepts and
> how they are defined.
>
> Granted, CPU hardware implementations can also refer to ‘threads’. In fact,
> Oracle/Sun seem unclear as to what they mean by a thread; in various
> documents they define a thread as:
>
> A software entity that can be executed on hardware (e.g. Oracle SPARC
> Architecture 2011)
>
> At other times as:
>
> A thread is a hardware strand. Each thread, or strand, enjoys a unique set
> of resources in support of its … (e.g. OpenSPARC T1 Microarchitecture
> Specification)
>
> So unless the documentation you are writing is very specific to your
> environment, and the idea that a thread is a logical processor is generally
> accepted, I would not be inclined to treat threads as if they are logical
> processors.
>
>
>
> On 16 Jun 2016, at 15:45, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Thanks all.
>
> I think we are diverging, but IMO it is a worthwhile discussion.
>
> Actually, threads are a hardware implementation, hence the whole notion
> of “multi-threaded cores”. What happens is that the cores often have
> duplicate registers, etc. for holding execution state. While it is
> correct that only a single process is executing at a time, a single core
> will have the execution states of multiple processes preserved in these
> registers. In addition, it is the core (not the OS) that determines when
> the thread is executed. The approach varies by CPU manufacturer, but the
> simplest one is this: when one thread of execution issues a multi-cycle
> operation (e.g. a fetch from main memory), the core simply stops
> processing that thread, saves its execution state to one set of
> registers, loads instructions from the other set of registers and carries
> on. On the Oracle SPARC chips, it will actually check the next thread to
> see if the reason it was ‘parked’ has completed and, if not, skip it for
> the subsequent thread. The OS is only aware of which are cores and which
> are logical processors, and dispatches accordingly. *Execution is up to the
> cores*.
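To make the latency-hiding mechanism described above concrete, here is a deliberately simplified Python model (hypothetical, not any vendor's actual pipeline). Each software thread is a string of ops: "a" is a one-cycle ALU op and "m" is a memory fetch that parks its hardware context for `mem_latency` cycles. The core issues one op per cycle and, on a fetch, switches round-robin to the next context that is not still parked, loosely following the SPARC behaviour described.

```python
def run_core(threads, mem_latency=10):
    """Cycles for one core that holds each thread in its own hardware context.

    Toy model: "a" = 1-cycle ALU op, "m" = memory fetch that parks the
    context for `mem_latency` cycles after it is issued.
    """
    ops = [list(t) for t in threads]   # remaining ops per hardware context
    ready_at = [0] * len(threads)      # cycle at which each context may issue again
    n = len(threads)
    cycle = 0
    current = 0                        # context the core is currently running
    while any(ops):
        # Round-robin from `current`, skipping finished or still-parked contexts.
        for step in range(n):
            i = (current + step) % n
            if ops[i] and ready_at[i] <= cycle:
                break
        else:
            cycle += 1                 # every live context is parked: the core idles
            continue
        op = ops[i].pop(0)
        cycle += 1                     # issuing any op takes one cycle
        if op == "m":
            ready_at[i] = cycle + mem_latency  # park until the fetch returns
            current = (i + 1) % n              # and switch to the next context
        else:
            current = i                        # keep running the same context
    return max([cycle] + ready_at)     # wait for any fetch still outstanding


work = ["aamaa", "aamaa"]
one_context = sum(run_core([t]) for t in work)  # threads run back to back
two_contexts = run_core(work)                   # both resident, core switches on stalls
print(one_context, two_contexts)  # 30 18
```

With both threads resident, the core fills one thread's memory stall with the other thread's work; the OS sees two logical processors but plays no part in the switching. Real designs (fine-grained interleaving, simultaneous multithreading) differ considerably from this toy.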
>
> Cheers
>
> On 16 June 2016 at 13:02, Robin East <robin.e...@xense.co.uk> wrote:
>
>> Mich
>>
>> >> A core may have one or more threads
>> It would be more accurate to say that a core could *run* one or more
>> threads scheduled for execution. Threads are a software/OS concept that
>> represents executable code scheduled to run by the OS; a CPU, core
>> or virtual core/virtual processor executes that code. Threads are not CPUs
>> or cores, whether physical or logical; any Spark documentation that implies
>> this is mistaken. I’ve looked at the documentation you mention and I don’t
>> read it to mean that threads are logical processors.
>>
>> To go back to your original question, if you set local[6] and you have 12
>> logical processors then you are likely to have half your CPU resources
>> unused by Spark.
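The local[6] point is simple arithmetic. A small Python sketch (hypothetical helper, assuming CPU-bound single-threaded tasks, one per worker thread, and nothing else competing for the CPUs) makes the bound explicit:

```python
def max_busy_fraction(worker_threads, logical_processors):
    """Upper bound on the share of logical processors that local[K] can keep busy.

    Assumes each worker thread runs one CPU-bound, single-threaded task at a time.
    """
    return min(worker_threads, logical_processors) / logical_processors


for k in (6, 12, 24):
    print(f"local[{k}] on 12 logical processors: at most {max_busy_fraction(k, 12):.0%} busy")
# local[6] on 12 logical processors: at most 50% busy
# local[12] on 12 logical processors: at most 100% busy
# local[24] on 12 logical processors: at most 100% busy
```

Under these assumptions, going beyond 12 threads cannot raise CPU utilisation on this box, although it may still help tasks that spend much of their time waiting on I/O.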
>>
>>
>> On 15 Jun 2016, at 23:08, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> I think it is slightly more than that.
>>
>> These days, software is licensed by core (generally speaking), that is,
>> by physical processor. *A core may have one or more threads, or
>> logical processors*. Virtualization adds some fun to the mix.
>> Generally, what they present is ‘virtual processors’. What that equates to
>> depends on the virtualization layer itself. In some simpler VMs, it is
>> virtual=logical. In others, virtual=logical but they are constrained to
>> come from the same cores: e.g. if you get 6 virtual processors, it really is
>> 3 full cores with 2 threads each. The rationale lies in the way OS
>> dispatching works on ‘logical’ processors vs. cores and POSIX threaded
>> applications.
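The two virtualization cases above matter for per-core licensing. A minimal Python sketch, with a hypothetical helper and assuming 2 hardware threads per core as on the host in this thread:

```python
import math


def cores_behind_vcpus(vcpus, threads_per_core, same_cores):
    """Physical cores that `vcpus` virtual processors may occupy (hypothetical helper).

    same_cores=True: vCPUs are handed out as whole cores, sibling threads included.
    same_cores=False: the simple virtual=logical case; in the worst case every
    vCPU is a hardware thread on a different core.
    """
    if same_cores:
        return math.ceil(vcpus / threads_per_core)
    return vcpus


print(cores_behind_vcpus(6, 2, same_cores=True))   # 3
print(cores_behind_vcpus(6, 2, same_cores=False))  # 6
```

So the same 6 virtual processors may correspond to 3, or up to 6, physical cores depending on the virtualization layer, which is exactly what a per-core license check cares about.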
>>
>> HTH
>>
>>
>> On 13 June 2016 at 18:17, Mark Hamstra <m...@clearstorydata.com> wrote:
>>
>>> I don't know what documentation you were referring to, but this is
>>> clearly an erroneous statement: "Threads are virtual cores."  At best it is
>>> terminology abuse by a hardware manufacturer.  Regardless, Spark can't get
>>> too concerned about how any particular hardware vendor wants to refer to
>>> the specific components of their CPU architecture.  For us, a core is a
>>> logical execution unit, something on which a thread of execution can run.
>>> That can map in different ways to different physical or virtual hardware.
>>>
>>> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> It is not the issue of testing anything. I was referring to
>>>> documentation that clearly use the term "threads". As I said and showed
>>>> before, one line is using the term "thread" and the next one "logical
>>>> cores".
>>>>
>>>>
>>>> HTH
>>>>
>>>>
>>>> On 12 June 2016 at 23:57, Daniel Darabos <
>>>> daniel.dara...@lynxanalytics.com> wrote:
>>>>
>>>>> Spark is a software product. In software a "core" is something that a
>>>>> process can run on. So it's a "virtual core". (Do not call these
>>>>> "threads". A "thread" is not something a process can run on.)
>>>>>
>>>>> local[*] uses java.lang.Runtime.availableProcessors()
>>>>> <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
>>>>> Since Java is software, this also returns the number of virtual cores.
>>>>> (You can test this easily.)
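One way to run that test: in spark-shell, `Runtime.getRuntime.availableProcessors` shows the number local[*] will use. A rough Python analogue is below; it is not what Spark calls, just another software view of the same count. On the 12-logical-processor, 6-core host discussed here, both would be expected to report 12, not 6, unless CPU affinity or a container restricts the process.

```python
import os

# Logical processors in the system (hardware threads, not physical cores).
logical = os.cpu_count()
print("logical processors:", logical)

# Linux only: logical processors this process is allowed to run on,
# which can be fewer if CPU affinity or a container restricts it.
if hasattr(os, "sched_getaffinity"):
    print("usable by this process:", len(os.sched_getaffinity(0)))
```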
>>>>>
>>>>>
>>>>> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was writing some docs on Spark P&T and came across this.
>>>>>>
>>>>>> It is about the terminology, or its interpretation, in the Spark doc.
>>>>>>
>>>>>> This is my understanding of cores and threads.
>>>>>>
>>>>>> Cores are physical cores. Threads are virtual cores. Running 2
>>>>>> threads per core is called Hyper-Threading Technology, so 2 threads per
>>>>>> core make the core work on two loads at the same time. In other words,
>>>>>> every thread takes care of one load.
>>>>>>
>>>>>> Each core has its own memory. So if you have a dual core with
>>>>>> hyper-threading, each core works on 2 loads at the same time because of
>>>>>> its 2 threads, but these 2 threads share that core's memory.
>>>>>>
>>>>>> As I am sure most of you are aware, some vendors charge licensing per core.
>>>>>>
>>>>>> For example, on the same host where I have Spark, I have an SAP product
>>>>>> that checks the licensing and shuts the application down if the license
>>>>>> does not agree with the number of cores specified.
>>>>>>
>>>>>> This is what it says:
>>>>>>
>>>>>> ./cpuinfo
>>>>>> License hostid:        00e04c69159a 0050b60fd1e7
>>>>>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>>>>>>
>>>>>> So here I have 12 logical processors, 6 cores and 1 chip. I call
>>>>>> logical processors threads, so do I have 12 threads?
>>>>>>
>>>>>> Now if I start the worker processes with
>>>>>> ${SPARK_HOME}/sbin/start-slaves.sh, I see this on the GUI page:
>>>>>>
>>>>>> <image.png>
>>>>>>
>>>>>> It says 12 cores, but I gather these are threads?
>>>>>>
>>>>>> The Spark documentation
>>>>>> <http://spark.apache.org/docs/latest/submitting-applications.html>
>>>>>> states, and I quote:
>>>>>>
>>>>>> <image.png>
>>>>>>
>>>>>>
>>>>>> OK, the line for local[k] adds ... *set this to the number of cores on
>>>>>> your machine*
>>>>>>
>>>>>> But I know that it means threads, because if I set it to 6, I would
>>>>>> get only 6 threads as opposed to 12.
>>>>>>
>>>>>> The next line, local[*], seems to state it correctly, as it refers to
>>>>>> "logical cores", which in my understanding are threads.
>>>>>>
>>>>>> I trust that I am not nitpicking here!
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>
