Note: I don't use the init function.
On Tuesday, February 2, 2021 at 3:48:26 PM UTC+8, 颜文泽 wrote:

> If it works, that's fine; I'll just keep using vtune, since I only work on 
> x86 anyway. That said, I found another strange thing: my program has 13 
> routines as soon as it starts. It's very peculiar, and I simply can't 
> understand why.
>
> This is my code:
>
> [image: Screenshot from 2021-02-02 15-45-01.png]
> And this is the result; it's surprising. I think I know why my program is 
> slow: the number of routines is too high. But I found that the GOMAXPROCS 
> function doesn't work, which really confuses me.
> My example doesn't do anything, so by my understanding there should be only 
> one routine.
> [image: Screenshot from 2021-02-02 15-45-49.png]
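The screenshots are not available, so one plausible reading, and it is only an assumption, is that the 13 "routines" are OS threads rather than goroutines. The sketch below prints three numbers that are easy to confuse: `runtime.NumGoroutine()` (goroutines that currently exist), `runtime.GOMAXPROCS(0)` (the current setting; passing 0 only queries it) and the count of the `threadcreate` profile from `runtime/pprof` (OS threads the runtime has created). GOMAXPROCS limits how many threads execute Go code at the same time, not how many threads the process owns; the runtime also uses threads for things like its `sysmon` monitor and for goroutines blocked in system calls or cgo. The `snapshot` type and `takeSnapshot` function are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
)

// snapshot (hypothetical) collects numbers that are easy to confuse when
// reading a profiler's thread view.
type snapshot struct {
	goroutines int // goroutines that exist right now
	maxProcs   int // current GOMAXPROCS setting
	threads    int // OS threads created by the runtime so far
}

func takeSnapshot() snapshot {
	return snapshot{
		goroutines: runtime.NumGoroutine(),
		// Passing 0 queries the setting without changing it.
		maxProcs: runtime.GOMAXPROCS(0),
		// The "threadcreate" profile records the OS threads the runtime created.
		threads: pprof.Lookup("threadcreate").Count(),
	}
}

func main() {
	runtime.GOMAXPROCS(1)
	s := takeSnapshot()
	fmt.Printf("goroutines=%d GOMAXPROCS=%d OS threads created=%d\n",
		s.goroutines, s.maxProcs, s.threads)
	// GOMAXPROCS=1 caps threads running Go code simultaneously; the
	// thread count is typically still greater than 1.
}
```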
> On Tuesday, February 2, 2021 at 3:27:45 PM UTC+8, Amnon wrote:
>
>> Vtune is very useful for squeezing the ultimate performance out of Go 
>> programs, once you have done the usual optimisation: minimised 
>> allocations, I/O, etc.
>>
>> pprof is more than adequate for the average programmer. But when you need 
>> to super-optimise functions that implement math kernels, crypto functions, 
>> video codecs, etc., then without a profiler based on hardware performance 
>> counters, such as vtune or Linux perf 
>> (https://perf.wiki.kernel.org/index.php/Main_Page), you are shooting in 
>> the dark.
>> vtune not only tells you which functions are taking the most time, but 
>> WHY they are taking so long: how long the code spends waiting on cache 
>> misses, and the different kinds of stall cycles that kill performance on 
>> a modern CPU.
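As an illustration of the cache effects described above (not code from the thread), this sketch sums the same matrix twice: once in the order it is laid out in memory and once column by column. The column-major loop strides `dim*8` bytes between reads, so most reads touch a new cache line. On typical x86 hardware the second loop is noticeably slower, and a counter-based profiler (for example `perf stat -e cache-misses`) would attribute the difference to cache misses rather than to extra instructions. The 2048 by 2048 size is an arbitrary choice intended to exceed common cache sizes.

```go
package main

import (
	"fmt"
	"time"
)

const dim = 2048 // 2048*2048 int64 values = 32 MiB, larger than typical caches

// newMatrix builds a dim x dim matrix backed by one contiguous slice.
func newMatrix() []int64 {
	m := make([]int64, dim*dim)
	for i := range m {
		m[i] = int64(i)
	}
	return m
}

// sumRowMajor reads memory sequentially, so every fetched cache line is fully used.
func sumRowMajor(m []int64) int64 {
	var s int64
	for r := 0; r < dim; r++ {
		for c := 0; c < dim; c++ {
			s += m[r*dim+c]
		}
	}
	return s
}

// sumColMajor jumps dim*8 bytes between reads, touching a new cache line almost every time.
func sumColMajor(m []int64) int64 {
	var s int64
	for c := 0; c < dim; c++ {
		for r := 0; r < dim; r++ {
			s += m[r*dim+c]
		}
	}
	return s
}

func main() {
	m := newMatrix()
	for _, tc := range []struct {
		name string
		f    func([]int64) int64
	}{{"row-major", sumRowMajor}, {"column-major", sumColMajor}} {
		start := time.Now()
		s := tc.f(m)
		fmt.Printf("%-12s sum=%d time=%v\n", tc.name, s, time.Since(start))
	}
}
```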
>>
>> Vtune or perf is also a great tool for teaching us about processors, and 
>> helping us understand what influences
>> the rate at which instructions are executed by them.
>>
>> The problem with vtune is that it is quite unfriendly and expensive (over 
>> $3000 for a single floating license)! It also does not work on ARM 
>> processors (such as the Apple M1).
>>
>> There has been a proposal to add performance counters to pprof.
>>
>> https://go.googlesource.com/proposal/+/refs/changes/08/219508/2/design/36821-perf-counter-pprof.md
>> If accepted, this would give the power of vtune to the masses for free.
>>
>> On Tuesday, 2 February 2021 at 06:37:37 UTC nnsm...@gmail.com wrote:
>>
>>> One more question: is it effective to use vtune to tune Go programs? I am 
>>> afraid vtune is not suitable, although Intel claims it is effective.
>>> On Tuesday, February 2, 2021 at 2:32:40 PM UTC+8, 颜文泽 wrote:
>>>
>>>> Thanks. It's not an in-memory DB, but my current test doesn't involve 
>>>> I/O. I'll take time to look at the material you sent, thanks a lot. I 
>>>> also found that many of the functions with a high CPI rate are runtime 
>>>> functions. Is the overhead of these functions unavoidable? The following 
>>>> chart is for a single routine:
>>>> [image: Screenshot from 2021-02-02 14-25-33.png]
>>>> The following chart is for 8 routines:
>>>> [image: Screenshot from 2021-02-02 14-25-56.png]
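Which runtime functions have the high CPI is not visible without the screenshots. One common case, assumed here purely for illustration, is allocation work such as `runtime.mallocgc` and `runtime.growslice`, which user code can often reduce. This sketch compares building a slice with and without a capacity hint, measuring allocations per call with `testing.AllocsPerRun`; `buildGrowing`, `buildPrealloc` and `sink` are hypothetical names.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable so results escape and are not optimised away.
var sink []int

// buildGrowing appends without a capacity hint, so append must
// repeatedly reallocate and copy as the slice grows.
func buildGrowing(n int) []int {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

// buildPrealloc reserves capacity once, so the loop never reallocates.
func buildPrealloc(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func main() {
	const n = 10_000
	growing := testing.AllocsPerRun(100, func() { sink = buildGrowing(n) })
	prealloc := testing.AllocsPerRun(100, func() { sink = buildPrealloc(n) })
	fmt.Printf("allocs/op: growing=%.0f prealloc=%.0f\n", growing, prealloc)
}
```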
>>>> On Tuesday, February 2, 2021 at 2:27:39 PM UTC+8, ren...@ix.netcom.com wrote:
>>>>
>>>>> Unless it is an in-memory database, I would expect the I/O costs to 
>>>>> dwarf the CPU costs, but I guess a lot depends on how you define 
>>>>> ‘analytical processing’.
>>>>>
>>>>> In my experience, the “out of the box” performance of goroutines in I/O 
>>>>> processing is outstanding.
>>>>>
>>>>> For the CPU-bound case, I think that with threads, CPU assignments 
>>>>> (cpuset), etc., you can probably create a higher-performing system in 
>>>>> some cases, but it’s a lot of work.
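For the threads-plus-cpuset approach, the closest standard-library tool is `runtime.LockOSThread`, which wires the calling goroutine to its current OS thread until it unlocks. It does not choose a CPU: on Linux, actually pinning the thread would need an extra call such as `SchedSetaffinity` from `golang.org/x/sys/unix`, which this sketch leaves out to stay within the standard library. `pinnedWorker` and `runPinned` are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// pinnedWorker (hypothetical) processes jobs on a dedicated OS thread,
// similar to a long-lived pthread worker in C.
func pinnedWorker(jobs <-chan int, results chan<- int, wg *sync.WaitGroup) {
	defer wg.Done()
	runtime.LockOSThread()         // this goroutine now owns its OS thread
	defer runtime.UnlockOSThread() // release the thread when the worker exits
	// On Linux, the thread could be pinned to a CPU here
	// (for example with unix.SchedSetaffinity); omitted in this sketch.
	for j := range jobs {
		results <- j * j
	}
}

// runPinned squares the inputs using `workers` thread-locked goroutines
// and returns the sum of the squares.
func runPinned(inputs []int, workers int) int {
	jobs := make(chan int)
	results := make(chan int, len(inputs)) // buffered so workers never block on send
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go pinnedWorker(jobs, results, &wg)
	}
	for _, v := range inputs {
		jobs <- v
	}
	close(jobs)
	wg.Wait()
	close(results)
	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(runPinned([]int{1, 2, 3, 4}, 2)) // prints 30
}
```

Locking threads is most likely to pay off for long-lived, CPU-bound workers; for short tasks it may add overhead, since the scheduler can no longer run other goroutines on a locked thread.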
>>>>>
>>>>> Even without that, I think the scheduler in most Linux systems is more 
>>>>> mature than the Go scheduler and makes better choices for cache 
>>>>> affinity, etc. It’s very hard to design a high-performance, CPU-bound 
>>>>> system that runs on a general-purpose OS or language/platform. Without 
>>>>> knowledge of the OLAP DB design, it is very hard to make a 
>>>>> recommendation.
>>>>>
>>>>> Here is some suggested reading to help you on your journey: 
>>>>> https://dave.cheney.net/high-performance-go-workshop/dotgo-paris.html
>>>>>
>>>>> On Feb 2, 2021, at 12:07 AM, 颜文泽 <nnsm...@gmail.com> wrote:
>>>>>
>>>>> Sorry, I don't know much about the internal implementation of Go. I 
>>>>> was a C programmer, and I tried to implement the original logic (an 
>>>>> OLAP database) using goroutines as a replacement for threads. But I ran 
>>>>> into bottlenecks, and I don't know how to solve them. Maybe I should 
>>>>> study how goroutines are implemented before I can write the right code.
>>>>>
>>>>> On Tuesday, February 2, 2021 at 12:21:44 PM UTC+8, ren...@ix.netcom.com wrote:
>>>>>
>>>>>> You wrote “I found that cache misses from routines switching is also 
>>>>>> a headache”. 
>>>>>>
>>>>>> They would not be switching if they are CPU-bound and there are fewer 
>>>>>> of them than the number of CPUs. Remember too that you need some 
>>>>>> percentage of the CPUs to execute the runtime GC code and other 
>>>>>> housekeeping. 
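A minimal sketch of the setup described above: start no more CPU-bound workers than GOMAXPROCS allows, optionally leaving one CPU for the GC and runtime housekeeping, and give each worker one contiguous chunk so there is nothing to switch between. The `reserve` parameter and the parallel-sum workload are illustrative assumptions; whether leaving a CPU free helps depends on the workload and should be measured.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// workerCount returns how many CPU-bound workers to start: GOMAXPROCS
// minus `reserve` left for the GC and other runtime work, but at least 1.
func workerCount(reserve int) int {
	w := runtime.GOMAXPROCS(0) - reserve
	if w < 1 {
		w = 1
	}
	return w
}

// parallelSum splits data into one contiguous chunk per worker, so each
// goroutine streams through its own region and never waits on the others.
func parallelSum(data []int64, workers int) int64 {
	partial := make([]int64, workers)
	chunk := (len(data) + workers - 1) / workers
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(data) {
			break
		}
		hi := lo + chunk
		if hi > len(data) {
			hi = len(data)
		}
		wg.Add(1)
		go func(w int, part []int64) {
			defer wg.Done()
			var s int64
			for _, v := range part {
				s += v
			}
			partial[w] = s // each worker writes only its own slot
		}(w, data[lo:hi])
	}
	wg.Wait()
	var total int64
	for _, s := range partial {
		total += s
	}
	return total
}

func main() {
	data := make([]int64, 10_000_000)
	for i := range data {
		data[i] = int64(i)
	}
	w := workerCount(1)
	fmt.Printf("workers=%d sum=%d\n", w, parallelSum(data, w))
}
```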
>>>>>>
>>>>>> > On Feb 1, 2021, at 10:04 PM, 颜文泽 <nnsm...@gmail.com> wrote: 
>>>>>> > 
>>>>>> > I found that cache misses from routines switching is also a headache 
>>>>>>
>>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "golang-nuts" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to golang-nuts...@googlegroups.com.
>>>>>
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/golang-nuts/35bccad0-64a9-4796-bc3f-a9cdb8c82961n%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/golang-nuts/35bccad0-64a9-4796-bc3f-a9cdb8c82961n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>>
>>>>>
