https://github.com/open-telemetry/opentelemetry-go-contrib/issues/6625

On Wednesday, January 15, 2025 at 11:02:37 PM UTC-8 John wrote:

> Thanks Kurtis for the advice.  I was heading in that direction.
>
> This is definitely an OTEL problem.  The minimal code required to 
> reproduce the issue:
>
> metrics.go
> ```go
> package metrics
>
> import (
> 	// Blank import: linking this package is enough to trigger the failure.
> 	_ "go.opentelemetry.io/contrib/instrumentation/host"
> )
> ```
>
> metrics_test.go
> ```go
> package metrics
> ```
>
> `go test -race`
>
> That will immediately cause the issue.  You don't even need any tests; it 
> fails before it ever gets to them.
>
> I'll make my way over to the OTEL bugs tomorrow.  
>
> For those who are interested in some random debugger output, here is a 
> little from lldb and delve (which lets me see they are calling C from 
> purego):
>
> Process 58447 launched: '/Users/jdoak/base/concurrency/sync/sync.test' 
> (arm64)
> warning: (arm64) 
> /Users/jdoak/base/concurrency/sync/sync.test(0x0000000100000000) address 
> 0x0000000100000000 maps to more than one section: sync.test.__TEXT and 
> sync.test.__TEXT
> warning: (arm64) 
> /Users/jdoak/base/concurrency/sync/sync.test(0x0000000100000000) address 
> 0x0000000101bbc000 maps to more than one section: sync.test.__DATA_CONST 
> and sync.test.__DATA_CONST
> warning: (arm64) 
> /Users/jdoak/base/concurrency/sync/sync.test(0x0000000100000000) address 
> 0x0000000102b18000 maps to more than one section: sync.test.__DATA and 
> sync.test.__DATA
> Process 58447 stopped
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=1, address=0x10)
>     frame #0: 0x000000010000423c sync.test`__tsan_func_enter + 16
> sync.test`__tsan_func_enter:
> ->  0x10000423c <+16>: ldr    x8, [x0, #0x10]
>     0x100004240 <+20>: add    w9, w8, #0x8
>     0x100004244 <+24>: tst    x9, #0xff0
>     0x100004248 <+28>: b.eq   0x1000042a0    ; <+116>
> Target 0: (sync.test) stopped.
> (lldb) thread backtrace all
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=1, address=0x10)
>   * frame #0: 0x000000010000423c sync.test`__tsan_func_enter + 16
>     frame #1: 0x0000000101706e34 sync.test`
> github.com/ebitengine/purego/internal/fakecgo.x_cgo_notify_runtime_init_done 
> + 20
>     frame #2: 0x00000001017073f0 
> sync.test`x_cgo_notify_runtime_init_done_trampoline + 16
>   thread #2
>     frame #0: 0x00000001945e64e8 libsystem_kernel.dylib`__semwait_signal + 
> 8
>     frame #1: 0x00000001944c56f0 libsystem_c.dylib`nanosleep + 220
>     frame #2: 0x00000001944c5608 libsystem_c.dylib`usleep + 68
>     frame #3: 0x00000001000c6304 sync.test`runtime.usleep_trampoline.abi0 
> + 20
>   thread #3
>     frame #0: 0x00000001945e66ec libsystem_kernel.dylib`__psynch_cvwait + 8
>     frame #1: 0x0000000194624894 
> libsystem_pthread.dylib`_pthread_cond_wait + 1204
>     frame #2: 0x00000001000c6688 
> sync.test`runtime.pthread_cond_wait_trampoline.abi0 + 24
>     frame #3: 0x00000001000c4838 sync.test`runtime.asmcgocall.abi0 + 200
>   thread #4
>     frame #0: 0x00000001945e66ec libsystem_kernel.dylib`__psynch_cvwait + 8
>     frame #1: 0x0000000194624894 
> libsystem_pthread.dylib`_pthread_cond_wait + 1204
>     frame #2: 0x00000001000c6688 
> sync.test`runtime.pthread_cond_wait_trampoline.abi0 + 24
>     frame #3: 0x00000001000c4838 sync.test`runtime.asmcgocall.abi0 + 200
>   thread #5
>     frame #0: 0x00000001945e66ec libsystem_kernel.dylib`__psynch_cvwait + 8
>     frame #1: 0x0000000194624894 
> libsystem_pthread.dylib`_pthread_cond_wait + 1204
>     frame #2: 0x00000001000c6688 
> sync.test`runtime.pthread_cond_wait_trampoline.abi0 + 24
>     frame #3: 0x00000001000c4838 sync.test`runtime.asmcgocall.abi0 + 200
>   thread #6
>     frame #0: 0x00000001945e66ec libsystem_kernel.dylib`__psynch_cvwait + 8
>     frame #1: 0x0000000194624894 
> libsystem_pthread.dylib`_pthread_cond_wait + 1204
>     frame #2: 0x00000001000c6688 
> sync.test`runtime.pthread_cond_wait_trampoline.abi0 + 24
>     frame #3: 0x00000001000c4838 sync.test`runtime.asmcgocall.abi0 + 200
>     
>     
>    
> (dlv) continue
> > [runtime-fatal-throw] runtime.fatalsignal() 
> /usr/local/go/src/runtime/signal_unix.go:831 (hits goroutine(1):1 total:1) 
> (PC: 0x104f027bc)
> Warning: debugging optimized function
>    826:         printDebugLog()
>    827:
>    828:         exit(2)
>    829: }
>    830:
> => 831: func fatalsignal(sig uint32, c *sigctxt, gp *g, mp *m) *g {
>    832:         if sig < uint32(len(sigtable)) {
>    833:                 print(sigtable[sig].name, "\n")
>    834:         } else {
>    835:                 print("Signal ", sig, "\n")
>    836:         }
> (dlv) stack
> 0  0x0000000104f027bc in runtime.fatalsignal
>    at /usr/local/go/src/runtime/signal_unix.go:831
> 1  0x0000000104f02390 in runtime.sighandler
>    at /usr/local/go/src/runtime/signal_unix.go:754
> 2  0x0000000104f01cac in runtime.sigtrampgo
>    at /usr/local/go/src/runtime/signal_unix.go:490
> 3  0x0000000104e6c23c in ???
>    at ?:-1
> 4  0x0000000106569974 in 
> github.com/ebitengine/purego/internal/fakecgo.x_cgo_notify_runtime_init_done
>    at /Users/jdoak/go/pkg/mod/
> github.com/ebitengine/purego@v0.8.1/internal/fakecgo/go_libinit.go:22
> 5  0x000000016af95d88 in ???
>    at ?:-1
> 6  0x0000000104f2cadc in runtime.asmcgocall
>    at /usr/local/go/src/runtime/asm_arm64.s:1000
> 7  0x0000000104f2daa8 in racecall
>    at /usr/local/go/src/runtime/race_arm64.s:476
> 8  0x0000000000000000 in ???
>    at :0
>    error: NULL address
> (truncated)
>
> On Wednesday, January 15, 2025 at 9:41:47 PM UTC-8 Kurtis Rader wrote:
>
>> On Wed, Jan 15, 2025 at 8:31 PM John <johns...@gmail.com> wrote:
>>
>>> Hey Kurtis,
>>>
>>> Thanks for responding.
>>>
>>> Unfortunately, this does look like some type of OTEL problem.  I was 
>>> able to make a copy and strip out all the OTEL code.  As soon as I did 
>>> that, the failure stopped happening, which means it is some type of OTEL 
>>> issue that I should probably track down with the OTEL people.  
>>>
>>> As a note for someone who stumbles on this with a similar problem, the 
>>> OTEL packages included:
>>>
>>> "go.opentelemetry.io/otel/attribute"
>>> "go.opentelemetry.io/otel/trace"
>>> "go.opentelemetry.io/otel/metric"
>>>
>>> These packages are at v1.33.0
>>>
>>
>> Note that simply removing the references to the above-mentioned OTEL 
>> packages does not guarantee the problem is with those packages. The failure 
>> could still be due to how you are using them. Having said that, any 
>> public package should validate its inputs and provide a more meaningful 
>> failure than a SIGSEGV fault. So even if the proximate cause of the failure 
>> is a mistake in your code, there is clearly room for improvement in the 
>> packages you are using.
>>
>> As a retired software support engineer who has spent thousands of hours 
>> debugging these types of problems, I can't stress enough how important it 
>> is to create a minimal reproducible example as the quickest way to get to 
>> the root cause of the problem. A minimal reproducible example will allow 
>> others, such as the OTEL package maintainers, to employ tools, such as gdb 
>> or lldb, which you may not be comfortable using.
>>
>> -- 
>> Kurtis Rader
>> Caretaker of the exceptional canines Junior and Hank
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/golang-nuts/d899fb29-2c7c-4983-9947-7e7fbfa65cb6n%40googlegroups.com.
