[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marshall McMullen updated ZOOKEEPER-2311:
-----------------------------------------
    Description: 
We've started seeing an assert failing inside setup_random at line 537:

 528 static void setup_random()
 529 {
 530 #ifndef _WIN32          // TODO: better seed
 531     int seed;
 532     int fd = open("/dev/urandom", O_RDONLY);
 533     if (fd == -1) {
 534         seed = getpid();
 535     } else {
 536         int rc = read(fd, &seed, sizeof(seed));
 537         assert(rc == sizeof(seed));
 538         close(fd);
 539     }
 540     srandom(seed);
 541     srand48(seed);
 542 #endif

The core files show:

Program terminated with signal 6, Aborted.
#0  0x00007f9ff665a0d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#0  0x00007f9ff665a0d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f9ff665d83b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f9ff6652d9e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f9ff6652e42 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007f9ff8e4070a in setup_random () at src/zookeeper.c:476
#5  0x00007f9ff8e40d76 in resolve_hosts (zh=0x7f9fe14de400, 
hosts_in=0x7f9fd700f400 "10.26.200.6:2181,10.26.200.7:2181,10.26.200.8:2181", 
avec=0x7f9fd87fab60) at src/zookeeper.c:730
#6  0x00007f9ff8e40e87 in update_addrs (zh=0x7f9fe14de400) at 
src/zookeeper.c:801
#7  0x00007f9ff8e44176 in zookeeper_interest (zh=0x7f9fe14de400, 
fd=0x7f9fd87fac4c, interest=0x7f9fd87fac50, tv=0x7f9fd87fac80) at 
src/zookeeper.c:1980
#8  0x00007f9ff8e553f5 in do_io (v=0x7f9fe14de400) at src/mt_adaptor.c:379
#9  0x00000000020170ac in solidfire::ThreadBacktraces::LaunchThread 
(this=0x7f9ff0c8d500, args=<optimized out>) at shared/ThreadBacktraces.cpp:497
#10 0x00007f9ff804de9a in start_thread () from 
/lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f9ff671738d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

I'm not sure what the underlying cause is, but POSIX allows read(2) to 
return fewer bytes than requested, so any program MUST check for short 
reads instead of asserting on the full count.
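
To make the short-read point concrete, here is a minimal sketch of how the seed could be read without asserting. The names read_full and pick_seed are hypothetical and are not part of the ZooKeeper C client; this only illustrates the retry-and-fall-back pattern and is not a proposed patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes from fd, retrying after
 * short reads and after EINTR. Returns 0 on success, -1 on a read error
 * or if end-of-file is reached before len bytes arrive. */
static int read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;              /* genuine error */
        }
        if (n == 0)
            return -1;              /* unexpected end-of-file */
        p += n;                     /* short read: keep going */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical seed selection: prefer /dev/urandom, but fall back to the
 * pid (as the existing code already does when open() fails) rather than
 * aborting when the read does not deliver a full seed. */
static unsigned int pick_seed(void)
{
    unsigned int seed = (unsigned int)getpid();
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd != -1) {
        unsigned int r;
        if (read_full(fd, &r, sizeof(r)) == 0)
            seed = r;
        close(fd);
    }
    return seed;
}
```

With something like this, setup_random could call srandom(pick_seed()) and a short or interrupted read would no longer take the whole process down.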

Has anyone else encountered this issue? We are seeing it rather frequently, 
which is concerning.


> assert in setup_random
> ----------------------
>
>                 Key: ZOOKEEPER-2311
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2311
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Marshall McMullen
>



