[rrd-developers] src/rrd_open.c: Check arguments in `rrd_dontneed'.
From: Florian Forster <[EMAIL PROTECTED]>

Daniel Pocock reported that the argument may be NULL in low-diskspace
situations, so check for that here to prevent a segmentation fault.
---
 src/rrd_open.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/src/rrd_open.c b/src/rrd_open.c
index 2796506..f262413 100644
--- a/src/rrd_open.c
+++ b/src/rrd_open.c
@@ -364,6 +364,13 @@ void rrd_dontneed(
     unsigned long i;
     ssize_t _page_size = sysconf(_SC_PAGESIZE);

+    if (rrd_file == NULL) {
+#if defined DEBUG && DEBUG
+        fprintf (stderr, "rrd_dontneed: Argument 'rrd_file' is NULL.\n");
+#endif
+        return;
+    }
+
 #if defined DEBUG && DEBUG > 1
     mincore_print(rrd_file, "before");
 #endif
-- 
1.5.6.3

___
rrd-developers mailing list
rrd-developers@lists.oetiker.ch
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers
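The guard added by this patch can be sketched in isolation. This is a minimal illustration, not the rrdtool source: `demo_file_t` and `demo_dontneed` are hypothetical stand-ins for `rrd_file_t` and `rrd_dontneed`, and the status return value is an assumption added only so the early exit is observable (the real function returns void).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for rrd_file_t; only the field the crash touched. */
typedef struct {
    size_t header_len;
} demo_file_t;

/* Returns 0 if the handle was usable, -1 if it was NULL.
 * The status value is an assumption of this sketch. */
int demo_dontneed(const demo_file_t *file, size_t *rra_start)
{
    if (file == NULL) {
#if defined DEBUG && DEBUG
        fprintf(stderr, "demo_dontneed: Argument 'file' is NULL.\n");
#endif
        return -1;              /* bail out before any dereference */
    }

    *rra_start = file->header_len;
    return 0;
}
```

A caller whose file creation failed on a full disk would pass NULL and get -1 back instead of faulting on `file->header_len`.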
[rrd-developers] crash - full hard disk
As part of the scalability tests I've been doing, I regularly fill up my
hard disk with RRDs.

I've noticed that rrdtool (trunk, linked with Ganglia 3.1) creates one or
more files with size 0 or with other unusual sizes when the disk fills up,
and shortly after, there is a seg fault (gdb output below).

I wanted to create a ticket for this on the Trac system, but I couldn't
find the link for creating an account, and the account published on the
welcome page doesn't have permissions.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 46913030207808 (LWP 31151)]
0x003eac43052c in rrd_dontneed (rrd_file=0x0, rrd=0x2aaaca7fff10)
    at rrd_open.c:335
335         rra_start = rrd_file->header_len;
(gdb) bt
#0  0x003eac43052c in rrd_dontneed (rrd_file=0x0, rrd=0x2aaaca7fff10)
    at rrd_open.c:335
#1  0x003eac408afd in rrd_create_fn (
    file_name=0x2aaaca800710 "/.../cpu_system.rrd", rrd=0x2aaaca800090)
    at rrd_create.c:807
#2  0x003eac407d3f in rrd_create_r (
    filename=0x2aaaca800710 "/.../cpu_system.rrd", pdp_step=10,
    last_up=1223381383, argc=7, argv=0x2aaaca8002b0) at rrd_create.c:548
#3  0x003eac4065fb in rrd_create (argc=13, argv=0x2aaaca800280)
    at rrd_create.c:103
Re: [rrd-developers] src/rrd_open.c: Check arguments in `rrd_dontneed'.
Today Florian Forster wrote:

> From: Florian Forster <[EMAIL PROTECTED]>
>
> Daniel Pocock reported that the argument may be NULL in low-diskspace
> situations, so check for that here to prevent a segmentation fault.
> ---

Thanks, applied

tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch [EMAIL PROTECTED] ++41 62 775 9902 / sb: -9900
[rrd-developers] [PATCH] improved journal sanity checks, cleanup
This patch introduces some extra safety checks in journal processing, and
cleans up the code a little bit.

 * moved journal initialization to its own function; main() is cleaner

 * any time we process a file, log the results (previous code only logged
   if there was a valid entry)

 * After reading journals at startup, only trigger full flush out to disk
   if the user specified -F.  Avoids unnecessary IO on startup unless the
   user also wants unnecessary IO on shutdown.

 * journal_replay is much more careful about files it will open
    * must be a regular file
    * must be owned by daemon user
    * must not be group/other writable

 * Ensure that the journal gets created with the right permissions.
    ... even when the daemon is invoked with a permissive umask.
    equivalent to "chmod a-x,go-w"
---
diff --git a/src/rrd_daemon.c b/src/rrd_daemon.c
index e2726e3..ea607d8 100644
--- a/src/rrd_daemon.c
+++ b/src/rrd_daemon.c
@@ -170,6 +170,7 @@ typedef enum queue_side_e queue_side_t;
  * Variables
  */
 static int stay_foreground = 0;
+static uid_t daemon_uid;

 static listen_socket_t *listen_fds = NULL;
 static size_t listen_fds_num = 0;
@@ -1446,6 +1447,7 @@ static int handle_request (listen_socket_t *sock, /* {{{ */
 static void journal_rotate(void) /* {{{ */
 {
   FILE *old_fh = NULL;
+  int new_fd;

   if (journal_cur == NULL || journal_old == NULL)
     return;
@@ -1460,11 +1462,20 @@ static void journal_rotate(void) /* {{{ */
   if (journal_fh != NULL)
   {
     old_fh = journal_fh;
+    journal_fh = NULL;
     rename(journal_cur, journal_old);
     ++stats_journal_rotate;
   }

-  journal_fh = fopen(journal_cur, "a");
+  new_fd = open(journal_cur, O_WRONLY|O_CREAT|O_APPEND,
+                S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH);
+  if (new_fd >= 0)
+  {
+    journal_fh = fdopen(new_fd, "a");
+    if (journal_fh == NULL)
+      close(new_fd);
+  }
+
   pthread_mutex_unlock(&journal_lock);

   if (old_fh != NULL)
@@ -1542,6 +1553,44 @@ static int journal_replay (const char *file) /* {{{ */

   if (file == NULL) return 0;

+  {
+    char *reason;
+    int status = 0;
+    struct stat statbuf;
+
+    memset(&statbuf, 0, sizeof(statbuf));
+    if (stat(file, &statbuf) != 0)
+    {
+      if (errno == ENOENT)
+        return 0;
+
+      reason = "stat error";
+      status = errno;
+    }
+    else if (!S_ISREG(statbuf.st_mode))
+    {
+      reason = "not a regular file";
+      status = EPERM;
+    }
+    if (statbuf.st_uid != daemon_uid)
+    {
+      reason = "not owned by daemon user";
+      status = EACCES;
+    }
+    if (statbuf.st_mode & (S_IWGRP|S_IWOTH))
+    {
+      reason = "must not be user/group writable";
+      status = EACCES;
+    }
+
+    if (status != 0)
+    {
+      RRDD_LOG(LOG_ERR, "journal_replay: %s : %s (%s)",
+               file, rrd_strerror(status), reason);
+      return 0;
+    }
+  }
+
   fh = fopen(file, "r");
   if (fh == NULL)
   {
@@ -1582,17 +1631,36 @@ static int journal_replay (const char *file) /* {{{ */

   fclose(fh);

-  if (entry_cnt > 0)
-  {
-    RRDD_LOG(LOG_INFO, "Replayed %d entries (%d failures)",
-             entry_cnt, fail_cnt);
-    return 1;
-  }
-  else
-    return 0;
+  RRDD_LOG(LOG_INFO, "Replayed %d entries (%d failures)",
+           entry_cnt, fail_cnt);

+  return entry_cnt > 0 ? 1 : 0;
 } /* }}} static int journal_replay */

+static void journal_init(void) /* {{{ */
+{
+  int had_journal = 0;
+
+  if (journal_cur == NULL) return;
+
+  pthread_mutex_lock(&journal_lock);
+
+  RRDD_LOG(LOG_INFO, "checking for journal files");
+
+  had_journal += journal_replay(journal_old);
+  had_journal += journal_replay(journal_cur);
+
+  /* it must have been a crash.  start a flush */
+  if (had_journal && config_flush_at_shutdown)
+    flush_old_values(-1);
+
+  pthread_mutex_unlock(&journal_lock);
+  journal_rotate();
+
+  RRDD_LOG(LOG_INFO, "journal processing complete");
+
+} /* }}} static void journal_init */
+
 static void close_connection(listen_socket_t *sock)
 {
   close(sock->fd) ;  sock->fd = -1;
@@ -2075,6 +2143,8 @@ static int daemonize (void) /* {{{ */
   int fd;
   char *base_dir;

+  daemon_uid = geteuid();
+
   fd = open_pidfile();
   if (fd < 0) return fd;

@@ -2399,25 +2469,7 @@ int main (int argc, char **argv)
     return (1);
   }

-  if (journal_cur != NULL)
-  {
-    int had_journal = 0;
-
-    pthread_mutex_lock(&journal_lock);
-
-    RRDD_LOG(LOG_INFO, "checking for journal files");
-
-    had_journal += journal_replay(journal_old);
-    had_journal += journal_replay(journal_cur);
-
-    if (had_journal)
-      flush_old_values(-1);
-
-    pthread_mutex_unlock(&journal_lock);
-    journal_rotate();
-
-    RRDD_LOG(LOG_INFO, "journal processing complete");
-  }
+  journal_init();

   /* start the queue thread */
   memset (&queue_thread, 0, sizeof (queue_thread));
Re: [rrd-developers] [PATCH] improved journal sanity checks, cleanup
Hi Kevin,

great ... applied

tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch [EMAIL PROTECTED] ++41 62 775 9902 / sb: -9900
Re: [rrd-developers] crash - full hard disk
Hi Daniel:

On Tue, Oct 7, 2008 at 5:18 AM, <[EMAIL PROTECTED]> wrote:

> I wanted to create a ticket for this on the Trac system, but I couldn't
> find the link for creating an account, and the account published on the
> welcome page doesn't have permissions.

http://oss.oetiker.ch/rrdtool-trac/

The username and password are on that page under "Editing".

Cheers,

Bernard
[rrd-developers] [PATCH] rrdcached: better permissions handling
This patch moves the permission handling code around a bit.

 * moved privilege checks into the command handler functions
   (possible now that we pass the sock data structures around)

 * on UPDATE, delay journal_write until after check_file_access().
   previously, it was possible for a high-priv socket to introduce
   commands into the journal that could be replayed if they were still in
   the journal at next startup.

 * moved has_privilege() further up in the file to avoid need for
   prototype.
---
diff --git a/src/rrd_daemon.c b/src/rrd_daemon.c
index ea607d8..30cf748 100644
--- a/src/rrd_daemon.c
+++ b/src/rrd_daemon.c
@@ -943,6 +943,20 @@ err:
   return 0;
 } /* }}} static int check_file_access */

+/* returns 1 if we have the required privilege level,
+ * otherwise issue an error to the user on sock */
+static int has_privilege (listen_socket_t *sock, /* {{{ */
+                          socket_privilege priv)
+{
+  if (sock == NULL) /* journal replay */
+    return 1;
+
+  if (sock->privilege >= priv)
+    return 1;
+
+  return send_response(sock, RESP_ERR, "%s\n", rrd_strerror(EACCES));
+} /* }}} static int has_privilege */
+
 static int flush_file (const char *filename) /* {{{ */
 {
   cache_item_t *ci;
@@ -1169,6 +1183,11 @@ static int handle_request_flush (listen_socket_t *sock, /* {{{ */

 static int handle_request_flushall(listen_socket_t *sock) /* {{{ */
 {
+  int status;
+
+  status = has_privilege(sock, PRIV_HIGH);
+  if (status <= 0)
+    return status;

   RRDD_LOG(LOG_DEBUG, "Received FLUSHALL");

@@ -1185,12 +1204,20 @@ static int handle_request_update (listen_socket_t *sock, /* {{{ */
   char *file;
   int values_num = 0;
   int status;
+  char orig_buf[CMD_MAX];

   time_t now;
   cache_item_t *ci;

   now = time (NULL);

+  status = has_privilege(sock, PRIV_HIGH);
+  if (status <= 0)
+    return status;
+
+  /* save it for the journal later */
+  strncpy(orig_buf, buffer, sizeof(orig_buf)-1);
+
   status = buffer_get_field (&buffer, &buffer_size, &file);
   if (status != 0)
     return send_response(sock, RESP_ERR,
@@ -1258,6 +1285,10 @@ static int handle_request_update (listen_socket_t *sock, /* {{{ */
   } /* }}} */
   assert (ci != NULL);

+  /* don't re-write updates in replay mode */
+  if (sock != NULL)
+    journal_write("update", orig_buf);
+
   while (buffer_size > 0)
   {
     char **temp;
@@ -1366,19 +1397,6 @@ static int batch_done (listen_socket_t *sock) /* {{{ */
   return send_response(sock, RESP_OK, "errors\n");
 } /* }}} static int batch_done */

-/* returns 1 if we have the required privilege level */
-static int has_privilege (listen_socket_t *sock, /* {{{ */
-                          socket_privilege priv)
-{
-  if (sock == NULL) /* journal replay */
-    return 1;
-
-  if (sock->privilege >= priv)
-    return 1;
-
-  return send_response(sock, RESP_ERR, "%s\n", rrd_strerror(EACCES));
-} /* }}} static int has_privilege */
-
 /* if sock==NULL, we are in journal replay mode */
 static int handle_request (listen_socket_t *sock, /* {{{ */
                            char *buffer, size_t buffer_size)
@@ -1402,17 +1420,7 @@ static int handle_request (listen_socket_t *sock, /* {{{ */
     sock->batch_cmd++;

   if (strcasecmp (command, "update") == 0)
-  {
-    status = has_privilege(sock, PRIV_HIGH);
-    if (status <= 0)
-      return status;
-
-    /* don't re-write updates in replay mode */
-    if (sock != NULL)
-      journal_write(command, buffer_ptr);
-
     return (handle_request_update (sock, buffer_ptr, buffer_size));
-  }
   else if (strcasecmp (command, "wrote") == 0 && sock == NULL)
   {
     /* this is only valid in replay mode */
@@ -1421,13 +1429,7 @@ static int handle_request (listen_socket_t *sock, /* {{{ */
   else if (strcasecmp (command, "flush") == 0)
     return (handle_request_flush (sock, buffer_ptr, buffer_size));
   else if (strcasecmp (command, "flushall") == 0)
-  {
-    status = has_privilege(sock, PRIV_HIGH);
-    if (status <= 0)
-      return status;
-
     return (handle_request_flushall(sock));
-  }
   else if (strcasecmp (command, "stats") == 0)
     return (handle_request_stats (sock));
   else if (strcasecmp (command, "help") == 0)
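The ordering this patch establishes for UPDATE (privilege check, then file access check, and only then the journal write) can be sketched as follows. This is a simplified illustration, not the daemon code: `demo_socket_t`, `handle_update`, `journal_entries` and the `file_ok` flag (standing in for the result of check_file_access()) are hypothetical, `PRIV_LOW` is assumed as the counterpart of the `PRIV_HIGH` seen in the patch, and the real has_privilege() also sends an error response, which is omitted here.

```c
#include <stddef.h>

typedef enum { PRIV_LOW = 0, PRIV_HIGH = 1 } socket_privilege;

/* Hypothetical stand-in for listen_socket_t. */
typedef struct {
    socket_privilege privilege;
} demo_socket_t;

/* Counts journal writes so the ordering is observable in this sketch. */
int journal_entries = 0;

/* 1 = allowed, 0 = denied.  A NULL socket means journal replay. */
int has_privilege(const demo_socket_t *sock, socket_privilege priv)
{
    if (sock == NULL)
        return 1;
    return sock->privilege >= priv;
}

/* Returns 0 on success, -1 if the request was rejected. */
int handle_update(const demo_socket_t *sock, int file_ok)
{
    if (!has_privilege(sock, PRIV_HIGH))
        return -1;

    if (!file_ok)
        return -1;          /* rejected before anything reaches the journal */

    if (sock != NULL)       /* don't re-write updates in replay mode */
        journal_entries++;

    return 0;
}
```

With the journal write placed last, a request that fails either check leaves `journal_entries` unchanged, so nothing rejected can be replayed at the next startup.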
[rrd-developers] [PATCH] rrdcached: "PENDING" and "FORGET" for cache management
This patch introduces two new commands for cache management:

  PENDING: shows any un-written updates for a file
  FORGET : remove a file completely from cache

This applies cleanly on top of my previous patch ("better permissions
handling").
---
diff --git a/doc/rrdcached.pod b/doc/rrdcached.pod
index 4bd1bb8..b01165e 100644
--- a/doc/rrdcached.pod
+++ b/doc/rrdcached.pod
@@ -363,6 +363,15 @@ sent B<after> the node has been dequeued.
 Causes the daemon to start flushing ALL pending values to disk.  This
 returns immediately, even though the writes may take a long time.

+=item B<PENDING> I<filename>
+
+Shows any "pending" updates for a file, in order.  The updates shown have
+not yet been written to the underlying RRD file.
+
+=item B<FORGET> I<filename>
+
+Removes I<filename> from the cache.  Any pending updates B<WILL BE LOST>.
+
 =item B<HELP> [I<command>]

 Returns a short usage message. If no command is given, or I<command> is
diff --git a/src/rrd_daemon.c b/src/rrd_daemon.c
index 30cf748..9c8847d 100644
--- a/src/rrd_daemon.c
+++ b/src/rrd_daemon.c
@@ -540,6 +540,34 @@ static void remove_from_queue(cache_item_t *ci) /* {{{ */
   ci->flags &= ~CI_FLAGS_IN_QUEUE;
 } /* }}} static void remove_from_queue */

+/* remove an entry from the tree and free all its resources.
+ * must hold 'cache lock' while calling this.
+ * returns 0 on success, otherwise errno */
+static int forget_file(const char *file)
+{
+  cache_item_t *ci;
+
+  ci = g_tree_lookup(cache_tree, file);
+  if (ci == NULL)
+    return ENOENT;
+
+  g_tree_remove (cache_tree, file);
+  remove_from_queue(ci);
+
+  for (int i=0; i < ci->values_num; i++)
+    free(ci->values[i]);
+
+  free (ci->values);
+  free (ci->file);
+
+  /* in case anyone is waiting */
+  pthread_cond_broadcast(&ci->flushed);
+
+  free (ci);
+
+  return 0;
+} /* }}} static int forget_file */
+
 /*
  * enqueue_cache_item:
  * `cache_lock' must be acquired before calling this function!
@@ -674,26 +702,10 @@ static int flush_old_values (int max_age)

   for (k = 0; k < cfd.keys_num; k++)
   {
-    cache_item_t *ci;
-
-    /* This must not fail. */
-    ci = (cache_item_t *) g_tree_lookup (cache_tree, cfd.keys[k]);
-    assert (ci != NULL);
-
-    /* If we end up here with values available, something's seriously
-     * messed up. */
-    assert (ci->values_num == 0);
-
-    /* Remove the node from the tree */
-    g_tree_remove (cache_tree, cfd.keys[k]);
-    cfd.keys[k] = NULL;
-
-    /* Now free and clean up `ci'. */
-    free (ci->file);
-    ci->file = NULL;
-    free (ci);
-    ci = NULL;
-  } /* for (k = 0; k < cfd.keys_num; k++) */
+    /* should never fail, since we have held the cache_lock
+     * the entire time */
+    assert( forget_file(cfd.keys[k]) == 0 );
+  }

   if (cfd.keys != NULL)
   {
@@ -977,6 +989,9 @@ static int flush_file (const char *filename) /* {{{ */
     pthread_cond_wait(&ci->flushed, &cache_lock);
   }

+  /* DO NOT DO ANYTHING WITH ci HERE!!  The entry
+   * may have been purged during our cond_wait() */
+
   pthread_mutex_unlock(&cache_lock);

   return (0);
@@ -993,9 +1008,11 @@ static int handle_request_help (listen_socket_t *sock, /* {{{ */
   {
     "Command overview\n"
     ,
+    "HELP [<command>]\n"
     "FLUSH <filename>\n"
     "FLUSHALL\n"
-    "HELP [<command>]\n"
+    "PENDING <filename>\n"
+    "FORGET <filename>\n"
     "UPDATE <filename> <values> [<values> ...]\n"
     "BATCH\n"
     "STATS\n"
@@ -1020,6 +1037,26 @@ static int handle_request_help (listen_socket_t *sock, /* {{{ */
     "Triggers writing of all pending updates.  Returns immediately.\n"
   };

+  char *help_pending[2] =
+  {
+    "Help for PENDING\n"
+    ,
+    "Usage: PENDING <filename>\n"
+    "\n"
+    "Shows any 'pending' updates for a file, in order.\n"
+    "The updates shown have not yet been written to the underlying RRD file.\n"
+  };
+
+  char *help_forget[2] =
+  {
+    "Help for FORGET\n"
+    ,
+    "Usage: FORGET <filename>\n"
+    "\n"
+    "Removes the file completely from the cache.\n"
+    "Any pending updates for the file will be lost.\n"
+  };
+
   char *help_update[2] =
   {
     "Help for UPDATE\n"
@@ -1078,6 +1115,10 @@ static int handle_request_help (listen_socket_t *sock, /* {{{ */
       help_text = help_flush;
     else if (strcasecmp (command, "flushall") == 0)
       help_text = help_flushall;
+    else if (strcasecmp (command, "pending") == 0)
+      help_text = help_pending;
+    else if (strcasecmp (command, "forget") == 0)
+      help_text = help_forget;
     else if (strcasecmp (command, "stats") == 0)
       help_text = help_stats;
     else if (strcasecmp (command, "batch") == 0)
@@ -1198,6 +1239,73 @@ static int handle_request_flushall(listen_socket_t *sock) /* {{{ */
   return send_response(sock, RESP_OK, "Started flush.\n");
 } /* }}} static int handle_request_flushall */

+static int handle_request_pending(listen_socket_t *sock, /* {{{ */
+                                  char *buffer, size_t buffer_size)
+{
+  int status;
+  char *file;
+  cache_item_t *ci;
+
+  status = buffer_get_field(&buffer, &buffer_size, &file);
+  if (status != 0)
+    return send_response(sock, RESP_ERR,
Re: [rrd-developers] [PATCH] rrdcached: "PENDING" and "FORGET" for cache management
Today kevin brintnall wrote:

> This patch introduces two new commands for cache management:
>
> PENDING: shows any un-written updates for a file
> FORGET : remove a file completely from cache
>
> This applies cleanly on top of my previous patch ("better permissions
> handling").

thanks

tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch [EMAIL PROTECTED] ++41 62 775 9902 / sb: -9900
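What FORGET does to a cache entry, as described in the patch under discussion, can be sketched without glib or threads. This is a simplified illustration only: it assumes a singly linked list instead of the daemon's tree, `demo_item_t` and `demo_forget` are hypothetical names, and the locking and condition-variable broadcast of the real code are left out.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified cache entry: a file name plus pending updates. */
typedef struct demo_item {
    char *file;
    char **values;
    int values_num;
    struct demo_item *next;
} demo_item_t;

/* Remove `file` from the list at *head and free everything it owns,
 * including updates that were never written.  Returns 0 on success or
 * ENOENT if the file is not cached. */
int demo_forget(demo_item_t **head, const char *file)
{
    demo_item_t **pp;

    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        demo_item_t *ci = *pp;
        int i;

        if (strcmp(ci->file, file) != 0)
            continue;

        *pp = ci->next;                 /* unlink before freeing */

        for (i = 0; i < ci->values_num; i++)
            free(ci->values[i]);        /* pending updates are dropped */
        free(ci->values);
        free(ci->file);
        free(ci);
        return 0;
    }

    return ENOENT;
}
```

Unlinking the entry before freeing it means no later lookup can reach the freed memory; in the threaded daemon the same concern is why the patch warns against touching `ci` after the condition wait.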
Re: [rrd-developers] [patch] rrdcached init script and spec file
Hi all:

I have some comments regarding the rrdtool spec file that is in trunk
now (which includes changes to incorporate rrdcached).

First of all, thanks to Daniel for putting this together (saves me the
work, heh).

However, I have two comments:

1) I think we should break this out as a separate subpackage such as
rrdtool-rrdcached as I don't think rrdcached is needed by your
everyday installation (only large installations).  Incorporating it in
the main rrdtool package and especially by including an init script
gives users the impression that this is something that is needed by
everybody, which I don't think is the case.

2) By default, the rrdcached daemon is started once you install the
RPM.  While I think it is fine to add rrdcached as a service, I don't
think it's a good idea to start up the daemon by default especially
when one might want to make some configuration changes prior to starting.
It should be left to the user to start/stop the daemon as they like.

If you guys agree, I can go ahead and create a patch for the above two
points.  I may have additional comments after I've had some time to
play with the new code.

Thanks,

Bernard
[rrd-developers] [PATCH] Update spec file to include librrd.pc file
Hi Tobi:

This patch updates the spec file and includes the librrd.pc file in the
-devel subpackage so that you can build the RPM again.

Thanks,

Bernard

Index: rrdtool.spec
===================================================================
--- rrdtool.spec (revision 1588)
+++ rrdtool.spec (working copy)
@@ -312,6 +312,7 @@
 %{_includedir}/*.h
 %exclude %{_libdir}/*.la
 %{_libdir}/*.so
+%{_libdir}/pkgconfig/librrd.pc

 %files doc
 %defattr(-,root,root,-)
@@ -357,6 +358,9 @@
 %endif

 %changelog
+* Tue Oct 07 2008 Bernard Li <[EMAIL PROTECTED]>
+- Include librrd.pc file in -devel package
+
 * Sun Jun 08 2008 Jarod Wilson <[EMAIL PROTECTED]> 1.3-0.20.rc9
 - Update to rrdtool 1.3 rc9
 - Minor spec tweaks to permit building on older EL
Re: [rrd-developers] [patch] rrdcached init script and spec file
Hi Daniel:

The init script does not work on my system (CentOS 4.x) as is, because
the `daemon` function which I have does not support --pidfile -- is that
argument necessary?

Also, as discussed previously, I think it would be a good idea to create
a 'rrdcached' user and group and start the daemon as that user instead of
nobody.  For application-specific (e.g. Ganglia) implementations, we can
just put the necessary users (such as nobody, apache, ganglia) in the
rrdcached group.

Thanks,

Bernard
[rrd-developers] [BUG] Typo with rrdcached manpage
With rrdtool r1588, the manpage for rrdcached's "ERROR REPORTING" reads:

---cut---
Once this has happened, the daemon will send log messages to the system
logging daemon using syslog(3). The facility used it "LOG_DAEMON".
---cut---

There is probably a typo in the last sentence.

Thanks,

Bernard
[rrd-developers] rrdcached crashed with no logging
Hi all:

I'm currently working with rrdcached from rrdtool r1588 and am having
problems getting it to integrate with Ganglia.  This has worked in the
past (about 2 weeks ago).  Right now I'm trying to figure out what's
wrong.

It seems that the daemon crashed without logging to syslog.  I straced
the rrdcached process and here's what I got:

---cut---
accept(3, {sa_family=AF_FILE, [EMAIL PROTECTED], [2]) = 6
mmap2(NULL, 10489856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb613d000
mprotect(0xb613d000, 4096, PROT_NONE) = 0
clone(child_stack=0xb6b3d4c4,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
parent_tidptr=0xb6b3dbe8, {entry_number:6, base_addr:0xb6b3dba0,
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
limit_in_pages:1, seg_not_present:0, useable:1},
child_tidptr=0xb6b3dbe8) = 10556
poll([{fd=3, events=POLLIN|POLLPRI, revents=POLLIN}], 1, 1000) = 1
accept(3, {sa_family=AF_FILE, [EMAIL PROTECTED], [2]) = 7
mmap2(NULL, 10489856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb573c000
mprotect(0xb573c000, 4096, PROT_NONE) = 0
clone(child_stack=0xb613c4c4,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
parent_tidptr=0xb613cbe8, {entry_number:6, base_addr:0xb613cba0,
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
limit_in_pages:1, seg_not_present:0, useable:1},
child_tidptr=0xb613cbe8) = 10560
poll([{fd=3, events=POLLIN|POLLPRI, revents=POLLIN}], 1, 1000) = 1
brk(0x906f000) = 0x906f000
futex(0x5ad820, FUTEX_WAKE, 1) = 1
accept(3, {sa_family=AF_FILE, [EMAIL PROTECTED], [2]) = 8
mmap2(NULL, 10489856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb4bff000
mprotect(0xb4bff000, 4096, PROT_NONE) = 0
clone(child_stack=0xb55ff4c4,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
parent_tidptr=0xb55ffbe8, {entry_number:6, base_addr:0xb55ffba0,
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
limit_in_pages:1, seg_not_present:0, useable:1},
child_tidptr=0xb55ffbe8) = 10561
poll([{fd=3, events=POLLIN|POLLPRI, revents=POLLIN}], 1, 1000) = 1
accept(3, {sa_family=AF_FILE, [EMAIL PROTECTED], [2]) = 9
mmap2(NULL, 10489856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb41fe000
mprotect(0xb41fe000, 4096, PROT_NONE) = 0
clone(child_stack=0xb4bfe4c4,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
parent_tidptr=0xb4bfebe8, {entry_number:6, base_addr:0xb4bfeba0,
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
limit_in_pages:1, seg_not_present:0, useable:1},
child_tidptr=0xb4bfebe8) = 10562
poll([{fd=3, events=POLLIN|POLLPRI}], 1, 1000) = -1 EINTR (Interrupted
system call)
+++ killed by SIGABRT +++
---cut---

Ganglia was running `rrdtool graph - --daemon
unix:/var/run/rrdcached/rrdcached.sock ...` command when it crashed.

Thanks,

Bernard
Re: [rrd-developers] rrdcached crashed with no logging
On Tue, Oct 07, 2008 at 06:01:02PM -0700, Bernard Li wrote:
> It seems that the daemon crashed without logging to syslog.  I straced
> the rrdcached process and here's what I got:

Bernard,

Do you have a backtrace?  Also, what OS are you using?

The interrupted poll() system call is in listen_thread_main (you can tell
by the timeout of 1sec).  I would not expect to catch a SIGABRT.  Possibly
an assertion is being violated.  A backtrace would be very helpful.

-- 
kevin brintnall =~ /[EMAIL PROTECTED]/
Re: [rrd-developers] rrdcached crashed with no logging
Hi Kevin:

On Tue, Oct 7, 2008 at 7:46 PM, kevin brintnall <[EMAIL PROTECTED]> wrote:

> Do you have a backtrace?  Also, what OS are you using?

Here's the backtrace:

(gdb) bt
#0  0x002e47a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x004a2815 in raise () from /lib/tls/libc.so.6
#2  0x004a4279 in abort () from /lib/tls/libc.so.6
#3  0x004d6cca in __libc_message () from /lib/tls/libc.so.6
#4  0x004dd55f in _int_free () from /lib/tls/libc.so.6
#5  0x004dd93a in free () from /lib/tls/libc.so.6
#6  0x0804ca34 in close_connection (sock=0x94a4d80) at rrd_daemon.c:1784
#7  0x0804ccde in connection_thread_main (args=0x94a4d80) at rrd_daemon.c:1888
#8  0x0046e3cc in start_thread () from /lib/tls/libpthread.so.0
#9  0x005441ae in clone () from /lib/tls/libc.so.6

I'm using CentOS 4.x i386.

Please let me know if you need additional info.

This is the output when I ran rrdcached in the foreground:

---cut---
rrdcached -p /var/run/rrdcached/rrdcached.pid -l
//var/run/rrdcached/rrdcached.sock -g
*** glibc detected *** double free or corruption (!prev): 0x094a4d80 ***
Aborted (core dumped)
---cut---

Thanks a lot,

Bernard
[rrd-developers] rrdtool --daemon option
Hi all:

Would the developers consider renaming the --daemon option for rrdtool
to something like --cache(d)?  To the uninitiated, they might think
this is the option to start rrdtool in daemon mode.

Thanks,

Bernard
[rrd-developers] [PATCH] connection_thread_main: avoid double calls to close_connection
---
 src/rrd_daemon.c |    9 ++---
 1 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/src/rrd_daemon.c b/src/rrd_daemon.c
index 9c8847d..36d418b 100644
--- a/src/rrd_daemon.c
+++ b/src/rrd_daemon.c
@@ -1844,23 +1844,18 @@ static void *connection_thread_main (void *args) /* {{{ */
     else if (status < 0) /* error */
     {
       status = errno;
-      if (status == EINTR)
-        continue;
-      RRDD_LOG (LOG_ERR, "connection_thread_main: poll(2) failed.");
+      if (status != EINTR)
+        RRDD_LOG (LOG_ERR, "connection_thread_main: poll(2) failed.");
       continue;
     }
 
     if ((pollfd.revents & POLLHUP) != 0) /* normal shutdown */
-    {
-      close_connection(sock);
       break;
-    }
     else if ((pollfd.revents & (POLLIN | POLLPRI)) == 0)
     {
       RRDD_LOG (LOG_WARNING, "connection_thread_main: "
           "poll(2) returned something unexpected: %#04hx",
           pollfd.revents);
-      close_connection(sock);
       break;
     }
 
-- 
1.6.0.2
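The idea behind the patch, as far as the diff shows, is that the poll loop should only `break` and leave the cleanup to a single call after the loop. The following is a minimal standalone sketch of that shape, not the rrd_daemon.c code: conn_t, conn_close, serve_connection and the close_calls counter are hypothetical names. It also differs from the patch in two labeled ways: it leaves the loop on non-EINTR poll errors instead of continuing, and it reads pending data before treating a hangup as final.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for the daemon's per-connection state. */
typedef struct {
    int fd;
} conn_t;

/* Counts cleanups so the single-call property can be checked. */
static int close_calls = 0;

/* Releases the connection; must run exactly once per connection. */
static void conn_close(conn_t *conn)
{
    assert(conn != NULL);
    close(conn->fd);
    free(conn);
    close_calls++;
}

/* Serves one connection until it ends; returns the bytes read. */
static long serve_connection(conn_t *conn)
{
    char buf[64];
    long total = 0;

    for (;;) {
        struct pollfd pfd = { .fd = conn->fd, .events = POLLIN | POLLPRI };
        int status = poll(&pfd, 1, 1000);

        if (status == 0)
            continue;               /* timeout: poll again */
        if (status < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            break;                  /* real error: leave the loop */
        }
        if ((pfd.revents & (POLLIN | POLLPRI)) != 0) {
            ssize_t n = read(conn->fd, buf, sizeof buf);
            if (n <= 0)
                break;              /* EOF or read error */
            total += n;
            continue;
        }
        break;                      /* POLLHUP or an unexpected event */
    }

    conn_close(conn);               /* the only cleanup call */
    return total;
}
```

Every exit from the loop is a plain `break`, so no branch can release the connection and then fall through to a second release. A caller allocates a conn_t, sets its fd, and hands ownership to serve_connection().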
Re: [rrd-developers] rrdcached crashed with no logging
Hi all:

The patch Kevin provided solved my issue:

http://www.mail-archive.com/rrd-developers@lists.oetiker.ch/msg02651.html

Thanks,

Bernard

On Tue, Oct 7, 2008 at 8:35 PM, Bernard Li <[EMAIL PROTECTED]> wrote:
> Hi Kevin:
>
> On Tue, Oct 7, 2008 at 7:46 PM, kevin brintnall <[EMAIL PROTECTED]> wrote:
>
>> Do you have a backtrace? Also, what OS are you using?
>
> Here's the backtrace:
>
> (gdb) bt
> #0 0x002e47a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> #1 0x004a2815 in raise () from /lib/tls/libc.so.6
> #2 0x004a4279 in abort () from /lib/tls/libc.so.6
> #3 0x004d6cca in __libc_message () from /lib/tls/libc.so.6
> #4 0x004dd55f in _int_free () from /lib/tls/libc.so.6
> #5 0x004dd93a in free () from /lib/tls/libc.so.6
> #6 0x0804ca34 in close_connection (sock=0x94a4d80) at rrd_daemon.c:1784
> #7 0x0804ccde in connection_thread_main (args=0x94a4d80) at rrd_daemon.c:1888
> #8 0x0046e3cc in start_thread () from /lib/tls/libpthread.so.0
> #9 0x005441ae in clone () from /lib/tls/libc.so.6
>
> I'm using CentOS 4.x i386.
>
> Please let me know if you need additional info.
>
> This is the output when I ran rrdcached in the foreground:
>
> ---cut---
> rrdcached -p /var/run/rrdcached/rrdcached.pid -l
> //var/run/rrdcached/rrdcached.sock -g
> *** glibc detected *** double free or corruption (!prev): 0x094a4d80 ***
> Aborted (core dumped)
> ---cut---
>
> Thanks a lot,
>
> Bernard
Re: [rrd-developers] [patch] rrdcached init script and spec file
Hi Bernard,

Yesterday Bernard Li wrote:

> Hi Daniel:
>
> The init script does not work on my system (CentOS 4.x) as is, because
> the `daemon` function which I have does not support --pidfile -- is
> that argument necessary?
>
> Also, as discussed previously, I think it would be a good idea to
> create a 'rrdcached' user and group and start the daemon as that user
> instead of nobody. For application-specific (e.g. Ganglia)
> implementations, we can just put the necessary users (such as nobody,
> apache, ganglia) in the rrdcached group.

I think that, coupled with a split of the package into a cached one and a
normal one, this would be a sensible thing. As I said before, the daemon
should NOT run as nobody, since it writes files and there must never be
any files owned by nobody ... (hence the name).

cheers
tobi

>
> Thanks,
>
> Bernard
>
> On Tue, Oct 7, 2008 at 3:58 PM, Bernard Li <[EMAIL PROTECTED]> wrote:
> > Hi all:
> >
> > I have some comments regarding the rrdtool spec file that is in trunk
> > now (which includes changes to incorporate rrdcached).
> >
> > First of all, thanks to Daniel for putting this together (saves me the
> > work, heh).
> >
> > However, I have two comments:
> >
> > 1) I think we should break this out as a separate subpackage such as
> > rrdtool-rrdcached as I don't think rrdcached is needed by your
> > everyday installation (only large installations). Incorporating it in
> > the main rrdtool package and especially by including an init script
> > gives users the impression that this is something that is needed by
> > everybody, which I don't think is the case.
> >
> > 2) By default, the rrdcached daemon is started once you install the
> > RPM. While I think it is fine to add rrdcached as a service, I don't
> > think it's a good idea to start up the daemon by default especially
> > when one might want to make some configuration changes prior to starting.
> > It should be left to the user to start/stop the daemon as they like.
> >
> > If you guys agree, I can go ahead and create a patch for the above two
> > points. I may have additional comments after I've had some time to
> > play with the new code.
> >
> > Thanks,
> >
> > Bernard
> >
>

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch [EMAIL PROTECTED] ++41 62 775 9902 / sb: -9900