Hello everyone,

I'm facing some problems with a Python module we have to collect the
temperature of the computing nodes. Basically it reads the contents of a
file, where all temperatures are output, and then extracts the
temperature that correspond to the current node. I guess the problem
must be something related to the threading logic, although I'm not sure.
With gmond 3.1.0 it works with no problems, but with 3.6 it hangs when
it reads the first temperature, CPU0 (I ran gmond in debug mode with "-d
10" to check it).

Does anyone know what is wrong with our code? Do we have to change
anything in order for gmond version 3.6 to work? Attached is the code
for the module. Any help is appreciated.

Thanks in advance,

Rafa

-- 
Rafael Arco Arredondo
Centro de Servicios de Informática y Redes de Comunicaciones
Campus de Fuentenueva - Edificio Mecenas
Universidad de Granada
E-18071 Granada Spain
Tel: +34 958 241440   Ext:41440   E-mail: rafaa...@ugr.es
===============
"Este mensaje se dirige exclusivamente a su destinatario y puede
contener información privilegiada o confidencial. Si no es Ud. el
destinatario indicado, queda notificado de que la utilización,
divulgación o copia sin autorización está prohibida en virtud de la
legislación vigente. Si ha recibido este mensaje por error, se ruega lo
comunique inmediatamente por esta misma vía y proceda a su destrucción.

This message is intended exclusively for its addressee and may contain
information that is CONFIDENTIAL and protected by professional
privilege. If you are not the intended recipient you are hereby notified
that any dissemination, copy or disclosure of this communication is
strictly prohibited by law. If this message has been received in error,
please immediately notify us via e-mail and delete it".
================

-- 
Rafael Arco Arredondo
Centro de Servicios de Informática y Redes de Comunicaciones
Campus de Fuentenueva - Edificio Mecenas
Universidad de Granada
E-18071 Granada Spain
Tel: +34 958 241440   Ext:41440   E-mail: rafaa...@ugr.es
===============
"Este mensaje se dirige exclusivamente a su destinatario y puede
contener información privilegiada o confidencial. Si no es Ud. el
destinatario indicado, queda notificado de que la utilización,
divulgación o copia sin autorización está prohibida en virtud de la
legislación vigente. Si ha recibido este mensaje por error, se ruega lo
comunique inmediatamente por esta misma vía y proceda a su destrucción.

This message is intended exclusively for its addressee and may contain
information that is CONFIDENTIAL and protected by professional
privilege. If you are not the intended recipient you are hereby notified
that any dissemination, copy or disclosure of this communication is
strictly prohibited by law. If this message has been received in error,
please immediately notify us via e-mail and delete it".
================
# encoding: utf-8

import time
import string
import threading
import socket
import re

# Hebra que abre el archivo de temperaturas
temp_thread = None

# Cerrojo para realizar la sincronización
lock = threading.Lock()

# Nombre del host
# fcn050.fgrid.ugr.es
hostname = socket.gethostname()
hostmodif = ""
# fcn50
match = re.search("(fc[nmg])0*([0-9]+).fgrid.ugr.es", hostname)
if match:
    hostmodif = match.group(1) + match.group(2)

# Intervalo de actualización de temperaturas
REFRESH_RATE = 10

# Diccionario con las estadísticas detemperatura
stats = {"temp_cpu0":0, "temp_cpu1":0,
         "temp_amb1":0, "temp_amb2":0,
         "temp_p0_dimm":0, "temp_p1_dimm":0,
         "temp_mb1":0, "temp_mb2":0}

TEMP_FILE = "/usr/local/admin/util/tmp_nodos.txt"

# Clase para capturar la temperatura
class TempThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.running = False
        self.shuttingdown = False

    def shutdown(self):
        self.shuttingdown = True
        if not self.running:
            return
        self.join()

    def run(self):
        global stats
        temp_temp_cpu0 = 0
        temp_temp_cpu1 = 0
        temp_temp_amb1 = 0
        temp_temp_amb2 = 0
        temp_temp_p0_dimm = 0
        temp_temp_p1_dimm = 0
        temp_temp_mb1 = 0
        temp_temp_mb2 = 0
        
        self.running = True

        # Suspender durante REFRESH_RATE segundos
        time.sleep(REFRESH_RATE)

        while not self.shuttingdown:
            try:
                # Abrir el archivo
                f = open(TEMP_FILE, "r")
            except IOError:
                lock.acquire()
                stats["temp_cpu0"] = 0
                stats["temp_cpu1"] = 0
                stats["temp_amb1"] = 0
                stats["temp_amb2"] = 0
                stats["temp_p0_dimm"] = 0
                stats["temp_p1_dimm"] = 0
                stats["temp_mb1"] = 0
                stats["temp_mb2"] = 0

                lock.release
                break

            # Leer las temperaturas
            tmphost = f.readline().rstrip()
            while tmphost != hostmodif:
                tmphost = f.readline().rstrip()

            pattern = "\S+\s+:\s([0-9]+) degrees C"

            tmpline = f.readline().rstrip()
            match = re.search(pattern, tmpline)
            temp_temp_cpu0 = string.atoi(match.group(1))

            tmpline = f.readline().rstrip()
            match = re.search(pattern, tmpline)
            temp_temp_cpu1 = string.atoi(match.group(1))

            tmpline = f.readline().rstrip()
            match = re.search(pattern, tmpline)
            temp_temp_amb1 = string.atoi(match.group(1))

            tmpline = f.readline().rstrip()
            match = re.search(pattern, tmpline)
            temp_temp_amb2 = string.atoi(match.group(1))

            tmpline = f.readline().rstrip()
            match = re.search(pattern, tmpline)
            temp_temp_p0_dimm = string.atoi(match.group(1))

            tmpline = f.readline().rstrip()
            match = re.search(pattern, tmpline)
            temp_temp_p1_dimm = string.atoi(match.group(1))

            tmpline = f.readline().rstrip()
            match = re.search(pattern, tmpline)
            temp_temp_mb2 = string.atoi(match.group(1))

            tmpline = f.readline().rstrip()
            match = re.search(pattern, tmpline)
            temp_temp_mb1 = string.atoi(match.group(1))

            f.close()
            
            # ...y almacenarlas en el diccionario global
            lock.acquire()
            stats["temp_cpu0"] = temp_temp_cpu0
            stats["temp_cpu1"] = temp_temp_cpu1
            stats["temp_amb1"] = temp_temp_amb1
            stats["temp_amb2"] = temp_temp_amb2
            stats["temp_p0_dimm"] = temp_temp_p0_dimm
            stats["temp_p1_dimm"] = temp_temp_p1_dimm
            stats["temp_mb1"] = temp_temp_mb1
            stats["temp_mb2"] = temp_temp_mb2

            lock.release()

            # Suspender hasta la siguiente iteración
            time.sleep(REFRESH_RATE)

        self.running = False        

def get_temp_stat(name):
    # Leer el valor especificado en name
    lock.acquire()
    val = stats[name]
    lock.release()
    return val

def metric_init(params):
    global descriptors, temp_thread

    # Definición de las métricas
    d1 = {'name': 'temp_cpu0',
          'call_back': get_temp_stat,
          'time_max': 1200,
          'value_type': 'uint',
          'units': 'celsius deg',
          'slope': 'both',
          'format': '%u',
          'description': 'CPU0 temperature',
          'groups': 'temperature'}

    d2 = {'name': 'temp_cpu1',
          'call_back': get_temp_stat,
          'time_max': 1200,
          'value_type': 'uint',
          'units': 'celsius deg',
          'slope': 'both',
          'format': '%u',
          'description': 'CPU1 temperature',
          'groups': 'temperature'}

    d3 = {'name': 'temp_amb1',
          'call_back': get_temp_stat,
          'time_max': 1200,
          'value_type': 'uint',
          'units': 'celsius deg',
          'slope': 'both',
          'format': '%u',
          'description': 'Amb1 temperature',
          'groups': 'temperature'}

    d4 = {'name': 'temp_amb2',
          'call_back': get_temp_stat,
          'time_max': 1200,
          'value_type': 'uint',
          'units': 'celsius deg',
          'slope': 'both',
          'format': '%u',
          'description': 'Amb2 temperature',
          'groups': 'temperature'}

    d5 = {'name': 'temp_p0_dimm',
          'call_back': get_temp_stat,
          'time_max': 1200,
          'value_type': 'uint',
          'units': 'celsius deg',
          'slope': 'both',
          'format': '%u',
          'description': 'P0dimm temperature',
          'groups': 'temperature'}

    d6 = {'name': 'temp_p1_dimm',
          'call_back': get_temp_stat,
          'time_max': 1200,
          'value_type': 'uint',
          'units': 'celsius deg',
          'slope': 'both',
          'format': '%u',
          'description': 'P1dimm temperature',
          'groups': 'temperature'}

    d7 = {'name': 'temp_mb1',
          'call_back': get_temp_stat,
          'time_max': 1200,
          'value_type': 'uint',
          'units': 'celsius deg',
          'slope': 'both',
          'format': '%u',
          'description': 'MB1 temperature',
          'groups': 'temperature'}

    d8 = {'name': 'temp_mb2',
          'call_back': get_temp_stat,
          'time_max': 1200,
          'value_type': 'uint',
          'units': 'celsius deg',
          'slope': 'both',
          'format': '%u',
          'description': 'MB2 temperature',
          'groups': 'temperature'}

    descriptors = [d1,d2,d3,d4,d5,d6,d7,d8]

    # Inicializar e iniciar la hebra que lee el archivo de temperaturas
    temp_thread = TempThread()
    temp_thread.start()

    # Devolver los descriptores a gmond
    return descriptors
 
def metric_cleanup():
    # Liberar los recursos del módulo
    temp_thread.shutdown()
 
# Código para depurar y pruebas de unidad
if __name__ == '__main__':
    params = {'RefreshRate': '10'}
    metric_init(params)
    
    while True:
        for d in descriptors:
            v = d['call_back'](d['name'])
            print 'El valor de %s es %u' % (d['name'],  v)
        time.sleep(10)
------------------------------------------------------------------------------
New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
GigeNET is offering a free month of service with a new server in Ashburn.
Choose from 2 high performing configs, both with 100TB of bandwidth.
Higher redundancy.Lower latency.Increased capacity.Completely compliant.
http://p.sf.net/sfu/gigenet
_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general

Reply via email to