Hi!

The PowerOP infrastructure you suggest surely is one path to better runtime
power management in the Linux kernel. However, I don't like it at all in its
current implementation. Here are a few suggestions for improvements,
rewrites, and so on:

First, the table interface you suggest is ugly. If there's indeed the need for
such an abstraction, I'd favour something like

        struct powerop {
                struct list_head        powerop_values; /* linked list of 
powerop_values */
                ...
        }

        struct powerop_value {
                unsigned long           value_cur;
                unsigned long           value_min;
                unsigned long           value_max;
                struct list_head        next;
                u16                     type;
                struct powerop_value    *cross_dependency;
                struct powerop_driver   *driver;
        }

        #define POWEROP_TYPE_CPU_FREQUENCY              0x00000001
        #define POWEROP_TYPE_CPU_VOLTAGE                0x00000002
        #define POWEROP_TYPE_FRONT_SIDE_BUS_SPEED       0x00000004
        ...
        #define POWEROP_TYPE_GPU_FREQUENCY      0x00010000
        ...

and if CPU_VOLTAGE and CPU_FREQEUNCY can only be modified at the same time, (as
most cpufreq drivers require), type is 0x00000003.


Secondly, you do not adress the cross-relationships between operation points
correctly. If you change the CPU frequency, you may have to switch other
(memory, video) settings; you might even have to validate the frequency
settings for these or even additional reasons (thermal and battery reasons -
ACPI _PPC).

Thirdly, who is to decide on the power management settings? The first and
intuitive answer is the kernel. Therefore, kernel-space cpufreq governors
exist. Only under rare circumstances, you want full userspace control --
that's what the userspace cpufreq governor is for.

Foruthly, the code duplication which your implementation leads to is obvious
for the speedstep-centrino case. And in contrast to Pavel, I do not consider
it a "tiny cleanup".



I'd suggest that you try upgrading the cpufreq infrastructure to provide
full support for multiple types of POWEROPs:

a)      Setting of "policies"
        - New "min" or "max" values for all powerop_values are set, verified
          by powerop lowlevel drivers, powerop governors and external
          notifiers. E.g. if a new frequency min/max pair is required, the
          voltage level gets a new min and max value as well --> you need to
          handle recursion.
        - If necessary a new "powerop governor" is started.
           - Each powerop governor specifies which POWEROPs it can handle
                - current cpufreq governors can handle CPU_FREQUENCY,
                  CPU_VOLTAGE and FRONT_SIDE_BUS_SPEED
                - an userspace fallback-governor always "handles" the
                  parameters no other governor handles

b)      Setting of "values"
        - Each governor can initiate transitions between the "min" and "max"
          values for operationg points it aquired ownership for.
        - The new setting is notified to all other governors and to external
          notifiers. If some entitiy decides it cannot live well with this
          new setting, it breaks out. Note that this should not happen quite
          often, as the "normal" verification takes place in a) above.
          Nonetheless, if you want to break out CPU_VOLTAGE and CPU_FREQUENCY, 
you
          need it. And as it makes life for the kernel so much more
          difficult, I'm against doing so.
        - The low-level driver handling the powerop_value is called

Thanks,
        Dominik
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to