Nish and Lucas,

A few members of my team and I have a lot of experience with HW attrib style 
test scheduling, so please feel free to bring me and Chris Nelson into any 
future detailed discussions about the scheduling part of this proposal.  We are 
both still pretty new to Linux, so you probably know better than we do how to 
get the attrib info, but we might have some ideas about adding tables and 
implementing queries to schedule jobs in this way.

Just FYI, we called our prior solution constraint-based scheduling (with both 
HW and SW constraints as options) and it supported both hard and soft 
constraints.  The idea was that a hard constraint was similar to the autotest 
"only if needed" labels: if a job didn't request a hard constraint explicitly, 
the host wasn't considered a match.  I like your terminology ("only if needed") 
better, and I think it applies equally well to HW attribute scheduling as to 
label based scheduling.  That said, this would need to be a per-host & 
per-attribute setting, not something applied to every host that has the 
attribute (as opposed to how "only if needed" is currently a label-specific 
flag that applies to every host that uses that label).  Does that make sense?

See below for more comments.

-Dan DeFolo

> -----Original Message-----
> From: autotest-boun...@test.kernel.org [mailto:autotest-
> boun...@test.kernel.org] On Behalf Of Nishanth Aravamudan
> Sent: Wednesday, May 16, 2012 2:51 PM
> To: l...@redhat.com
> Cc: autotest@test.kernel.org
> Subject: [Autotest] [RFC] adding hardware inventory to autotest
> 
> Hi everyone,
> 
> Lucas and I were discussing a new feature that I think would be very useful
> for Autotest: hardware inventory.
> 
> With the upcoming/ongoing feature to add tighter Cobbler support to
> Autotest I, at least, have a need to know what kinds of machines are
> available to run jobs on. I want to know things like: CPU information, amount
> of memory, PCI devices, etc.
> 
>  - My initial proposal is to use lshw to acquire information from
>    running systems.
>    - An alternative would be smolt. I've found some issues with the
>      distribution versions of smolt on ppc64, at least.
>    - In theory, we could have a pluggable infrastructure, much like the
>      install_server support itself, and the administrator could specify
>      how to obtain the information.

That makes sense to me.  One thing to consider is subdividing the inventory 
commands into categories that make sense to the average user and just returning 
a dictionary of attributes for each.  For example, if the inventory interface 
command were called hwprobe, I could imagine calling something like this to get 
CPU info:

hwprobe -t cpu

{ 'cpu_model': 'Intel(R) Xeon(R) CPU           X5550',
  'cpu_speed': '2.27GHz',
  'cpu_cores': 16,
  ...
}

Admins could add new categories of things to probe as well as control where the 
information for each category of probe should come from. 
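
To make that concrete, here is a rough sketch of how a pluggable, 
category-based probe interface might look.  The HWProbe class and the 
per-category probe function are made-up names for illustration, not an existing 
Autotest API:

UNKNOWN = 'unknown'   # standard default when a probe can't determine a field

def _probe_cpu():
    """Parse /proc/cpuinfo into a flat attribute dictionary (x86-style keys;
    other architectures would need their own parser)."""
    info = {'cpu_model': UNKNOWN, 'cpu_speed': UNKNOWN, 'cpu_cores': 0}
    with open('/proc/cpuinfo') as f:
        for line in f:
            if ':' not in line:
                continue
            key, value = [part.strip() for part in line.split(':', 1)]
            if key == 'model name':
                info['cpu_model'] = value
            elif key == 'cpu MHz':
                info['cpu_speed'] = value + 'MHz'
            elif key == 'processor':
                # really counts logical CPUs; a real probe would read
                # topology info from sysfs to get physical cores
                info['cpu_cores'] += 1
    return info

class HWProbe(object):
    """Maps a category name to the function (or admin-supplied command)
    that gathers it, so sites can register new categories."""
    def __init__(self):
        self.categories = {'cpu': _probe_cpu}

    def register(self, name, func):
        self.categories[name] = func

    def probe(self, category):
        return self.categories[category]()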

Also, there should be a standard default value to report (e.g. Unknown, "", 
None, ??, etc.) when the architecture-specific probing logic isn't able to come 
up with a value for an expected field.

Finally, there should be some normalization logic for the size and speed 
attributes so they are all stored in standard units.  For example, convert all 
CPU speeds to GHz before storing, all memory sizes to either MB or GB (probably 
MB), all disk sizes to GB, etc.  This is the one thing that perhaps shouldn't 
be completely up to admins; at most, values should be normalized before being 
recorded in the DB.  I'll comment more on this later.
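
As a rough sketch of what I mean (the function names and the choice of MB/GHz 
as target units are just illustrative), the probe side could push every 
size/speed value through something like this before it ever reaches the DB:

import re

_SIZE_FACTORS_TO_MB = {'kb': 1.0 / 1024, 'mb': 1.0, 'gb': 1024.0,
                       'tb': 1024.0 * 1024}

def normalize_memory_mb(raw):
    """Turn strings like '16 GB', '16384MB' or '16777216 kB' into MB."""
    match = re.match(r'^\s*([\d.]+)\s*([kmgt]b)\s*$', raw.lower())
    if not match:
        raise ValueError('unparseable memory size: %r' % raw)
    value, unit = float(match.group(1)), match.group(2)
    return value * _SIZE_FACTORS_TO_MB[unit]

def normalize_cpu_speed_ghz(raw):
    """Turn '2.27GHz' or '2266.746 MHz' into GHz."""
    match = re.match(r'^\s*([\d.]+)\s*([mg]hz)\s*$', raw.lower())
    if not match:
        raise ValueError('unparseable CPU speed: %r' % raw)
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000.0 if unit == 'mhz' else value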

>  - Adding hardware information immediately requires us to extend the
>    information stored in a Machine object
>    - My current list includes (sources in parentheses):
>      Machine model (dmidecode, lsvpd)
>      CPU version (/proc/cpuinfo)
>      CPU speed (/proc/cpuinfo)
>      Memory installed (/sys/devices/system/memory or /proc/meminfo)
>      Swap size (/proc/meminfo, /proc/swaps?)
>      Serial number (dmidecode, lsvpd)
>      CPU topology (# of cores, # of hardware threads, sysfs, /proc/cpuinfo))
>      PCI devices (lspci)
>      KVM capable (/proc/cpuinfo)
>      CPU flags (/proc/cpuinfo)
>      Timestamp of last inventory
>      Version of last inventory tool used
>      BIOS/system firmware level

I think all of the above sounds like a great start for the core HW info to 
gather.

I would also add the following to the list of things to gather (I'm still 
trying to map the HP-UX way of doing this to Linux, so if anything below sounds 
a bit off, bring me back in line!):
* basic disk device attribs (type, size, device, PCI/HW path) mapped to a 
unique ID that won't change with each OS install (lsscsi, /proc/scsi) (see the 
rough sketch after this list)
* SAN disk attribs (WWID, transport) (lsscsi -t, ??)
* admin attribs for disks - is the disk available for anyone to use (scratch), 
reserved with no specific purpose (don't touch), or reserved for a specific 
purpose (swap, dump, VM disk images, etc)
* CPU hyperthreading - on/off (??)
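
For the disk attribs, here is a rough sketch of one way to get a stable unique 
ID plus basic size/model info from sysfs and /dev/disk/by-id (just an 
illustration of the idea; a real implementation might use lsscsi instead, and 
would need proper error handling):

import os

def probe_disks():
    """Return a dict keyed by a stable ID (from /dev/disk/by-id) with basic
    attributes pulled from sysfs.  Sketch only -- non-SCSI device types and
    error handling are ignored."""
    # Map kernel names (sda, sdb, ...) to their persistent by-id symlink.
    by_id = {}
    id_dir = '/dev/disk/by-id'
    for name in os.listdir(id_dir):
        target = os.path.basename(os.readlink(os.path.join(id_dir, name)))
        by_id.setdefault(target, name)

    disks = {}
    for dev in os.listdir('/sys/block'):
        if not dev.startswith('sd'):
            continue
        sysdir = '/sys/block/%s' % dev
        sectors = int(open(os.path.join(sysdir, 'size')).read())
        model = open(os.path.join(sysdir, 'device', 'model')).read().strip()
        disks[by_id.get(dev, dev)] = {
            'device': '/dev/%s' % dev,
            'model': model,
            'size_gb': sectors * 512 / float(1024 ** 3),
        }
    return disks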

>      - Reading through this list, I imagine the implementation becomes
>        an InventoryInterface, with per-architecture implementations of
>        how to actually obtain the data.
>    - Additionally, one can imagine end-users may have site-specific,
>      internal or controlled extra data they would like to be searchable
>      & storable per-machine. I think it makes sense to have an
>      additional, admin-interface defined table to store a list of labels
>      and how to get the data that should be stored with that label. We
>      would then store a hash of the obtained information in the
>      inventory job.

This hash would have all the HW data, right?  Not just the stuff the admin 
added (i.e. default + admin-added).

I think there is value in doing the inventory for each job and actually storing 
that information with the job regardless of whether the job was scheduled 
through classic mechanisms (hostname or label matching) or HW attrib 
comparative scheduling (discussed below).  Saving the HW info with the job is 
important because all we can confirm right now is which host the job ran on.  
If the host changes over time (memory, IO cards, storage, etc being 
added/removed), having that data stored in queryable form would be valuable.  I 
recognize much of this info is perhaps available in the job sysinfo files, but 
if DB tables are going to be added to make these attribs available for 
scheduling, I would vote to make them available as job attribs as well.  I 
think this is what you were after with the "store a hash of obtained 
information in the inventory job", but I wasn't clear whether there would be an 
inventory job linked to each test job.

NOTE: one way to handle this would be to revision host entries in the inventory 
part of the DB so that host + inventory_revision uniquely identifies a snapshot 
of these attribs; that pair would then be all you need to store in each test 
job.
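
As a rough sketch of what that could look like (Django-style models, since 
that's what the Autotest frontend uses; the table and field names here are made 
up for illustration, not existing Autotest models):

from django.db import models

class HostInventorySnapshot(models.Model):
    # Hypothetical: one row per (host, revision) inventory snapshot.
    host = models.ForeignKey('Host', on_delete=models.CASCADE)
    revision = models.IntegerField()   # bumped whenever probed values change
    probed_at = models.DateTimeField(auto_now_add=True)
    tool_version = models.CharField(max_length=64)

    class Meta:
        unique_together = ('host', 'revision')

class HostInventoryAttribute(models.Model):
    # Hypothetical: normalized key/value pairs belonging to one snapshot.
    snapshot = models.ForeignKey(HostInventorySnapshot, on_delete=models.CASCADE)
    name = models.CharField(max_length=128)   # e.g. 'memory_mb', 'cpu_speed_ghz'
    value = models.CharField(max_length=255)  # stored in the standard unit for that name

A test job would then only need a foreign key to the snapshot that was current 
when it ran.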

>  - Have a contrib script appropriate for
>    /var/lib/cobbler/triggers/add/system/ to automatically run an
>    inventory job on system add.

Would we additionally want to trigger inventory calls each time a job runs on a 
host (during the VERIFY state)?  It seems like you are proposing this in the 
next bullet below, but I didn't see it explicitly listed.

>  - Extend the create job UI to allow searching by hardware information.
>    This requires seeing what once satisfied the request, re-running
>    inventory on that host, and if the required data is still present,
>    running the job.

In general it sounds like your process is in alignment with what I was thinking 
of, but having implemented a similar HW attribute scheduling system before, I 
know this is a potentially complicated and performance-sensitive part of the 
discussion.  Scheduling needs to be fast to keep the queuing efficient and to 
make reports I may someday want to add (estimated completion time reports using 
scheduling simulations + historic test duration data) possible.  Further, I 
want to make sure the HW attribute comparison isn't finalized until the end of 
the process you describe above.  For example, if the scheduler picks host 1 on 
the first evaluation but host 1 no longer matches after the re-inventory, the 
scheduler should treat that like any other failed host verify (repeat the 
comparison, perhaps come up with host 3, and try that instead).

In terms of implementation comments...
The first issue is units.  If a user asks for memory > 3030MB and the inventory 
info for memory is stored as a string using a mish-mash of units (perhaps 
because different architectures report different units), then it becomes very 
hard to translate the requested criteria for the job into a basic DB query: you 
first have to fetch the memory value for every machine, convert each one to a 
common unit, and only then do the comparison.  As per the above, I suggest 
avoiding this by standardizing the units in the inventory interface (so you 
know what unit is in the DB for every numeric field) and then converting from 
the user's requested unit to the DB-native unit before you run the scheduling 
query.
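
As a quick illustration (the table/column names are the hypothetical ones from 
the snapshot sketch above), the request-side conversion could be as simple as:

def build_memory_criterion(user_value, user_unit):
    """Convert a user request like (3, 'GB') into the DB-native unit (MB in
    this sketch) so the scheduler can issue a plain numeric comparison."""
    to_mb = {'MB': 1, 'GB': 1024, 'TB': 1024 * 1024}
    memory_mb = user_value * to_mb[user_unit]
    # ... which could then feed a query along the lines of:
    #   SELECT snapshot_id FROM host_inventory_attribute
    #    WHERE name = 'memory_mb' AND CAST(value AS UNSIGNED) > %s
    return memory_mb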

The second issue you have already covered above: the need to re-inventory.  
Since we can't re-inventory the entire HW pool each time a new job request 
comes in, you are right that making a first-pass guess and then re-checking in 
the verify stage is the right thing to do.  If we do the inventory with every 
job, or at least after every re-install, it should be pretty accurate and there 
will be very few cases where that first guess isn't a good one.

In this model of delayed HW attribute scheduling, the hw_attribute_scheduler 
module (think of the metahost scheduler module) would need logic to handle at 
least 4 scenarios during each scheduler tick (or, if performance is slow, every 
X ticks); a rough sketch follows the list:

1) queued - the job still matches a host in a normal state.  If so, just wait 
until one of those hosts is Ready.
2) ready - a matching host has become Ready; queue the job up to run just like 
a normal job (assuming the re-inventory will put the job back in queued if the 
host no longer matches)
3) starved - the job matches nothing but failed hosts (which shouldn't be 
touched by HW attrib scheduled jobs, in my mind) or reserved hosts (hopefully 
coming soon as per issue #360).  In this case it might be a long time before 
the job starts, so an email to admins or the user might be in order
4) orphaned - the job doesn't match any hosts at all - probably handled the 
same as starved, but call out the difference (starved may eventually run, 
orphaned is unlikely to ever run w/o intervention).
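
A minimal sketch of that per-tick classification, assuming hosts expose 
.status/.failed/.reserved flags and a job_matches_host() attribute comparison 
function (all hypothetical names, not existing scheduler APIs):

def classify_hw_attrib_job(job_matches_host, job, hosts):
    """Classify a HW-attrib-scheduled job for one scheduler tick."""
    matches = [h for h in hosts if job_matches_host(job, h)]
    if not matches:
        return 'orphaned'
    usable = [h for h in matches if not h.failed and not h.reserved]
    if not usable:
        return 'starved'
    if any(h.status == 'Ready' for h in usable):
        # Verify will re-inventory the chosen host; if it no longer matches,
        # the job simply drops back to 'queued' and we pick again next tick.
        return 'ready'
    return 'queued'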

NOTE: Starved and orphaned jobs don't necessarily require immediate action in 
the scheduler (e.g. aborting the job).  Depending on the use case and the 
attribute the job is starved on, it might be something an admin is staffed to 
keep an eye on and fix (perhaps swapping IO cards, configuring storage, etc).  
It might also just clear up after the next inventory refresh, if for example a 
system was temporarily modified for a special job and will be put back to its 
normal state afterwards.  That said, the job status should clearly distinguish 
starved and orphaned jobs with a new status value (not just Queued) and somehow 
bring attention to the fact that there are jobs waiting on HW attribs X, Y, Z.

> This becomes necessary in particular for the case of
>    PCI devices being removed, but also could come about when hostnames
>    get recycled (for instance) or hardware is upgraded.

Also, it is necessary for unplanned things like memory dying, disks dying, etc.
I'm not proposing a HW sanity check yet (e.g. during the equivalent of the 
verify stage, my old automation would avoid using a host whose HW attributes 
had unexpectedly changed and ask an admin to review it), as that can create a 
lot of overhead depending on how many hosts there are and how volatile the HW 
is, but there need to be allowances for HW attribs changing frequently.

> Lucas & I thought the discussion from here was better-served being on the
> mailing-list to allow everyone interested to chime in.
> 
> Thanks,
> Nish
> 
> --
> Nishanth Aravamudan <n...@us.ibm.com>
> IBM Linux Technology Center
> 