Re: [VirtualGL-Users] automatically choosing GPU with vglrun

DRC Thu, 19 Sep 2013 16:06:14 -0700

I don't know why you're getting different screen numbers, but like I 
said, the generic solution for this needs to be able to handle an 
arbitrary mapping anyhow, since some people will choose to configure 
multiple GPUs as multiple independent X servers instead of multiple screens.
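
To make "arbitrary mapping" concrete, here is a minimal sketch of a lookup that turns a GPU index (as counted by nvidia-smi, starting at 0) into an X display string. It assumes a hypothetical VGL_GPUS environment variable holding a space-separated list of displays, where entry N belongs to GPU N. Neither that variable nor the gpu_to_display function exists in VirtualGL today; both are illustrations only.

```shell
#!/bin/bash
# Sketch only: VGL_GPUS and gpu_to_display are hypothetical names, not
# existing VirtualGL settings.  VGL_GPUS is assumed to be a space-separated
# list of X displays, where entry N is the display that drives GPU N.

gpu_to_display() {
    local gpu=$1
    local -a map

    # Split the mapping into an array (empty if VGL_GPUS is unset).
    read -r -a map <<< "${VGL_GPUS:-}"

    # Reject anything that is not a plain, in-range, non-negative index.
    if ! [[ $gpu =~ ^(0|[1-9][0-9]*)$ ]] || (( gpu >= ${#map[@]} )); then
        echo "no display mapped to GPU '$gpu'" >&2
        return 1
    fi

    echo "${map[$gpu]}"
}
```

With VGL_GPUS=":0.1 :0.2 :0.3 :0.4" (multiple screens, screen 0 taken by an onboard card), gpu_to_display 0 prints :0.1.  With VGL_GPUS=":0.0 :1.0" (one X server per GPU), gpu_to_display 1 prints :1.0.  The result could then be passed to vglrun -d.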


I would suggest that the best thing to do is submit a patch through the 
SourceForge tracker system.  The patch should contain whatever new 
scripts are necessary to support this functionality and whatever changes 
you made to vglrun to accommodate it.  In your new scripts, you should 
have a copyright header similar to the one in vglrun, except replacing 
the copyright holder with either yourself or, if you're under IP 
contract with a company, the company name.  Make sure you have written 
permission to open source this technology if the latter is the case.  In 
vglrun, you should add your copyright notice to the existing ones.  The 
copyright header is important, because it gives me proof of your intent 
to open source the new code and whatever changes you made to the old code.

I can take the patch and add the GPU-to-display mapping logic to it at 
some point in the future, as well as do some testing.  There's no rush 
on submitting the patch-- just letting you know the procedure for doing 
so, once you are happy with the code.


On 9/19/13 5:51 AM, Rafael Guimaraes wrote:
> In fact, I have almost the same xorg.conf on two machines and they behave
> differently with respect to :0.0 to :0.3 or :0.1 to :0.4. I still don't
> get what I am missing here... Below you can see one of my xorg.conf files.
> This one is for my 4 Quadro FX 7000 configuration, and this is giving
> me :0.1 to :0.4. It's a Supermicro server with an internal graphics card
> (mga module) and an appliance with 4 Quadro FX 7000 connected through
> PCI Express. In my other computer, I have 4 Quadro Plex 2200 S4
> connected the same way to a Sun server with an internal graphics card
> (ast module) with the same xorg.conf (replacing mga with ast, of course)
> and I get different display numbers for the GPUs (:0.0 to :0.3). Am I
> missing something?
>
> Also, if you want to take a look, I have copied my GPU load balancing
> script below the xorg.conf. It first tries to get GPU usage from
> nvidia-smi. Depending on the video card, this is not available (not only
> for GeForces, as Philippe said; I've got some Quadro Plex 2200 S4 and it
> doesn't report GPU usage either). Then I try to get memory usage. Finally,
> if no nvidia-smi info is available (will that ever happen?), I search
> among all running processes and count how many processes are using each
> GPU. This last option is there just because I had it implemented before
> I knew about nvidia-smi, so I left it there just in case...
>
> Cheers,
> Rafael
>
>
> Section "DRI"
>          Mode 0666
> EndSection
>
> Section "ServerLayout"
>      Identifier     "Layout0"
>      Screen      0  "Screen0"
>      Screen      1  "Screen1" RightOf "Screen0"
>      Screen      2  "Screen2" RightOf "Screen1"
>      Screen      3  "Screen3" RightOf "Screen2"
>      Screen      4  "Screen4" RightOf "Screen3"
>      InputDevice    "Keyboard0" "CoreKeyboard"
>      InputDevice    "Mouse0" "CorePointer"
> EndSection
>
> Section "Files"
>      RgbPath         "/usr/X11R6/lib/X11/rgb"
>      FontPath        "unix/:7100"
> EndSection
>
> Section "Module"
>      Load           "dbe"
>      Load           "extmod"
>      Load           "type1"
>      Load           "freetype"
>      Load           "glx"
> EndSection
>
> Section "InputDevice"
>      # generated from data in "/etc/sysconfig/mouse"
>      Identifier     "Mouse0"
>      Driver         "mouse"
>      Option         "Protocol" "IMPS/2"
>      Option         "Device" "/dev/input/mice"
>      Option         "Emulate3Buttons" "no"
>      Option         "ZAxisMapping" "4 5"
> EndSection
>
> Section "InputDevice"
>      # generated from data in "/etc/sysconfig/keyboard"
>      Identifier     "Keyboard0"
>      Driver         "kbd"
>      Option         "XkbLayout" "us"
>      Option         "XkbModel" "pc105"
> EndSection
>
> Section "Monitor"
>      Identifier     "Monitor0"
>      VendorName     "Unknown"
>      ModelName      "Unknown"
>      HorizSync       30.0 - 110.0
>      VertRefresh     50.0 - 150.0
>      Option         "DPMS"
> EndSection
>
> Section "Monitor"
>      Identifier     "Monitor1"
>      VendorName     "Unknown"
>      ModelName      "Unknown"
>      HorizSync       30.0 - 110.0
>      VertRefresh     50.0 - 150.0
>      Option         "DPMS"
> EndSection
>
> Section "Monitor"
>      Identifier     "Monitor2"
>      VendorName     "Unknown"
>      ModelName      "Unknown"
>      HorizSync       30.0 - 110.0
>      VertRefresh     50.0 - 150.0
>      Option         "DPMS"
> EndSection
>
> Section "Monitor"
>      Identifier     "Monitor3"
>      VendorName     "Unknown"
>      ModelName      "Unknown"
>      HorizSync       30.0 - 110.0
>      VertRefresh     50.0 - 150.0
>      Option         "DPMS"
> EndSection
>
> Section "Monitor"
>      Identifier     "Monitor4"
>      VendorName     "Unknown"
>      ModelName      "Unknown"
>      HorizSync       30.0 - 110.0
>      VertRefresh     50.0 - 150.0
>      Option         "DPMS"
> EndSection
>
> Section "Device"
>      Identifier     "Device0"
>      Driver         "nvidia"
>      VendorName     "NVIDIA Corporation"
>      BoardName      "Quadro FX 7000"
>      BusID          "PCI:10:0:0"
> EndSection
>
> Section "Device"
>      Identifier     "Device1"
>      Driver         "nvidia"
>      VendorName     "NVIDIA Corporation"
>      BoardName      "Quadro FX 7000"
>      BusID          "PCI:11:0:0"
> EndSection
>
> Section "Device"
>      Identifier     "Device2"
>      Driver         "nvidia"
>      VendorName     "NVIDIA Corporation"
>      BoardName      "Quadro FX 7000"
>      BusID          "PCI:137:0:0"
> EndSection
>
> Section "Device"
>      Identifier     "Device3"
>      Driver         "nvidia"
>      VendorName     "NVIDIA Corporation"
>      BoardName      "Quadro FX 7000"
>      BusID          "PCI:138:0:0"
> EndSection
>
> Section "Device"
>          Identifier "Videocard0"
>          Driver     "mga"
> EndSection
>
> Section "Screen"
>          Identifier "Screen0"
>          Device     "Videocard0"
>          Monitor    "Monitor0"
>          DefaultDepth    24
>          SubSection "Display"
>                  Depth    24
>          EndSubSection
> EndSection
>
> Section "Screen"
>      Identifier     "Screen1"
>      Device         "Device0"
>      Monitor        "Monitor1"
>      DefaultDepth    24
>      Option         "UseDisplayDevice" "none"
>      SubSection     "Display"
>          Depth       24
>      EndSubSection
> EndSection
>
> Section "Screen"
>      Identifier     "Screen2"
>      Device         "Device1"
>      Monitor        "Monitor2"
>      DefaultDepth    24
>      Option         "UseDisplayDevice" "none"
>      SubSection     "Display"
>          Depth       24
>      EndSubSection
> EndSection
>
> Section "Screen"
>      Identifier     "Screen3"
>      Device         "Device2"
>      Monitor        "Monitor3"
>      DefaultDepth    24
>      Option         "UseDisplayDevice" "none"
>      SubSection     "Display"
>          Depth       24
>      EndSubSection
> EndSection
>
> Section "Screen"
>      Identifier     "Screen4"
>      Device         "Device3"
>      Monitor        "Monitor4"
>      DefaultDepth    24
>      Option         "UseDisplayDevice" "none"
>      SubSection     "Display"
>          Depth       24
>      EndSubSection
> EndSection
>
> ------------------------------------------
>
> #!/bin/bash
> # Check whether we have nvidia-smi
> NVSMI=$(which nvidia-smi)
> if [ $? -ne 0 ]; then
>     exit 1
> fi
>
> # Get GPU usage (is it a number?)
> read -a PROC < <(nvidia-smi -q --display=UTILIZATION | grep -w Gpu |
>     awk 'NR==1{min=$3;pos=1}NR>1 && $3<min{min=$3;pos=NR}END{print min,pos}')
> re='^[0-9]+$'
> if [[ "${PROC[0]}" =~ $re ]] ; then
>     echo "${PROC[1]}"
>     exit
> fi
>
> # Get memory usage (is it a number?)
> read -a MEM < <(nvidia-smi -q --display=MEMORY | grep -w Used |
>     awk 'NR==1{min=$3;pos=1}NR>1 && $3<min{min=$3;pos=NR}END{print min,pos}')
> re='^[0-9]+$'
> if [[ "${MEM[0]}" =~ $re ]] ; then
>     echo "${MEM[1]}"
>     exit
> fi
>
> # Get the number of processes using GPUs
> # At first get the number of GPUs
> for (( i=1 ; i <= $(nvidia-smi -L |wc -l) ; i++ )); do
>     GPU[$i]=0
> done
>
> CURPATH=$(dirname $0)            # The path of the script
>
> # Check every process (except root, since we assume root is not using gpus)
> for i in `ps -ef |grep -v "^root" |awk '{print $2;}' |grep -v PID`
> do
>          # Get the GPU currently being used by the process
>          g=`$CURPATH/getgpu $i`
>          if [ $g -gt 0 2>/dev/null ] && [ $g -le ${#GPU[@]} 2>/dev/null ] ; then
>                  # Consider only processes that depend on libGL.so to be using a GPU
>                  NLIBGL=`ldd /proc/$i/exe 2>/dev/null |grep libGL |wc -l`
>                  if [ $NLIBGL -gt 0 ] ; then
>                          GPU[$g]=$(( ${GPU[$g]} + 1 ))
>                  fi
>          fi
> done
>
> # Get the least congested GPU (the one with the fewest processes using it)
> GPUMIN=1
> for (( g=2 ; g <= ${#GPU[@]} ; g++ )) ; do
>     if [ ${GPU[$g]} -lt ${GPU[$GPUMIN]} ] ; then
>        GPUMIN=$g
>     fi
> done
> echo $GPUMIN
>
>
> 2013/9/19 Philippe <philippe.ra...@gmail.com>
>
>     Hi guys,
>     A long time ago, I wrote a script to do this (before the 273.x
>     drivers, from memory). On any machine, if you want your gpu0 to
>     correspond to your :0.0 screen, it's your job to configure
>     (through nvidia-xconfig, for example) the number of the X display.
>     Just as an example, here is the content of an xorg.conf with 2
>     virtual screens running on 2 graphics cards:
>
>     Section "Monitor"
>          Identifier     "Monitor0"
>          VendorName     "Unknown"
>          ModelName      "Unknown"
>          HorizSync       28.0 - 33.0
>          VertRefresh     43.0 - 72.0
>          Option         "DPMS"
>     EndSection
>
>     Section "Device"
>          Identifier     "Device0"
>          Driver         "nvidia"
>          VendorName     "NVIDIA Corporation"
>          BoardName      "GeForce GTX 680"
>     EndSection
>
>     Section "Screen"
>          Identifier     "Screen0"
>          Device         "Device0"
>          Monitor        "Monitor0"
>          DefaultDepth    24
>          Option         "ConnectedMonitor" "DFP"
>          SubSection     "Display"
>              Virtual     1280 1024
>              Depth       24
>          EndSubSection
>     EndSection
>
>     Section "Monitor"
>          Identifier     "Monitor1"
>          VendorName     "Unknown"
>          ModelName      "Unknown"
>          HorizSync       28.0 - 33.0
>          VertRefresh     43.0 - 72.0
>          Option         "DPMS"
>     EndSection
>
>     Section "Device"
>          Identifier     "Device1"
>          Driver         "nvidia"
>          VendorName     "NVIDIA Corporation"
>          BoardName      "GeForce GTX 680"
>     EndSection
>
>     Section "Screen"
>          Identifier     "Screen1"
>          Device         "Device1"
>          Monitor        "Monitor1"
>          DefaultDepth    24
>          Option         "ConnectedMonitor" "DFP"
>          SubSection     "Display"
>              Virtual     1280 1024
>              Depth       24
>          EndSubSection
>     EndSection
>
>
>     With that, you have :0.0 and :0.1 available.
>     Then you can use your current script with values starting from 0 on
>     every machine.
>
>     Personally, I use only GeForce cards. After 273.x (or maybe 173.x?
>     I don't remember), NVIDIA locked access to GPU utilization in
>     nvidia-smi, making this feature available on Quadro cards only. So
>     now I can choose the card randomly, or by checking the amount of
>     graphics card RAM currently in use.
>
>     Hope this helps.
>
>
>     On Wed, Sep 18, 2013 at 10:52 PM, DRC
>     <dcomman...@users.sourceforge.net> wrote:
>
>         Neat trick.  :)
>
>         I don't think that the issue you're seeing has anything to do with
>         VGL per se.  I think it's how your X server is configured.  $gpu in
>         your examples below refers to an X screen number.  Typically, in a
>         VirtualGL multi-GPU environment, one would configure the X server
>         such that each GPU occupies a separate X screen (0.0, 0.1, 0.2,
>         etc.)  Then you can address a specific GPU by setting VGL_DISPLAY
>         to :0.0, :0.1, etc.  Others, however, prefer to run a completely
>         new X server instance on each GPU, in which case they would be
>         addressed as :0.0, :1.0, etc.  In your case, I think one of your X
>         servers is configured differently than the other, which is why the
>         screen numbers are different.
>
>         Ultimately, I don't think it's going to be valid to assume that the
>         DISPLAY-to-GPU mapping is constant, so perhaps the best approach
>         would be to introduce either a new environment variable (VGL_GPUS,
>         for instance) or a config file that defines a list of X display to
>         GPU mappings.  If this environment variable/config setting is
>         defined, then vglrun could assume that you want to use load
>         balancing, and it could call nvidia-smi (or an equivalent on ATI,
>         if such exists) to find the least-loaded GPU.
>
>         I would be interested in integrating this into the project.
>
>
>         On 9/18/13 3:13 PM, Rafael Guimaraes wrote:
>         > Hi folks,
>         >
>         > I have built a script that checks which GPU is currently the least
>         > used (through nvidia-smi) and calls vglrun with this information so
>         > that my processes are balanced among the available GPUs.
>         >
>         > The problem is that I use the same script on two different machines
>         > (both with 4 GPUs) and they map the GPUs differently. On the first
>         > computer, I can run vglrun -d :0.$gpu , where $gpu may vary from 1
>         > to 4 (equivalent to GPU 0 to 3 as reported by nvidia-smi). However,
>         > on the second computer, $gpu may vary from 0 to 3 (equivalent to GPU
>         > 0 to 3 as reported by nvidia-smi).
>         >
>         > Is there a way to address GPUs by vglrun in the same way in both
>         > computers? If not, is there a way for my script to find if vglrun
>         > addresses GPUs from 0 to 3 or from 1 to 4?
>         >
>         > Thanks in advance!
>         >
>         > Cheers,
>         >
>         > Rafael Guimarães
>
>         _______________________________________________
>         VirtualGL-Users mailing list
>         VirtualGL-Users@lists.sourceforge.net
>         <mailto:VirtualGL-Users@lists.sourceforge.net>
>         https://lists.sourceforge.net/lists/listinfo/virtualgl-users
>
