Text formatted message (for non HTML mailers)
-------- Forwarded Message --------
Subject: R usage in MZmine - What about replacing JRIengine to
get multi-threading support?
Date: Tue, 02 Dec 2014 11:36:55 +0100
From: Gauthier Boaglio <gauthier.boag...@gmail.com>
To: mzmine-devel@lists.sourceforge.net
Hi everyone,
I have begun using Rserve (multi-threaded) instead of JRIengine
(singleton / single-threaded) in my personal experimental branch,
and I really think we could replace JRIengine (which, as you know,
does not take advantage of parallel task processing) with Rserve.
I released a version using Rserve (restricted to the "Baseline
Correction Module"). If you want to give it a try, here it is:
https://sourceforge.net/p/mzmine/code/HEAD/tree/branches/gboaglio-experimental/target/MZmine-2.11-EEE-release-20141128.zip
Main points:
-----------
* Requirements:
------------
No changes to the "startMZmine" script (we just need to locate
"R_HOME", as usual).
The only requirement is to have the "Rserve" R package
installed. It is then run through a simple R command, such as:
    library(Rserve);Rserve(debug=TRUE/FALSE, args="--RS-enable-control")
    # '--RS-enable-control' is mandatory: it enables sending
    # "SIGKILL/SIGTERM" to the Rserve instances
    # (when we need to abort a blocking 'eval' in progress)
(Note: we already had to install "rJava" to use JRI, so this
is not a big additional constraint for the user.)
=> The Rserve executable (server application) comes with that R package.
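As a rough illustration of how that R command could be launched from Java, here is a minimal sketch. The class name RserveLauncher and its methods are hypothetical (they are not taken from RUtilities.java), and it assumes the R front-end accepts "--vanilla" and "-e <expression>". Single quotes are used inside the R expression to sidestep command-line quoting problems.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not the code from the experimental branch.
class RserveLauncher {

    /** Builds the R expression quoted in this message, with 'debug' set explicitly. */
    static String startExpression(boolean debug) {
        return "library(Rserve);Rserve(debug=" + (debug ? "TRUE" : "FALSE")
                + ", args='--RS-enable-control')";
    }

    /** Builds the argument list: <R executable> --vanilla -e <expression>. */
    static List<String> startCommand(String rExecutable, boolean debug) {
        return Arrays.asList(rExecutable, "--vanilla", "-e", startExpression(debug));
    }

    /** Starts R with that expression; the caller is responsible for the returned Process. */
    static Process start(String rExecutable, boolean debug) throws IOException {
        return new ProcessBuilder(startCommand(rExecutable, debug))
                .redirectErrorStream(true) // merge stderr into stdout for simpler logging
                .start();
    }
}
```

Passing the arguments as a list (rather than one command string) avoids having to escape the expression for a shell.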
Note: I believe we could even simplify the "startMZmine"
workflow by first trying to detect the R location automatically
(the environment variables assigned in the script
would be used only as a fallback if detection fails).
That would look more or less like this (see RUtilities.java):
    public static String getRexecutablePath() {
        String osname = System.getProperty("os.name");
        if (osname != null && osname.startsWith("Windows")) {
            // Windows: ask the registry where R is installed.
            LOG.log(Level.INFO, "Windows: query registry to find where R is installed ...");
            String installPath = null;
            try {
                Process rp = Runtime.getRuntime().exec("reg query HKLM\\Software\\R-core\\R");
                // StreamHog consumes the output of "reg" and extracts the install path.
                StreamHog regHog = new StreamHog(rp.getInputStream(), true);
                rp.waitFor();
                regHog.join();
                installPath = regHog.getInstallPath();
            } catch (Exception rge) {
                LOG.log(Level.SEVERE, "ERROR: unable to run REG to find the location of R: " + rge);
                return null;
            }
            if (installPath == null) {
                LOG.log(Level.SEVERE, "ERROR: cannot find path to R. Make sure reg is available "
                        + "and R was installed with registry settings.");
                return null;
            }
            return installPath + "\\bin\\R.exe";
        }
        // Other platforms: probe the usual install locations, in order.
        File f = new File("/Library/Frameworks/R.framework/Resources/bin/R");
        if (f.exists()) return f.getPath();
        f = new File("/usr/local/lib/R/bin/R");
        if (f.exists()) return f.getPath();
        f = new File("/usr/lib/R/bin/R");
        if (f.exists()) return f.getPath();
        f = new File("/sw/bin/R");
        if (f.exists()) return f.getPath();
        f = new File("/usr/common/bin/R");
        if (f.exists()) return f.getPath();
        f = new File("/opt/bin/R");
        if (f.exists()) return f.getPath();
        // Nothing found: the caller has to fall back to another mechanism.
        return null;
    }
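The fallback idea mentioned above (automatic detection first, environment variable second) could be sketched roughly as follows. RLocator and resolve() are hypothetical names, and the sketch assumes that the R executable sits under "bin" inside the directory that R_HOME points to; the detector argument stands in for a method such as getRexecutablePath().

```java
import java.io.File;
import java.util.function.Supplier;

// Hypothetical helper illustrating "detect first, fall back to R_HOME".
class RLocator {

    /**
     * Returns the automatically detected path if there is one, otherwise a path
     * derived from rHome (assumed layout: <R_HOME>/bin/R or R.exe), otherwise null.
     */
    static String resolve(Supplier<String> detector, String rHome, boolean windows) {
        String detected = detector.get();
        if (detected != null) {
            return detected;
        }
        if (rHome == null || rHome.isEmpty()) {
            return null; // neither detection nor the environment variable helped
        }
        String exe = windows ? "R.exe" : "R";
        return new File(new File(rHome, "bin"), exe).getPath();
    }
}
```

A caller might use something like resolve(RUtilities::getRexecutablePath, System.getenv("R_HOME"), isWindows), where isWindows is whatever OS check the application already performs.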
* The way it works:
----------------
Rserve is a server (it implements a communication protocol with
R). To keep things simple, it should be run on localhost on the default
port, 6311 (this should probably be mentioned
to the user, for security and possibly firewall configuration
reasons).
Each time we create a new "RConnection", the main instance of
Rserve starts a new child process.
- We can store the PID of this instance for later termination.
- We can run asynchronously as many instances as we want (as
long as we take care to close/terminate the ones that are no
longer used).
=> I already implemented a basic wrapper (that should be
enhanced) for those operations:
https://sourceforge.net/p/mzmine/code/HEAD/tree/branches/gboaglio-experimental/src/main/java/net/sf/mzmine/util/RSession.java
(Simplified version of the class in attachment).
=> The code for starting Rserve main server instance is
located in "RUtilities.java" for now:
https://sourceforge.net/p/mzmine/code/HEAD/tree/branches/gboaglio-experimental/src/main/java/net/sf/mzmine/util/RUtilities.java
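Since the server is expected on localhost, port 6311, one simple way to decide whether it still needs to be started is to try a plain TCP connection first. The sketch below uses only the JDK; RservePortProbe is a hypothetical name and this is not necessarily how RUtilities.java does it. Note that an open port only shows that something is listening there, not that it is Rserve.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether anything accepts connections on a host/port.
class RservePortProbe {

    /** Default Rserve port, as mentioned in this message. */
    static final int DEFAULT_PORT = 6311;

    /** Returns true if a TCP connection can be established within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable or timed out
        }
    }
}
```

For example, isListening("127.0.0.1", RservePortProbe.DEFAULT_PORT, 500) could be called before launching the main server instance.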
=> USAGE (a sample use case can be found in
BaselineCorrectionTask.java):
Basically, evaluating one or more R commands is done as
follows:
- Create and open a new session:
    // (This automatically starts Rserve if it is not already
    // running, and opens a new child instance for this task)
    String[] reqPackages = new String[] { "ptw" };
    RSession rSession = new RSession(reqPackages);
    rSession.open();
- Check the additional packages needed to run
the commands for the given session:
    // Returns the name of the first required package that fails
    // to load, or null if all of them loaded successfully
    String missingPackage = rSession.loadRequiredPackages();
- Do some R evaluations, such as:
    // Set chromatogram.
    rSession.assignDoubleArray("chromatogram", chromatogram);
    // Calculate baseline.
    rSession.eval("baseline <- asysm(chromatogram," + smoothing + "," + asymmetry + ")");
    baseline = rSession.collectDoubleArray("baseline");
- Release the session (closes the socket connection
for the session and ends the related Rserve child process):
    rSession.close(false);
Note: At some point we should also check whether any
tasks still require Rserve and, if none do, shut down the server
(it will be restarted anyway the next time we use an
MZmine feature requiring R).
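One possible way to implement the check described in that last note is a small counter of open sessions that triggers a shutdown action when it drops back to zero. This is only a sketch under assumptions: RserveUsageTracker is a hypothetical class (it does not exist in the branch), and the shutdown action is passed in as a Runnable because the real shutdown code is not shown here.

```java
// Hypothetical helper: counts open R sessions and shuts the server down when none remain.
class RserveUsageTracker {

    private final Runnable shutdownServer;
    private int openSessions = 0;

    RserveUsageTracker(Runnable shutdownServer) {
        this.shutdownServer = shutdownServer;
    }

    /** Call right after a session has been opened successfully. */
    synchronized void sessionOpened() {
        openSessions++;
    }

    /** Call after a session has been closed; runs the shutdown action when the count reaches zero. */
    synchronized void sessionClosed() {
        if (openSessions == 0) {
            throw new IllegalStateException("sessionClosed() called with no open session");
        }
        openSessions--;
        if (openSessions == 0) {
            shutdownServer.run();
        }
    }

    synchronized int openCount() {
        return openSessions;
    }
}
```

The methods are synchronized so that a session opening in one task cannot interleave with the zero check made while another task closes its session.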
* Performance considerations:
--------------------------
Source:
http://www.sciencedomain.org/download.php?f=Satman4152014BJMCS10902_1.pdf&aid=4838&type=a
Performance was good enough for what I had to do with the
baseline correctors, so at first sight this should fit the
other needs as well
(in any case, the very few other multi-threaded alternatives I
tested were not usable in terms of evaluation speed).
Maybe we will have to consider switching between JRIengine
and Rserve for some particular features. I hope not, but in any case
this seems doable: as far as I have tested, there are
no incompatibilities between Rserve and JRI.
To form your own opinion on all this, the Baseline Correction module
(again) lets you choose the R engine among RCaller (Online),
JRIengine and Rserve.
Just run the MZmine version linked at the top of this email, and go to
"Raw data methods > Filtering > Baseline Correction"
Feel free to ask if something is unclear or if you have any
further questions.
Looking forward to your constructive comments and thoughts...
I'll be glad to help, as much as possible, with migrating to
Rserve (if it turns out that this solution really is viable).
Cheers
Gauthier
-------- Forwarded Message --------
Subject: Re: [Mzmine-devel] Baseline correction
Date: Tue, 2 Dec 2014 05:15:41 +0000
From: Tomas Pluskal <plus...@oist.jp>
To: Gauthier Boaglio <gauthier.boag...@gmail.com>
Hi Gauthier,
Now that we released version 2.12, I think it is a good time to
consider making the switch from JRI to RServe.
I think it is a good idea - I don't really like the current JRI
interface.
It would be great if you could send a message to the devel list,
where you can summarize what you found about RServe. Especially,
we would like to know
1) is the initial setup and configuration going to be easier than
the current one (is it necessary to edit the startMZmine script in
order to use RServe?)
2) how does the actual code differ between RServe and JRI?
Thanks a lot!
Cheers,
Tomas
===============================================
Tomas Pluskal
G0 Cell Unit, Okinawa Institute of Science and Technology Graduate
University
1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
WWW: https://groups.oist.jp/g0
TEL: +81-98-966-8684
Fax: +81-98-966-2890
--
Gauthier BOAGLIO
CEFE - UMR 5175
1919 route de Mende
F-34293 Montpellier cedex 5
Tel: +33/0 4 67 61 32 15
Fax: +33/0 4 67 61 33 36
email: gauthier.boag...@cefe.cnrs.fr
www:
http://www.cefe.cnrs.fr/en/evolutionary-ecology-and-epidemiology/gauthier-boaglio
http://www.evolepid.org/people.php?name=boaglio