Thanks to Martin, who sent an email off the list that included, among other things, the following:

"Probably the file is being corrupted on disk, perhaps it has not yet been closed before reading is attempted, or some other obscure file system issue. Probably the key part in your script is 'sleep', which probably slows disk access enough for your file system to recover integrity."

His note made me think that the problem may lie with programs running in parallel on the same processing server:

Each Linux server has up to 8 slots, so up to 8 jobs can run in parallel on it, and many servers are available. Each job works with unique file names for its R program and the corresponding out file, all the objects inside each R job are uniquely named with their own indices, and I finish each program with q(); n so that the R workspace is not saved at the end of the process.

Let me draw a parallel with SAS jobs. If I run 8 jobs in parallel in SAS, each job uses the /tmp directory of that processing server, but each has its own pid, saves its temporary data under a unique location, and removes them at the end of the run. So 8 parallel jobs on one server, and more on other servers, do not corrupt each other's data.

Now what happens with R? When eight jobs run in parallel, is each processed in its own unique space on the /tmp hard drive, or do they all write to ~/.RData? If the latter, then even though the jobs are uniquely defined, it is quite possible that something happening to ~/.RData causes the reported error:

Error: ReadItem: unknown type 98, perhaps written by later version of R
Execution halted

Probably --no-restore --no-save would help, but isn't it dangerous if all the programs (say I have 1000 of them) write to ~/.RData? So how does R handle parallel jobs of the same user with regard to the R invocation and the space used for temporary calculations? Do these parallel batch R jobs see each other in the same space, or are they guaranteed to run in independent temporary subdirectories?
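To make the --no-restore --no-save idea concrete, here is a minimal sketch. It rests on two points from the R CMD BATCH help text as I understand it: R is started with --restore --save by default, and the workspace image involved is the .RData file in the directory the job runs in. The function name run_job and the R_BIN variable are hypothetical names introduced only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: run one R script in batch mode without reading or
# writing a workspace image (.RData).
# R_BIN is an assumed override hook so the R executable can be swapped out.
R_BIN="${R_BIN:-R}"

run_job() {
    script="$1"                  # e.g. ./dc19at1.R
    out="${script%.R}.out"       # matching .out file next to the script
    # --no-restore: do not load .RData at startup
    # --no-save:    do not write .RData at exit
    "$R_BIN" CMD BATCH --no-save --no-restore "$script" "$out"
}
```

Calling run_job ./dc19at1.R would then replace the plain R CMD BATCH line. Whether this removes the ReadItem error in this setup is untested; it only keeps the jobs from sharing a workspace file.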

Thanks,

Aldi

On 8/22/2012 3:47 PM, Aldi Kraja wrote:
Hi,

Here is a solution for this type of error:
Error: ReadItem: unknown type 98, perhaps written by later version of R
Execution halted

I created a script file in the directory where the programs and data reside and ran it there:

./script.sh

where script.sh had the following lines
R CMD BATCH ./dc19at1.R ./dc19at1.out
sleep 3
R CMD BATCH ./dc19at2.R ./dc19at2.out
sleep 3
...
etc

The programs ran with no problem.
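For a block with many programs, the repeated lines of script.sh could be written as a loop. This is only a sketch, assuming the programs are named dc19at1.R, dc19at2.R, and so on, and sit in the current directory; run_block, R_BIN and PAUSE are hypothetical names that are not part of the original script.

```shell
#!/bin/sh
# Sketch of script.sh as a loop over the block's R programs.
# R_BIN and PAUSE are assumed knobs (R executable, seconds between jobs).
R_BIN="${R_BIN:-R}"
PAUSE="${PAUSE:-3}"

run_block() {
    for script in ./dc19at*.R; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$script" ] || continue
        "$R_BIN" CMD BATCH "$script" "${script%.R}.out"
        sleep "$PAUSE"
    done
}
```

Note that the shell expands the pattern in lexical order, so dc19at10.R would run before dc19at2.R. That does not matter if the jobs are independent of each other.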

So what I did was eliminate the full path, for example
R CMD BATCH /a/b/c/dc19at1.R /a/b/c/dc19at1.out
which did not work through bsub or at the command line on a remote server.

I am not sure what "unknown type 98" means in R.
Does anybody know where the R error types are described?

TIA,

Aldi

On 8/21/2012 10:09 AM, Aldi Kraja wrote:
Hi,

I am running a large number of jobs (thousands) in parallel (64-bit Linux), R version 2.14.1 (2011-12-22), Platform: x86_64-redhat-linux-gnu (64-bit). Until yesterday everything ran fine, with jobs submitted in several blocks (block1, block2, etc.). They are sent to an LSF platform, which handles the parallel submission. Today I see that only one of the blocks (block 19) has not finished correctly.
It reports in the out file:

Error: ReadItem: unknown type 98, perhaps written by later version of R
Execution halted
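With thousands of out files, a quick way to see which jobs hit this error is to search the out files for the message. A minimal sketch, assuming the out files end in .out and sit in one directory; find_failed is a hypothetical helper name.

```shell
#!/bin/sh
# List the .out files in a directory that contain the ReadItem error.
# The directory argument defaults to the current directory.
find_failed() {
    dir="${1:-.}"
    # -l: print only the names of matching files
    # -F: treat the pattern as a fixed string, not a regular expression
    grep -lF "Error: ReadItem: unknown type" "$dir"/*.out 2>/dev/null
}
```

Running find_failed /a/b/c would print one file name per failed job, which shows whether the failures are confined to block 19.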

Searching Google, I found a recommendation to run rm ~/.RData
I applied it, but the run fails again when block 19 is submitted through SAS:

[SAS in macro lang.] %sysexec bsub R CMD BATCH &fullpath./dc19at&j..R &fullpath.dc19at&j..out ;
[SAS ] %sysexec sleep 3 ;
  <looping through jobs in a block>

If I go to the directory where the R program and the data reside and apply the same command by hand

R CMD BATCH dc19at1.R dc19at1.out
it works with no problem.

But if I use a similar SAS program, one that has been executing the same command successfully for thousands of jobs in other blocks, the jobs for block 19 fail:

Error: ReadItem: unknown type 98, perhaps written by later version of R
Execution halted

even though the very job I just mentioned runs fine when I execute it by hand.

Do you know what could cause the bsub submission to fail? Is there any remedy?

Thank you in advance,

Aldi

--

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
