Hi Gordon,

Thanks for your assistance and the recommendations. Freezing postgres sounds like hell to me :-)

abrt was indeed filling the root partition, so I have disabled it.

I have run some export tests, and the behaviour is not consistent.

1. *Size*: in general, exports worked for smaller datasets and usually crashed on bigger ones (starting from about 3 GB). So size seemed to be the key factor.
2. But I have now found several histories of 4.5 GB that I was able to export... So much for the size hypothesis.

Another observation: when the export crashes, the corresponding webhandler process dies.

So now I suspect something is wrong with those datasets, but I cannot find anything meaningful in the logs. I am not yet confident about turning on logging in Python, but apparently this is done with the "logging" module, initialised with logging.getLogger(__name__).
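For what it's worth, here is a minimal sketch of how I understand that would look; the log file path and the messages below are just placeholders I made up for illustration:
===
import logging

# Module-level logger named after the current module, as in the Galaxy code.
log = logging.getLogger(__name__)

# Basic configuration: write DEBUG-level messages with timestamps to a file.
# The file path below is only an example.
logging.basicConfig(
    filename='/tmp/export_debug.log',
    level=logging.DEBUG,
    format='%(asctime)s %(name)s %(levelname)s %(message)s',
)

log.debug('starting export step')
log.debug('export step finished')
===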


Cheers,
Joachim

Joachim Jacob

Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib

On 03/25/2013 05:18 PM, Assaf Gordon wrote:
Hello Joachim,

A couple of things to check:

On Mar 25, 2013, at 10:01 AM, Joachim Jacob | VIB | wrote:

Hi,

About the history export, which fails:
1. The preparation seems to work fine: choosing 'Export this history' in the
History menu leads to a URL that initially reports that the export is still
in progress.

2. When the export is finished and I click the download link, the browser
displays "Error reading from remote server". A folder
ccpp-2013-03-25-14:51:15-27045.new is created in /var/spool/abrt, which fills
the root partition.

Something in your export is likely not finishing cleanly but crashing instead
(either the creation of the archive, or the download).

The folder "/var/spool/abrt/ccpp-XXXX" (and especially a file named "coredump") 
hints that the program crashed.
"abrt" is a daemon (at least on Fedora) that monitors crashes and tries to keep 
all relevant information about the program which crashed 
(http://docs.fedoraproject.org/en-US/Fedora/13/html/Deployment_Guide/ch-abrt.html).

So what might have happened is that a program (Galaxy's export_history.py or
another) crashed during your export, and "abrt" picked up the pieces (storing a
memory dump, for example), which then filled your disk.

The handler reports in its log:
"""
galaxy.jobs DEBUG 2013-03-25 14:38:33,322 (8318) Working directory for job is: 
/mnt/galaxydb/job_working_directory/008/8318
galaxy.jobs.handler DEBUG 2013-03-25 14:38:33,322 dispatching job 8318 to local 
runner
galaxy.jobs.handler INFO 2013-03-25 14:38:33,368 (8318) Job dispatched
galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,432 Local runner: starting 
job 8318
galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,572 executing: python 
/home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G 
/mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF 
/mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
galaxy.jobs.runners.local DEBUG 2013-03-25 14:41:29,420 execution finished: 
python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G 
/mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF 
/mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
galaxy.jobs DEBUG 2013-03-25 14:41:29,476 Tool did not define exit code or 
stdio handling; checking stderr for success
galaxy.tools DEBUG 2013-03-25 14:41:29,530 Error opening galaxy.json file: 
[Errno 2] No such file or directory: 
'/mnt/galaxydb/job_working_directory/008/8318/galaxy.json'
galaxy.jobs DEBUG 2013-03-25 14:41:29,555 job 8318 ended
"""

The system reports:
"""
Mar 25 14:51:26 galaxy abrt[16805]: Write error: No space left on device
Mar 25 14:51:27 galaxy abrt[16805]: Error writing 
'/var/spool/abrt/ccpp-2013-03-25-14:51:15-27045.new/coredump'
"""

One thing to try: if you have Galaxy configured to keep temporary files, run the
"export" command manually:
===
python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G 
/mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF 
/mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
===

Another thing to try: modify "export_history.py", adding debug messages to 
track progress and whether it finishes or not.
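For example, something along these lines (a rough sketch; where exactly to place
the calls inside export_history.py is up to you, and the helper name and messages
here are only illustrative):
===
# Rough sketch of extra debug messages; the helper name and the example
# messages are only illustrative.
def debug(msg):
    # Append to a fixed file so the messages survive even if the process
    # crashes half-way through.
    with open('/tmp/export_history_debug.log', 'a') as f:
        f.write(msg + '\n')

# ...then call it at the interesting points, for example:
# debug('starting to create the archive')
# debug('added a dataset to the archive')
# debug('archive created successfully')
===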

And: check the "abrt" program's GUI, perhaps you'll see previous crashes that 
were stored successfully, providing more information about which program crashed.


As a general rule, it's best to keep the "/var" directory on a separate
partition on production systems, exactly so that filling it up with junk won't
interfere with other programs.
Even better, give each sub-directory of "/var" its own dedicated partition, so that
filling up "/var/log" or "/var/spool" would not fill up "/var/lib/pgsql" and stop
Postgres from working.


-gordon




