Thanks, Mark. Hiram had suggested dropping -z from the rsync command. We're trying that, and so far it's working: as of yesterday we were up to 450 GB, whereas earlier we'd been stuck at 300 GB. So we're hopeful.
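For reference, a minimal sketch of what an rsync invocation without -z might look like. The rsync module URL and the destination directory below are assumptions and are not taken from this thread, and `mirror_cmd` is a hypothetical helper that only prints the command so it can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an rsync command for mirroring
# one UCSC MySQL database directory.
# ASSUMPTIONS: the rsync module URL and the default destination directory
# are illustrative guesses, not values confirmed in this thread.
mirror_cmd() {
    # $1 = database name (e.g. hg18), $2 = optional destination root
    # -a archive mode, -v verbose, -P keep partial files and show progress.
    # No -z: compression is deliberately left out, per Hiram's suggestion.
    printf 'rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/%s/ %s/%s/\n' \
        "$1" "${2:-/var/lib/mysql}" "$1"
}

mirror_cmd hg18
```

If the transfer still stalls, Mark's suggestions (-vv, --whole-file) could be appended to the printed command by hand.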
> -----Original Message-----
> From: Mark Diekhans [mailto:[email protected]]
> Sent: Monday, April 25, 2011 11:01 AM
> To: Martin Tompa
> Cc: Hiram Clawson
> Subject: RE: [Genome] possibly excessive MySQL queries
>
> Sigh, I figured it wasn't that easy.
>
> So random things I would probably try:
>
> -vv - more verbosity, maybe we can see what is going on
> --whole-file - which doesn't try the delta algorithm, which trades off
>   network transfer for CPU time.
>
> - Download the txt.gz and .sql files and do a mysqlimport.
>
> mark
>
> Martin Tompa <[email protected]> writes:
> > Thanks. Yes, there's a terabyte available.
> >
> > > -----Original Message-----
> > > From: Mark Diekhans [mailto:[email protected]]
> > > Sent: Monday, April 25, 2011 9:26 AM
> > > To: Martin Tompa
> > > Cc: Hiram Clawson
> > > Subject: Re: [Genome] possibly excessive MySQL queries
> > >
> > > Hi Martin,
> > >
> > > May seem silly, but it's bitten me a time or two; have you checked
> > > there is enough free disk space? The rsync error message doesn't
> > > indicate this is the problem, however it's one more thing to check.
> > >
> > > mark
> > >
> > > Martin Tompa <[email protected]> writes:
> > > > Thanks for your quick reply, Hiram. This seems not to be the
> > > > problem, as 2 of the largest files on your list are already in
> > > > our mirror:
> > > >
> > > > [tompa@amlia hg18]$ ls -l gbCdnaInfo.*
> > > > -rw-rw-r-- 1 mysql mysql        9481 Mar 25 16:41 gbCdnaInfo.frm
> > > > -rw-rw-r-- 1 mysql mysql  6247023745 Apr 14 18:50 gbCdnaInfo.MYD
> > > > -rw-rw-r-- 1 mysql mysql 12615000064 Apr 14 18:57 gbCdnaInfo.MYI
> > > >
> > > > We're running rsync 3.0.8, which lists 64-bit capabilities.
> > > > The file system is ext4, which is supposed to top out in the
> > > > terabytes. The OS is 32-bit (F13 i386).
> > > >
> > > > Martin.
> > > >
> > > > > -----Original Message-----
> > > > > From: Hiram Clawson [mailto:[email protected]]
> > > > > Sent: Saturday, April 23, 2011 7:51 AM
> > > > > To: Martin Tompa
> > > > > Cc: [email protected]; Maximilian Haussler
> > > > > Subject: Re: [Genome] possibly excessive MySQL queries
> > > > >
> > > > > Check to see if the largest files are successfully there:
> > > > >
> > > > > -rw-rw-r-- 1 12615000064 Apr 14 18:57 gbCdnaInfo.MYI
> > > > > -rw-rw-r-- 1 6394012539 Apr 14 18:50 gbStatus.MYD
> > > > > -rw-rw-r-- 1 6247023745 Apr 14 18:50 gbCdnaInfo.MYD
> > > > > -rw-rw-r-- 2 5845521612 Jan 22 2010 wgEncodeCaltechRnaSeqPairedRep2Hepg2CellPapErng32aR2x75.MYD
> > > > > -rw-rw-r-- 2 4875967348 Jan 12 2010 wgEncodeCaltechRnaSeqPairedRep2Helas3CellPapErng32aR2x75.MYD
> > > > > -rw-rw-r-- 1 4866136092 Apr 9 02:21 xenoEst.MYD
> > > > > -rw-rw-r-- 2 4723110856 Jan 13 2010 wgEncodeCaltechRnaSeqPairedRep1HuvecCellPapErng32aR2x75.MYD
> > > > > -rw-rw-r-- 2 4437731328 Jan 14 2010 wgEncodeCaltechRnaSeqPairedRep4H1hescCellPapErng32aR2x75.MYD
> > > > > -rw-rw-r-- 2 4425865592 Jan 12 2010 wgEncodeCaltechRnaSeqPairedRep1NhekCellPapErng32aR2x75.MYD
> > > > > -rw-rw-r-- 1 4188932096 Apr 14 18:57 gbSeq.MYI
> > > > >
> > > > > Perhaps you have file system limitations on large files.
> > > > > Is your rsync fully 64 bit enabled ?
> > > > >
> > > > > $ rsync -v
> > > > > rsync version 3.0.7 protocol version 30
> > > > > Copyright (C) 1996-2009 by Andrew Tridgell, Wayne Davison, and others.
> > > > > Web site: http://rsync.samba.org/
> > > > > Capabilities:
> > > > >     64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
> > > > >     socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
> > > > >     append, ACLs, xattrs, iconv, no symtimes
> > > > >
> > > > > --Hiram
> > > > >
> > > > > ----- Original Message -----
> > > > > From: "Martin Tompa" <[email protected]>
> > > > > To: "Maximilian Haussler" <[email protected]>, "Hiram Clawson" <[email protected]>
> > > > > Cc: [email protected], "Martin Tompa" <[email protected]>
> > > > > Sent: Friday, April 22, 2011 10:13:49 PM
> > > > > Subject: RE: [Genome] possibly excessive MySQL queries
> > > > >
> > > > > Dear friends at UCSC,
> > > > >
> > > > > Earlier this month your helpful replies guided us to the decision to
> > > > > use rsync to mirror the UCSC hg18 and hg19 mysql database locally,
> > > > > for use in my undergraduate capstone class this quarter. The rsync
> > > > > of hg19 took a couple of days, but went without a hitch. We are
> > > > > successfully using our mirror of that database now. But the rsync
> > > > > of hg18 has failed repeatedly. Here is the message from the mysql
> > > > > expert on our support staff who was doing both hg18 and hg19 for us:
> > > > >
> > > > > > For whatever reason, the rsync of hg18 keeps failing-- only 300GB
> > > > > > having transferred. Seems to occur at or near the same point each
> > > > > > time.
> > > > > >
> > > > > > rsync: connection unexpectedly closed (7559869260 bytes received so far) [receiver]
> > > > > > rsync error: error in rsync protocol data stream (code 12) at io.c(601) [receiver=3.0.8]
> > > > > > rsync: connection unexpectedly closed (787372 bytes received so far) [generator]
> > > > > > rsync error: error in rsync protocol data stream (code 12) at io.c(601) [generator=3.0.8]
> > > > > >
> > > > > > I've tried half a dozen times at least. What next?
> > > > >
> > > > > Do you have ideas of what might be going wrong? Suggestions of what
> > > > > to try?
> > > > >
> > > > > Thanks.
> > > > > Martin.
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Maximilian Haussler [mailto:[email protected]]
> > > > > > Sent: Tuesday, April 05, 2011 2:41 PM
> > > > > > To: Hiram Clawson
> > > > > > Cc: Martin Tompa; [email protected]
> > > > > > Subject: Re: [Genome] possibly excessive MySQL queries
> > > > > >
> > > > > > Hi Martin,
> > > > > >
> > > > > > we've had a similar question on Biostar recently and the
> > > > > > person finally found it easier to mirror the UCSC mysql
> > > > > > database than to bother with remote access. If you already
> > > > > > have a mysql server running somewhere, mirroring the ucsc
> > > > > > database for e.g. hg18 requires only one single rsync command.
> > > > > >
> > > > > > Given that you don't want to risk that the mysql access to
> > > > > > ucsc directly gets blocked during the course or just 2 hours
> > > > > > before they have to hand in their exercises (which is likely,
> > > > > > because they will all start 2 hours before the deadline :-),
> > > > > > the best solution could be a local mirror of the database
> > > > > > (not the genome browser website, only the mysql database itself).
> > > > > >
> > > > > > The biostar thread contains the required command:
> > > > > >
> > > > > > http://biostar.stackexchange.com/questions/4552/getting-ucsc-data-via-mysql/4554#4554
> > > > > >
> > > > > > hope this helps
> > > > > > cheers
> > > > > > Max
> > > > > > --
> > > > > > Maximilian Haussler
> > > > > > Office:+44 161 27 55980 Mob: +44 7574 246 789
> > > > > > http://www.manchester.ac.uk/research/maximilian.haussler/
> > > > > >
> > > > > > On Tue, Apr 5, 2011 at 10:51 PM, Hiram Clawson <[email protected]> wrote:
> > > > > > >
> > > > > > > You could also use the sql definition text files from hgdownload.
> > > > > > > http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/*.sql
> > > > > > >
> > > > > > > also available via FTP and rsync.
> > > > > > >
> > > > > > > You could rsync all of these .sql files to a local directory and
> > > > > > > allow everyone to use local files.
> > > > > > >
> > > > > > > If you want to run MySQL exercises, you should use small samples
> > > > > > > from small tables. Running exercises over an entire database is
> > > > > > > an immense amount of work. There are several hundred Gb of data
> > > > > > > in hg19.
> > > > > > >
> > > > > > > --Hiram
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > From: "robert kuhn" <[email protected]>
> > > > > > > To: "Martin Tompa" <[email protected]>
> > > > > > > Cc: "[email protected]" <[email protected]>
> > > > > > > Sent: Tuesday, April 5, 2011 1:43:02 PM
> > > > > > > Subject: Re: [Genome] possibly excessive MySQL queries
> > > > > > >
> > > > > > > Hi, Martin,
> > > > > > >
> > > > > > > thanks for asking. That might add up to an awful lot of queries if
> > > > > > > you are using a human assembly. there are 1000s of tables in there.
> > > > > > > You might consider parsing the trackDb table first, because the
> > > > > > > entries
> > > > > > > _______________________________________________
> > > > > > > Genome maillist - [email protected]
> > > > > > > https://lists.soe.ucsc.edu/mailman/listinfo/genome
