Thanks for your quick reply, Hiram. This seems not to be the problem, as 2 of the largest files on your list are already in our mirror:
[tompa@amlia hg18]$ ls -l gbCdnaInfo.*
-rw-rw-r-- 1 mysql mysql        9481 Mar 25 16:41 gbCdnaInfo.frm
-rw-rw-r-- 1 mysql mysql  6247023745 Apr 14 18:50 gbCdnaInfo.MYD
-rw-rw-r-- 1 mysql mysql 12615000064 Apr 14 18:57 gbCdnaInfo.MYI

We're running rsync 3.0.8, which lists 64-bit capabilities. The file
system is ext4, which is supposed to top out in the terabytes. The OS
is 32-bit (F13 i386).

Martin.

> -----Original Message-----
> From: Hiram Clawson [mailto:[email protected]]
> Sent: Saturday, April 23, 2011 7:51 AM
> To: Martin Tompa
> Cc: [email protected]; Maximilian Haussler
> Subject: Re: [Genome] possibly excessive MySQL queries
>
> Check to see if the largest files are successfully there:
>
> -rw-rw-r-- 1 12615000064 Apr 14 18:57 gbCdnaInfo.MYI
> -rw-rw-r-- 1  6394012539 Apr 14 18:50 gbStatus.MYD
> -rw-rw-r-- 1  6247023745 Apr 14 18:50 gbCdnaInfo.MYD
> -rw-rw-r-- 2  5845521612 Jan 22  2010 wgEncodeCaltechRnaSeqPairedRep2Hepg2CellPapErng32aR2x75.MYD
> -rw-rw-r-- 2  4875967348 Jan 12  2010 wgEncodeCaltechRnaSeqPairedRep2Helas3CellPapErng32aR2x75.MYD
> -rw-rw-r-- 1  4866136092 Apr  9 02:21 xenoEst.MYD
> -rw-rw-r-- 2  4723110856 Jan 13  2010 wgEncodeCaltechRnaSeqPairedRep1HuvecCellPapErng32aR2x75.MYD
> -rw-rw-r-- 2  4437731328 Jan 14  2010 wgEncodeCaltechRnaSeqPairedRep4H1hescCellPapErng32aR2x75.MYD
> -rw-rw-r-- 2  4425865592 Jan 12  2010 wgEncodeCaltechRnaSeqPairedRep1NhekCellPapErng32aR2x75.MYD
> -rw-rw-r-- 1  4188932096 Apr 14 18:57 gbSeq.MYI
>
> Perhaps you have file system limitations on large files.
> Is your rsync fully 64 bit enabled ?
>
> $ rsync -v
> rsync version 3.0.7 protocol version 30
> Copyright (C) 1996-2009 by Andrew Tridgell, Wayne Davison, and others.
> Web site: http://rsync.samba.org/
> Capabilities:
>     64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
>     socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
>     append, ACLs, xattrs, iconv, no symtimes
>
> --Hiram
>
> ----- Original Message -----
> From: "Martin Tompa" <[email protected]>
> To: "Maximilian Haussler" <[email protected]>, "Hiram Clawson" <[email protected]>
> Cc: [email protected], "Martin Tompa" <[email protected]>
> Sent: Friday, April 22, 2011 10:13:49 PM
> Subject: RE: [Genome] possibly excessive MySQL queries
>
> Dear friends at UCSC,
>
> Earlier this month your helpful replies guided us to the decision to
> use rsync to mirror the UCSC hg18 and hg19 mysql databases locally,
> for use in my undergraduate capstone class this quarter. The rsync of
> hg19 took a couple of days, but went without a hitch. We are
> successfully using our mirror of that database now. But the rsync of
> hg18 has failed repeatedly. Here is the message from the mysql expert
> on our support staff who was doing both hg18 and hg19 for us:
>
> > For whatever reason, the rsync of hg18 keeps failing-- only 300GB
> > having transferred. Seems to occur at or near the same point each
> > time.
> >
> > rsync: connection unexpectedly closed (7559869260 bytes received so far) [receiver]
> > rsync error: error in rsync protocol data stream (code 12) at io.c(601) [receiver=3.0.8]
> > rsync: connection unexpectedly closed (787372 bytes received so far) [generator]
> > rsync error: error in rsync protocol data stream (code 12) at io.c(601) [generator=3.0.8]
> >
> > I've tried half a dozen times at least. What next?
>
> Do you have ideas of what might be going wrong? Suggestions of what
> to try?
>
> Thanks.
> Martin.
>
> > -----Original Message-----
> > From: Maximilian Haussler [mailto:[email protected]]
> > Sent: Tuesday, April 05, 2011 2:41 PM
> > To: Hiram Clawson
> > Cc: Martin Tompa; [email protected]
> > Subject: Re: [Genome] possibly excessive MySQL queries
> >
> > Hi Martin,
> >
> > we've had a similar question on Biostar recently and the person
> > finally found it easier to mirror the UCSC mysql database than to
> > bother with remote access. If you already have a mysql server
> > running somewhere, mirroring the ucsc database for e.g. hg18
> > requires only one single rsync command.
> >
> > Given that you don't want to risk that the mysql access to ucsc
> > directly gets blocked during the course or just 2 hours before they
> > have to hand in their exercises (which is likely, because they will
> > all start 2 hours before the deadline :-), the best solution could
> > be a local mirror of the database (not the genome browser website,
> > only the mysql database itself).
> >
> > The biostar thread contains the required command:
> > http://biostar.stackexchange.com/questions/4552/getting-ucsc-data-via-mysql/4554#4554
> >
> > hope this helps
> > cheers
> > Max
> > --
> > Maximilian Haussler
> > Office:+44 161 27 55980 Mob: +44 7574 246 789
> > http://www.manchester.ac.uk/research/maximilian.haussler/
> >
> > On Tue, Apr 5, 2011 at 10:51 PM, Hiram Clawson <[email protected]> wrote:
> > >
> > > You could also use the sql definition text files from hgdownload.
> > > http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/*.sql
> > >
> > > also available via FTP and rsync.
> > >
> > > You could rsync all of these .sql files to a local directory and
> > > allow everyone to use local files.
> > >
> > > If you want to run MySQL exercises, you should use small samples
> > > from small tables. Running exercises over an entire database is an
> > > immense amount of work. There are several hundred Gb of data in hg19.
> > >
> > > --Hiram
> > >
> > > ----- Original Message -----
> > > From: "robert kuhn" <[email protected]>
> > > To: "Martin Tompa" <[email protected]>
> > > Cc: "[email protected]" <[email protected]>
> > > Sent: Tuesday, April 5, 2011 1:43:02 PM
> > > Subject: Re: [Genome] possibly excessive MySQL queries
> > >
> > > Hi, Martin,
> > >
> > > thanks for asking. That might add up to an awful lot of queries if
> > > you are using a human assembly. there are 1000s of tables in there.
> > > You might consider parsing the trackDb table first, because the
> > > entries
> > > _______________________________________________
> > > Genome maillist - [email protected]
> > > https://lists.soe.ucsc.edu/mailman/listinfo/genome
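
The check Hiram suggests above (are the largest files present at their full size?) could be scripted roughly as follows. This is only a sketch: the sizes are copied from his listing and will drift as UCSC updates the tables, and the function name check_mirror and the example directory are hypothetical, not part of any UCSC tooling.

```python
import os

# Byte sizes copied from Hiram's listing in this thread (a subset).
# Assumption: the local mirror was synced against the same snapshot.
EXPECTED_SIZES = {
    "gbCdnaInfo.MYI": 12615000064,
    "gbStatus.MYD": 6394012539,
    "gbCdnaInfo.MYD": 6247023745,
    "xenoEst.MYD": 4866136092,
    "gbSeq.MYI": 4188932096,
}


def check_mirror(directory, expected=EXPECTED_SIZES):
    """Return {filename: problem} for files that are missing or whose
    size differs from the expected byte count."""
    problems = {}
    for name, want in expected.items():
        path = os.path.join(directory, name)
        try:
            have = os.path.getsize(path)
        except OSError:
            # File is absent or unreadable in the mirror directory.
            problems[name] = "missing"
            continue
        if have != want:
            problems[name] = f"size {have}, expected {want}"
    return problems
```

Calling check_mirror("/var/lib/mysql/hg18") (the path is a guess at a typical MySQL data directory) returns an empty dict when every listed file is present at the listed size. A file reported as truncated near 2 GB or 4 GB would point toward a large-file limitation rather than a network problem.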
