Thanks for your quick reply, Hiram.  That does not seem to be the problem,
since two of the largest files on your list are already in our mirror:

[tompa@amlia hg18]$ ls -l gbCdnaInfo.*
-rw-rw-r-- 1 mysql mysql        9481 Mar 25 16:41 gbCdnaInfo.frm
-rw-rw-r-- 1 mysql mysql  6247023745 Apr 14 18:50 gbCdnaInfo.MYD
-rw-rw-r-- 1 mysql mysql 12615000064 Apr 14 18:57 gbCdnaInfo.MYI
 
We're running rsync 3.0.8, which lists 64-bit files among its capabilities.
The file system is ext4, whose file size limit is supposed to be in the
terabytes. The OS is 32-bit (Fedora 13, i386).
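
For a cross-check beyond eyeballing ls output, here is a minimal Python sketch
that compares file sizes in a directory against the sizes from the listing
quoted below, and confirms that those sizes are past the 32-bit (4 GiB)
boundary. The names EXPECTED_SIZES, check_sizes, and exceeds_32bit are made up
for illustration, and I have not run this against the mirror itself.

```python
import os

# Sizes in bytes, copied from the three largest entries in the quoted listing.
EXPECTED_SIZES = {
    "gbCdnaInfo.MYI": 12615000064,
    "gbStatus.MYD": 6394012539,
    "gbCdnaInfo.MYD": 6247023745,
}

# Largest size an unsigned 32-bit byte count can represent (just under 4 GiB).
LIMIT_32BIT = 2**32 - 1


def exceeds_32bit(size):
    """True if a file of this size cannot be described by a 32-bit byte count."""
    return size > LIMIT_32BIT


def check_sizes(directory, expected):
    """Return {name: (expected_size, actual_size_or_None)} for every file
    that is missing from the directory or whose size differs."""
    problems = {}
    for name, want in expected.items():
        path = os.path.join(directory, name)
        have = os.path.getsize(path) if os.path.isfile(path) else None
        if have != want:
            problems[name] = (want, have)
    return problems
```

Calling check_sizes("/var/lib/mysql/hg18", EXPECTED_SIZES) would return an
empty dict if all three files are present with the listed sizes; that
directory path is an assumption about where the mirror lives. All three sizes
are above 4 GiB, so a file that arrives intact at that size has already
cleared any 32-bit file size limit.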

Martin.

> -----Original Message-----
> From: Hiram Clawson [mailto:[email protected]] 
> Sent: Saturday, April 23, 2011 7:51 AM
> To: Martin Tompa
> Cc: [email protected]; Maximilian Haussler
> Subject: Re: [Genome] possibly excessive MySQL queries
> 
> Check to see if the largest files are successfully there:
> 
> -rw-rw-r-- 1 12615000064 Apr 14 18:57 gbCdnaInfo.MYI
> -rw-rw-r-- 1  6394012539 Apr 14 18:50 gbStatus.MYD
> -rw-rw-r-- 1  6247023745 Apr 14 18:50 gbCdnaInfo.MYD
> -rw-rw-r-- 2  5845521612 Jan 22  2010 wgEncodeCaltechRnaSeqPairedRep2Hepg2CellPapErng32aR2x75.MYD
> -rw-rw-r-- 2  4875967348 Jan 12  2010 wgEncodeCaltechRnaSeqPairedRep2Helas3CellPapErng32aR2x75.MYD
> -rw-rw-r-- 1  4866136092 Apr  9 02:21 xenoEst.MYD
> -rw-rw-r-- 2  4723110856 Jan 13  2010 wgEncodeCaltechRnaSeqPairedRep1HuvecCellPapErng32aR2x75.MYD
> -rw-rw-r-- 2  4437731328 Jan 14  2010 wgEncodeCaltechRnaSeqPairedRep4H1hescCellPapErng32aR2x75.MYD
> -rw-rw-r-- 2  4425865592 Jan 12  2010 wgEncodeCaltechRnaSeqPairedRep1NhekCellPapErng32aR2x75.MYD
> -rw-rw-r-- 1  4188932096 Apr 14 18:57 gbSeq.MYI
> 
> Perhaps you have file system limitations on large files.
> Is your rsync fully 64 bit enabled ?
> 
> $ rsync -v
> rsync  version 3.0.7  protocol version 30
> Copyright (C) 1996-2009 by Andrew Tridgell, Wayne Davison, and others.
> Web site: http://rsync.samba.org/
> Capabilities:
>     64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
>     socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
>     append, ACLs, xattrs, iconv, no symtimes
> 
> --Hiram
> 
> ----- Original Message -----
> From: "Martin Tompa" <[email protected]>
> To: "Maximilian Haussler" <[email protected]>, "Hiram 
> Clawson" <[email protected]>
> Cc: [email protected], "Martin Tompa" <[email protected]>
> Sent: Friday, April 22, 2011 10:13:49 PM
> Subject: RE: [Genome] possibly excessive MySQL queries
> 
> Dear friends at UCSC,
> 
> Earlier this month your helpful replies guided us to the
> decision to use rsync to mirror the UCSC hg18 and hg19 MySQL
> databases locally, for use in my undergraduate capstone class
> this quarter.  The rsync of hg19 took a couple of days, but
> went without a hitch.  We are successfully using our mirror
> of that database now.  But the rsync of hg18 has failed
> repeatedly.  Here is the message from the MySQL expert on our
> support staff who was doing both hg18 and hg19 for us:
> 
> > For whatever reason, the rsync of hg18 keeps failing, with only
> > 300GB transferred. The failure seems to occur at or near the same
> > point each time.
> > 
> > rsync: connection unexpectedly closed (7559869260 bytes received so far) [receiver]
> > rsync error: error in rsync protocol data stream (code 12) at io.c(601) [receiver=3.0.8]
> > rsync: connection unexpectedly closed (787372 bytes received so far) [generator]
> > rsync error: error in rsync protocol data stream (code 12) at io.c(601) [generator=3.0.8]
> > 
> > I've tried half a dozen times at least. What next? 
> 
> Do you have ideas of what might be going wrong?  Suggestions 
> of what to try?
> 
> Thanks.
> Martin.
> 
> > -----Original Message-----
> > From: Maximilian Haussler [mailto:[email protected]]
> > Sent: Tuesday, April 05, 2011 2:41 PM
> > To: Hiram Clawson
> > Cc: Martin Tompa; [email protected]
> > Subject: Re: [Genome] possibly excessive MySQL queries
> > 
> > Hi Martin,
> > 
> > We had a similar question on Biostar recently, and the person
> > finally found it easier to mirror the UCSC MySQL database than to
> > bother with remote access. If you already have a MySQL server
> > running somewhere, mirroring the UCSC database for, e.g., hg18
> > requires only a single rsync command.
> > 
> > You don't want to risk having direct MySQL access to UCSC blocked
> > during the course, or just 2 hours before the students have to hand
> > in their exercises (which is likely, because they will all start
> > 2 hours before the deadline :-). So the best solution could be a
> > local mirror of the database (not the genome browser website, only
> > the MySQL database itself).
> > 
> > The biostar thread contains the required command:
> > 
> > http://biostar.stackexchange.com/questions/4552/getting-ucsc-data-via-mysql/4554#4554
> > 
> > hope this helps
> > cheers
> > Max
> > --
> > Maximilian Haussler
> > Office:+44 161 27 55980 Mob: +44 7574 246 789 
> > http://www.manchester.ac.uk/research/maximilian.haussler/
> > 
> > 
> > 
> > 
> > On Tue, Apr 5, 2011 at 10:51 PM, Hiram Clawson <[email protected]> 
> > wrote:
> > >
> > > You could also use the sql definition text files from hgdownload.
> > > http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/*.sql
> > >
> > > also available via FTP and rsync.
> > >
> > > You could rsync all of these .sql files to a local directory and
> > > allow everyone to use local files.
> > >
> > > If you want to run MySQL exercises, you should use small samples
> > > from small tables.  Running exercises over an entire database is
> > > an immense amount of work.  There are several hundred GB of data
> > > in hg19.
> > >
> > > --Hiram
> > >
> > > ----- Original Message -----
> > > From: "robert kuhn" <[email protected]>
> > > To: "Martin Tompa" <[email protected]>
> > > Cc: "[email protected]" <[email protected]>
> > > Sent: Tuesday, April 5, 2011 1:43:02 PM
> > > Subject: Re: [Genome] possibly excessive MySQL queries
> > >
> > > Hi, Martin,
> > >
> > > Thanks for asking.  That might add up to an awful lot of queries
> > > if you are using a human assembly.  There are thousands of tables
> > > in there.  You might consider parsing the trackDb table first,
> > > because the entries
> > > _______________________________________________
> > > Genome maillist  -  [email protected] 
> > > https://lists.soe.ucsc.edu/mailman/listinfo/genome
> > >
> >
> 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome