My apologies for a tardy reply. I'll address all of the questions in this e-mail, rather than reply multiple times.
1) I used file to determine that the .exe files were 32-bit. It is entirely possible that file reports 32-bit for every .exe, rather than examining the file.

2) Is there a way to store char/string data as something smaller than UTF-16? The data are SNP genotypes, i.e. a single SNP genotype looks like "A T", and there are almost a million of these per individual. I'm thinking that what I need to do is record the genotype as bits, i.e. 0 or 1, and relate that back to a translation class that returns A or T when that SNP is queried. It would be simpler if I could store char/string data as something reasonably small.

3) What I'm currently doing is:
   a) read in each line as a single string, which is split on whitespace
   b) store each SNP in a class held in an ArrayList, or as a string array in a List<string> (I've implemented it both ways)
   c) once the whole file is read in, output each collection of SNPs by chromosome to a different file for processing by other software

I've been able to get past my initial problem by recompiling Mono with the large-heap GC, but when the entire file is read in it takes up 17GB of RAM for a 300MB file. I know I'm new to Mono/C#, but I've been programming in C++ for years and have written many commercial applications for large data, and nothing I've written to date has been as memory hungry as this. I'm hopeful I can get some good suggestions on how to improve performance. Thanks!!

Dave H

----- Original Message ----
From: Jonathan Pryor <[email protected]>
To: dnadavewa <[email protected]>
Cc: [email protected]
Sent: Friday, April 24, 2009 12:14:12 PM
Subject: Re: [Mono-list] 64bit gmcs/mcs in SLES/openSuSE rpms?

On Thu, 2009-04-23 at 14:20 -0700, dnadavewa wrote:
> I'm working on a large data problem where I'm reading in data from text files
> with almost 2 million columns. In doing this, I can read in about 25 rows
> before Mono bombs out with an out of memory error.

How are you reading in these lines?
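[Editor's note: one memory-lean way to do what step 3 above describes is to stream the file with StreamReader.ReadLine and write each line to its chromosome's output file as you go, so only one line is ever in memory. This is a sketch under assumptions, not code from the thread: the chromosome is taken to be the first column, and all names here are hypothetical.]

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class SplitByChromosome
{
    // Reads whitespace-separated lines and appends each one to a
    // per-chromosome writer.  Assumes (hypothetically) that the
    // chromosome is the first column.  openWriter decides where output
    // goes: a file per chromosome in real use, an in-memory
    // StringWriter in a test.
    public static Dictionary<string, TextWriter> Process(
        TextReader reader, Func<string, TextWriter> openWriter)
    {
        var writers = new Dictionary<string, TextWriter>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split(
                (char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (fields.Length == 0)
                continue;                        // skip blank lines
            string chromosome = fields[0];
            TextWriter w;
            if (!writers.TryGetValue(chromosome, out w))
                writers[chromosome] = w = openWriter(chromosome);
            w.WriteLine(line);                   // line is then free for GC
        }
        return writers;
    }

    static void Main(string[] args)
    {
        Dictionary<string, TextWriter> writers;
        using (var reader = new StreamReader(args[0]))
            writers = Process(reader,
                chr => new StreamWriter("chr" + chr + ".txt"));
        foreach (var w in writers.Values)
            w.Dispose();                         // flush each output file
    }
}
```

Peak memory is then one line plus its split fields, rather than the whole 300MB file held as UTF-16 strings.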
> What I found was the mono executable was indeed 64 bit, but gmcs.exe and
> mcs.exe were 32 bit.

As Chris Howie mentioned, these are actually platform-neutral IL, and will be run using a 64-bit address space when using `mono`.

> One other point, memory usage is horrible. I admit that I'm new to C# and
> mono, so my coding skills are not as good as others, but a 300MB file should
> not use 2GB RAM to read in 1/8 of the file.

That depends ~entirely on how you're reading in the file. Also keep in mind that .NET strings are UTF-16, so if your input text is ASCII, you will require twice as much RAM as the size of the file, e.g. 600MB of RAM to store the entire file as a string. (Then there are various object-overhead considerations, but these are likely tiny compared to the 300MB you're looking at.)

> I stopped using classes to
> store the data and went with List<string> and List<string[]> to read in this
> much data. Any comments on how I might improve this performance would be
> appreciated.

To provide any comments we'd need to know more about what you're trying to do. For example, reading a 300MB XML file using XmlDocument will require *lots* of RAM, as in addition to the UTF-16 string issue, each element, attribute, etc. will be represented as a separate object, with varying amounts of memory required. DOM would be something to avoid here, while XmlReader would be much better.

The easiest question, though, is this: do you really need to keep the entire file contents in memory all at once? Or can you instead process each line independently (or while caching minimal data from one line to the next, so that the contents of previous lines don't need to be maintained)? This would allow you to remove your List<string>, and save a ton of memory.

 - Jon

_______________________________________________
Mono-list maillist  -  [email protected]
http://lists.ximian.com/mailman/listinfo/mono-list
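[Editor's note: on question 2 in Dave's reply — storing genotypes as something smaller than UTF-16 strings — there are only four nucleotides, so each allele fits in 2 bits and a whole genotype like "A T" packs into a single byte instead of a multi-byte string object. The following is a hypothetical sketch of the translation class Dave describes; the names and layout are illustrative, not from the thread.]

```csharp
using System;

// Hypothetical sketch: encode each allele (A/C/G/T) in 2 bits, so a
// two-allele genotype fits in one byte instead of a UTF-16 string.
class GenotypeCodec
{
    const string Alleles = "ACGT";

    // Pack a genotype such as ('A', 'T') into one byte:
    // bits 3-2 hold the first allele, bits 1-0 the second.
    public static byte Pack(char first, char second)
    {
        int a = Alleles.IndexOf(char.ToUpperInvariant(first));
        int b = Alleles.IndexOf(char.ToUpperInvariant(second));
        if (a < 0 || b < 0)
            throw new ArgumentException("allele must be A, C, G, or T");
        return (byte)((a << 2) | b);
    }

    // Translate a packed byte back to the "A T" form when queried.
    public static string Unpack(byte g)
    {
        return Alleles[(g >> 2) & 3] + " " + Alleles[g & 3];
    }
}
```

At one byte per SNP, a million genotypes per individual is about 1MB in a byte[], which also avoids the per-object overhead that each individual string would carry.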
