On Aug 8, 2011 12:11 AM, "Ramprasad Prasad" <ramprasad...@gmail.com> wrote:
>
> Using the system Linux sort ... does not help.
> On my dual quad core machine (8 GB RAM), sort -n file takes 10
> minutes and in the end produces no output.
>

I had a smaller file and 32 GB to play with on a dual quad core (a DL320),
and sort still couldn't handle more than 2-4 gigs in one go.
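What got me past that was splitting the file, sorting the chunks
separately, and merging the sorted chunks at the end. A rough sketch of
the idea (GNU split and sort; the chunk size, the numeric key at the start
of each line, and the temp directory are assumptions you'd adjust for your
own data):

    # carve the big file into ~1M-line chunks named chunk.aa, chunk.ab, ...
    split -l 1000000 large-file chunk.

    # sort each chunk on its own; -T keeps sort's temp files on a
    # partition with room to spare
    for f in chunk.??; do
        sort -n -T /var/tmp "$f" -o "$f.sorted"
    done

    # -m merges the already-sorted chunks without re-sorting them
    sort -n -m -T /var/tmp chunk.??.sorted > large-file.sorted

    # clean up the intermediate chunks
    rm chunk.?? chunk.??.sorted

GNU sort will also do the chunking and merging on its own through temp
files if you hand it -T (and a bigger in-memory buffer with -S); the
manual version just makes each step visible and restartable.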
> when I put this data in mysql, there is an index on the order by
> field ... but I guess keys don't help when you are selecting the
> entire table.
>
> I guess there is a serious need for re-architecting, rather than
> creating such monstrous files, but when people work with legacy systems
> which worked fine when there was lower usage, and now you tell them they
> need an overhaul because the current system doesn't scale ... that
> takes a lot of convincing.

You're dealing with a similar issue to the one I had in this respect too.
The only difference is that I created my own issue out of ignorance,
having never dealt with that much data (the split/sort/merge above is what
got me through it). Well, with this data I just threw 30+ fields over a
hundred thousand lines (yes, you've still got more data to deal with than
that) into one table. That worked OK until my queries got a bit more
complex, at which point it took me 8+ hours to generate a report. I
rethought the tables (or, more honestly, read a bit and thought about what
the hell I was doing), created a half dozen relationships, and got the
report down to a little under 2 hours.

My advice is to think about rethinking your db. That will probably mean
rethinking the software too (or at least the queries it makes). You might
want to check out the #mysql freenode IRC channel - most of them are
pompous, but you'll get your answers. I think Perl is less related to your
issue, but the people in the #dbi and dbic Perl IRC channels are much more
easygoing about their business.

> On 8/8/11, Uri Guttman <u...@stemsystems.com> wrote:
> >>>>>> "RP" == Rajeev Prasad <rp.ne...@yahoo.com> writes:
> >
> > RP> hi, you can try this: first get only that field (sed/awk/perl)
> > RP> which you want to sort on into a file. sort that file, which i
> > RP> assume would be a lot less in size than your current file/table.
> > RP> then run a loop on the main file using the sorted file as a
> > RP> variable.
> >
> > RP> here is the logic in shell:
> >
> > RP> awk '{print $<field-to-be-sorted-on>}' <large-file> > tmp-file
> > RP> sort <tmp-file> > <sorted-temp-file>
> > RP> for id in `cat <sorted-temp-file>`; do
> > RP>     grep $id <large-file> >> sorted-large-file
> > RP> done
> >
> > have you thought about the time this will take? you are doing an
> > O( N**2 ) grep there. you are looping over all N keys and then
> > scanning the file's N lines for each key. that will take a very long
> > time for such a large file. as others have said, either use the sort
> > utility or do a merge/sort on the records. your way is effectively a
> > slow bubble sort!
> >
> > uri
> >
> > --
> > Uri Guttman  --  uri AT perlhunter DOT com  ---  http://www.perlhunter.com
> > ------------  Perl Developer Recruiting and Placement Services  -------------
> > -----  Perl Code Review, Architecture, Development, Training, Support  -------
>
> --
> Sent from my mobile device
>
> Thanks
> Ram
> <http://www.netcore.co.in/>
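To put Uri's point another way: the awk/grep loop quoted above rescans the
whole large file once per key, which is what makes it O( N**2 ). If all
you need is the file ordered by one column, a single sort call already
does the external merge for you (sketch only; the field number 3 and the
temp directory are placeholders, not taken from your data):

    # numeric sort on the 3rd whitespace-separated field, with temp
    # files sent to a partition that has space for them
    sort -k3,3n -T /var/tmp large-file > sorted-large-file

That is the "use the sort utility" route; the split/sort/merge sketch near
the top of this mail is the "merge/sort on the records" route.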