From: Chas Owens <[EMAIL PROTECTED]>
> On Mon, 2002-03-04 at 10:32, Brett W. McCoy wrote:
> > On Mon, 4 Mar 2002, Dave Adams wrote:
> > > I have a problem which I would like to use Perl to resolve,
> > > however I'm not sure if it is possible to do.
> > >
> > > I need to scan a file and check some conditions: first, if field 9
> > > is duplicated on 1 or more rows, then I need to check field 10 to
> > > see which is the greater value and then only print the whole row
> > > where field 10 is the greater; if field 9 is not a duplicate, then
> > > print the whole row. An example of the data is below.
> > >
> > > 28525|U2|4CY0|50|6775.15|2002-02-07|10461|321.43|102040724|102040773|
> > > 28526|U2|4CY0|25|3571.78|2002-02-07|6107|167.74|102040774|102040798|
> > > 28527|U2|4CY0|50|6930.3|2002-02-07|11376|324.12|102040774|102040823|
> > > 28528|U2|4CY0|25|4640.28|2002-02-07|4800|217.43|102040824|102040848|
> > > 28529|U2|4CY0|50|8432.05|2002-02-07|9023|392.03|102040824|102040873|
> >
> > Of course this can be done with Perl, although the algorithm will
> > take some thinking through. You should be able to read these rows
> > into an array (using split, et al).
> >
> > However, this will be much easier if you can get this into a real
> > database system so you can use DBI. In fact, a text file like this
> > can even be used with DBI -- take a look at DBD::CSV. It lets you
> > create and manipulate tables via SQL, which would make your problem
> > much simpler (although DBD::CSV may have problems enforcing unique
> > constraints). Otherwise, if you have a good bit of data to deal
> > with, take a look at a database system like PostgreSQL, which can
> > easily handle the logic you are trying to implement.
> >
> > -- Brett
>
> As a DBA I disagree. Text processing is often faster and easier to
> write.
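(For the curious: a minimal sketch of the DBD::CSV route Brett mentions,
assuming the rows live in a pipe-delimited file named data.txt with no
header line. The file name, the table name mydata, and the column names
field_1 .. field_11 are made up for illustration, and how much SQL works
on top of a text file depends on the SQL::Statement version installed;
plain SELECTs like the one below are safe.)

    #!/usr/bin/perl
    use strict;
    use DBI;

    my $dbh = DBI->connect("DBI:CSV:f_dir=.", undef, undef,
                           { RaiseError => 1 });

    # Describe the pipe-delimited file; it has no header row, so the
    # (hypothetical) column names are supplied by hand. The trailing |
    # on each row yields an empty eleventh column.
    $dbh->{csv_tables}{mydata} = {
        file      => "data.txt",
        sep_char  => "|",
        col_names => [ map { "field_$_" } 1 .. 11 ],
    };

    # An ordinary SELECT now works against the text file.
    my $sth = $dbh->prepare(
        "SELECT field_9, field_10 FROM mydata WHERE field_4 = 50");
    $sth->execute;
    while (my @row = $sth->fetchrow_array) {
        print join("|", @row), "\n";
    }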
Depends. If you can go through the data once and munge it as you go, it
will definitely be quicker to do it from within Perl instead of pushing
the data into a database and selecting something back out.

In this case, though, it's not that easy. The original request was: "if
field 9 is duplicated on 1 or more rows, then I need to check field 10
to see which is the greater value and then only print the whole row
where field 10 is the greater, if field 9 is not a duplicate then print
the whole row."

You could do that from within Perl: you just have to remember the rows
you've read in a hash, check whether you've already seen a row with the
same Field_9, keep the one with the greater Field_10, and go on. And
you'd only start printing when done with reading (I'm assuming the data
are NOT sorted on Field_9 !!!); a short sketch follows at the end of
this message. Or you could sort the data on Field_9 (either direction
you like) and Field_10 (DESCending, preferably) and then print only the
first row of each run of rows sharing the same Field_9.

On the other hand, if you had the data in a database you could write
this:

  SELECT Field_1, Field_2, Field_3, Field_4, Field_5, Field_6,
         Field_7, Field_8, Field_9, MAX(Field_10), Field_11
  FROM MyData
  GROUP BY Field_1, Field_2, Field_3, Field_4, Field_5, Field_6,
         Field_7, Field_8, Field_9, Field_11

or

  SELECT *
  FROM MyData,
       (SELECT Field_9, MAX(Field_10) AS Max_Field_10
        FROM MyData
        GROUP BY Field_9) AS Max_MyData
  WHERE MyData.Field_9 = Max_MyData.Field_9
    AND MyData.Field_10 = Max_MyData.Max_Field_10

No need for cursors ... but if there is no index on Field_9 it'll be
terribly slow. Still, if this is all you need to do, it's not worth
pushing the data into a database.

Jenda
=========== [EMAIL PROTECTED] == http://Jenda.Krynicky.cz ==========
There is a reason for living. There must be. I've seen it somewhere.
It's just that in the mess on my table ... and in my brain.
I can't find it.
                                        --- me
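For the record, a minimal sketch of the hash approach described above,
assuming the pipe-delimited rows arrive on STDIN or as a file argument
and that Field_10 compares numerically (the variable names are mine):

    #!/usr/bin/perl
    use strict;

    # For each distinct value of field 9 keep the whole row that has
    # the greatest field 10. Fields are counted from 1 in the problem
    # statement, so after split field 9 is index 8, field 10 index 9.
    my (%best_row, %best_val);
    while (my $line = <>) {
        chomp $line;
        my @fields = split /\|/, $line;
        my ($key, $val) = @fields[8, 9];
        if (!exists $best_val{$key} or $val > $best_val{$key}) {
            $best_val{$key} = $val;
            $best_row{$key} = $line;
        }
    }

    # Printing starts only once the whole file has been read, so the
    # input need not be sorted on field 9. Hash order is arbitrary;
    # sort the keys if the output order matters.
    print "$best_row{$_}\n" for sort keys %best_row;

One pass over the data, nothing printed until the end, and no
requirement that the input be sorted.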