From: Chas Owens <[EMAIL PROTECTED]>
> On Mon, 2002-03-04 at 10:32, Brett W. McCoy wrote:
> > On Mon, 4 Mar 2002, Dave Adams wrote:
> > 
> > > I have a problem which I would like to use perl to resolve,
> > > however I'm not sure if it is possible to do.
> > >
> > > I need to scan a file and check some conditions, first if field 9
> > > is duplicated on 1 or more rows, then I need to check field 10 to
> > > see which is the greater value and then only print the whole row
> > > where field 10 is the greater, if field 9 is not a duplicate then
> > > print the whole row. An example of the data is below.
> > >
> > > 28525|U2|4CY0|50|6775.15|2002-02-07|10461|321.43|102040724|102040773|
> > > 28526|U2|4CY0|25|3571.78|2002-02-07|6107|167.74|102040774|102040798|
> > > 28527|U2|4CY0|50|6930.3|2002-02-07|11376|324.12|102040774|102040823|
> > > 28528|U2|4CY0|25|4640.28|2002-02-07|4800|217.43|102040824|102040848|
> > > 28529|U2|4CY0|50|8432.05|2002-02-07|9023|392.03|102040824|102040873|
> > 
> > Of course this can be done with Perl, although the algorithm will
> > take some thinking through.  You should be able to read these rows
> > into an array (using split, et al).
> > 
> > However, this will be much easier if you can get this into a real
> > database system so you can use DBI.  In fact, a text file like this
> > can even be used with DBI -- take a look at DBD::CSV.  It lets you
> > create and manipulate tables via SQL, which would make your problem
> > much simpler (although DBD::CSV may have problems enforcing unique
> > constraints). Otherwise, if you have a good bit of data to deal
> > with, take a look at a database system like PostgreSQL, which can
> > easily handle the logic you are trying to implement.
> > 
> > -- Brett
> 
> As a DBA I disagree.  Text processing is often faster and easier to
> write. 

Depends. If you can go through the data once and munge it as 
you go, doing it from within Perl will definitely be quicker than 
pushing the data into a database and selecting from it.

In this case, though, it's not that easy. The problem is that the 
original request was "if field 9 is duplicated on 1 or more rows, then 
I need to check field 10 to see which is the greater value and then 
only print the whole row where field 10 is the greater, if field 9 is not 
a duplicate then print the whole row."

You could do that from within Perl; you'd just have to remember 
the rows you've read in a hash, check whether you've already 
seen a row with the same Field_9, keep the one with the greater 
Field_10 and go on. And you could only start printing once you 
were done reading (I'm assuming the data are NOT sorted on 
Field_9!).
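
A minimal sketch of that hash approach (my assumptions: the 
pipe-delimited rows come in on STDIN or as files on the command 
line, and Field_10 is numeric; counting fields from 1, Field_9 and 
Field_10 land at indexes 8 and 9 after split):

use strict;
use warnings;

my %best;   # Field_9 => [ Field_10, whole row ]
while (my $line = <>) {
    chomp $line;
    my ($f9, $f10) = (split /\|/, $line)[8, 9];
    # keep the row with the greatest Field_10 for each Field_9
    if (!exists $best{$f9} or $f10 > $best{$f9}[0]) {
        $best{$f9} = [$f10, $line];
    }
}
# only now, with the whole file read, can we print
# (the rows come out in no particular order)
print "$_->[1]\n" for values %best;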

Or you could sort the data on Field_9 (either direction) and 
Field_10 (DESCending preferably) and then print only the first 
row of each group with the same Field_9.
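
Again just a sketch, under the same assumptions, and note that 
this one holds the whole file in memory:

use strict;
use warnings;

# [ whole row, Field_9, Field_10 ] for every input line
my @rows = map { chomp; [ $_, (split /\|/)[8, 9] ] } <>;

# bring equal Field_9 values together (string compare is enough
# for grouping), greatest Field_10 first within each group
my @sorted = sort { $a->[1] cmp $b->[1] or $b->[2] <=> $a->[2] } @rows;

my $last_f9;
for my $row (@sorted) {
    next if defined $last_f9 and $row->[1] eq $last_f9;
    $last_f9 = $row->[1];        # first row of a new Field_9 group
    print "$row->[0]\n";
}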

On the other hand if you had the data in the database you could 
write this:

SELECT Field_1, Field_2, Field_3, Field_4, Field_5, Field_6, 
       Field_7, Field_8, Field_9, MAX(Field_10), Field_11
FROM MyData 
GROUP BY Field_1, Field_2, Field_3, Field_4, Field_5, Field_6, 
         Field_7, Field_8, Field_9, Field_11

or

SELECT MyData.* 
FROM MyData, (SELECT Field_9, MAX(Field_10) AS Max_Field_10
              FROM MyData
              GROUP BY Field_9
             ) AS Max_MyData
WHERE MyData.Field_9 = Max_MyData.Field_9
  AND MyData.Field_10 = Max_MyData.Max_Field_10

No need for cursors ... but if there is no index on Field_9, it'll be 
terribly slow.

Still, if this is all you need to do, it's not worth it to push the data 
into a database.

Jenda

=========== [EMAIL PROTECTED] == http://Jenda.Krynicky.cz ==========
There is a reason for living. There must be. I've seen it somewhere.
It's just that in the mess on my table ... and in my brain.
I can't find it.
                                        --- me
