Madhu Reddy wrote:
Hi,
I want to find duplicate records in a large file.
It contains around 22 million records.

Basically, the following is my file structure:

C1 C2 C3 C4
------------------------
12345 efghij klmno pqrs
34567 abnerv oiuuy uyrv
...

It has 22 million records, and each record has 4 columns (C1, C2, C3, and C4).

C1 is the primary key.

Here I want to do some validation. These are my validation steps:

1. Validate record length
2. Check if first column is NULL
3. Separate duplicate records

How do I separate duplicate records in such a huge file?

Duplicate here means duplicate on the primary key only, not the complete row:
if column1 (C1) is a duplicate, that whole row counts as a duplicate and
needs to be written to another file.

Does anybody have an efficient algorithm for finding duplicate records in a
large file?


I appreciate your help.
One word: Oracle.

This is really why database software exists. I know this probably doesn't help much, but if this is more than a one-time occurrence, a real database ought to be considered.
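
If you do load it into a real database, the duplicate test comes for free from the primary key. Here is a minimal sketch of that approach in Python, using SQLite from the standard library as a stand-in for Oracle; the file names ("records.txt", "dupes.txt"), the tab delimiter, and the table layout are all my assumptions:

    import sqlite3

    conn = sqlite3.connect("dedupe.db")  # on-disk table, so 22M keys need not fit in RAM
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recs (c1 TEXT PRIMARY KEY, c2 TEXT, c3 TEXT, c4 TEXT)"
    )

    with open("records.txt") as src, open("dupes.txt", "w") as dupes:
        cur = conn.cursor()
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 4 or not fields[0]:  # validations 1 and 2
                continue                           # (or write rejects to a third file)
            # INSERT OR IGNORE skips rows whose primary key already exists;
            # rowcount is 0 when the insert was ignored, i.e. C1 was a duplicate.
            cur.execute("INSERT OR IGNORE INTO recs VALUES (?, ?, ?, ?)", fields)
            if cur.rowcount == 0:
                dupes.write(line)
        conn.commit()
    conn.close()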

You might start by checking out any reference material on how large database systems handle this sort of thing, as that is exactly what you are trying to mimic.
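
If pulling in a database is overkill, you can mimic its core trick yourself: sort the file on the key column so duplicate keys land on adjacent lines, then make one sequential pass. A rough sketch of that technique, again with hypothetical file names, assuming tab-delimited fields and an input pre-sorted on C1 (e.g. with the Unix sort utility: sort -t$'\t' -k1,1 records.txt > sorted.txt):

    EXPECTED_FIELDS = 4

    with open("sorted.txt") as src, \
         open("good.txt", "w") as good, \
         open("dupes.txt", "w") as dupes, \
         open("bad.txt", "w") as bad:
        prev_key = None
        for line in src:
            fields = line.rstrip("\n").split("\t")
            # Validations 1 and 2: field count and a non-empty first column.
            if len(fields) != EXPECTED_FIELDS or not fields[0]:
                bad.write(line)
                continue
            # Validation 3: after sorting, duplicate keys are adjacent, so one
            # string comparison per record is enough to spot them.
            if fields[0] == prev_key:
                dupes.write(line)
            else:
                good.write(line)
                prev_key = fields[0]

This runs in constant memory however large the file gets, which is why sort-then-scan (external merge sort under the hood) is what the big database engines themselves fall back on for jobs like this.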

http://danconia.org

