On 02/05/12 11:37, John Fabiani wrote:
> On Wednesday, May 02, 2012 09:13:35 AM Bill House wrote:
>> Hello all,
>>
>> This is a generic question, but I am trying to learn enough about Dabo to use it, and I need to understand how things work before I can proceed. I am using PostgreSQL 9.1, psycopg2 2.4.5, and Python 2.7 on OpenSUSE 64-bit.
>>
>> The only language I consider myself proficient in is dbase (not even FoxPro).
>>
>> But Python feels comfortable, there is a lot of 1:1 equivalency between dbase server commands and SQL commands, and it looks like Ed has already gone down this road many years ago.
>>
>> What I am unsure of is the actual technique for accomplishing with SQL/client tools what is so simple in dbase. For example:
>>
>> Given a dbase table called
>>
>> files
>>
>> with a structure of
>>
>> md5sum     char 30
>> path       char 450
>> file_name  char 220
>>
>> and indexed (or sorted) in the order of
>>
>> md5sum + path + file_name
>>
>> run the following script:
>>
>> <CODE>
>> ********************** mark_del.prg **********************
>> * Mark all records with identical md5sums as deleted,
>> * except for the last one.
>>
>> * Initialize some variables.
>> store "" to m_md5sum
>> store 0 to m_hit_cntr
>> store 0 to m_place_holder
>>
>> * Open the table for reading and writing.
>> use files
>>
>> * Process the table; iterate until end of table.
>> do while .not. eof()
>>    * Save the md5sum for comparison.
>>    store md5sum to m_md5sum
>>
>>    * Save the position in the table in case we need to come back here.
>>    * recno() is probably like oid in postgresql (unless one has
>>    * defined one's own sequence).
>>    store recno() to m_place_holder
>>
>>    * Initialize the counter for duplicate md5sums.
>>    store 0 to m_hit_cntr
>>
>>    * Look for duplication by counting identical md5sums;
>>    * iterate until the md5sum changes.
>>    do while md5sum = m_md5sum .and. .not. eof()
>>       * Increment the counter.
>>       store m_hit_cntr + 1 to m_hit_cntr
>>       * Scroll one record.
>>       skip
>>    enddo
>>
>>    * Are there duplicates?
>>    if m_hit_cntr > 1
>>       * Go back to the starting point.
>>       goto m_place_holder
>>
>>       * Decrement the counter by the one record to leave unmarked.
>>       store m_hit_cntr - 1 to m_hit_cntr
>>
>>       * Go down the list again and mark records as deleted;
>>       * iterate until the counter reaches 0.
>>       do while m_hit_cntr > 0 .and. .not. eof()
>>          * Mark the record as deleted.
>>          delete
>>          * Decrement the counter.
>>          store m_hit_cntr - 1 to m_hit_cntr
>>          * Scroll one record.
>>          skip
>>       enddo
>>
>>       * Scroll past the record to retain.
>>       skip
>>    endif
>> enddo
>>
>> * Close the table.
>> use
>>
>> * All done now.
>> </CODE>
>>
>> The code above is untested, but those of you with dbase experience should get the idea.
>>
>> This is a real task, by the way (I'm cleaning up my hard disk). There are 1,100,000 records in the table, about 47% of them duplicates.
>>
>> My SQL table is identical to the table above with the addition of two more fields:
>>
>> del      boolean
>> recno    bigint (unique, generated from a sequence)
>>
>> I can mimic the above operations in PostgreSQL/Python except for one thing.
>>
>> I can connect to the database and table.
>> I can iterate through the table.
>> I can scroll up and down the table.
>> I can update the field del in the table.
>>
>> Because of the size of the table, I think a named cursor is appropriate.
>>
>> The problem I am having is that each update requires a commit statement, and a commit statement closes the named cursor. So one loses his place.
>>
>> I understand that psycopg2 named cursors now have an attribute
>>
>> withhold
>>
>> and if that is set to true, then a commit will not close the named cursor (unless a rollback is done), but I have not been able to make this work yet.
>>
>> While studying this, I thought of an approach to do what I am trying to do. This approach involves the use of two cursors that I call a "walking" cursor and a "working" cursor.
>>
>> This seems so messy to me, so I wanted to see how Dabo does it.
>>
>> I have been reading the Dabo code and I notice the use of what is called an
>>
>> AuxCursor
>>
>> which is described at line 439 of dCursorMixin.py, where it says:
>>
>> ...because it won't harm the contents of the main cursor...
>>
>> So (finally) my questions are:
>>
>> Is the use of an AuxCursor the only way to do what I want to do (unless I can get the withhold attribute to behave)?
>>
>> If there is another way, where is the best description of it so I can read about it?
>>
>> Thanks in advance,
>>
>> Bill House
>
> My first bit of advice is:
>
> SQL <> dBASE III
>
> There are many ways to skin this cat, but you first need to review how SQL works and how it is different from dbase.
>
> But in general, you DO NOT commit after each update. Instead, for your project (pseudo code):
>
> begin
>     loop
>         update a row (you are looping over all your rows)
>     commit
>     or
>     rollback
>
> Note that you do everything you need to do, and then you either commit or roll back. These are called "transactions":
>
> http://www.postgresql.org/docs/8.3/static/tutorial-transactions.html
>
> Once you understand how the SQL works, I believe you will better understand what Dabo is doing.
>
> Pseudo Dabo code:
>
> for rec in DataSet:
>     # set the value
>     rec['fieldName'] = someVal
> bizobj.save()
>
> Johnf
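Putting John's transaction advice together with the "walking"/"working" cursor idea, something along these lines might be what you are after. This is a rough, untested psycopg2 sketch: the connection string is invented, the table and column names are taken from Bill's post, and unlike the dbase script it keeps the first row of each md5sum group rather than the last.

<CODE>
import psycopg2

# Connection parameters are made up -- adjust for your own database.
conn = psycopg2.connect("dbname=filedb user=bill")

# "Walking" cursor: a server-side (named) cursor, so the 1,100,000 rows are
# fetched in batches instead of all at once.  withhold=True declares it
# WITH HOLD, so a commit will not close it (a rollback still will).
walker = conn.cursor(name="walker", withhold=True)
walker.itersize = 5000
walker.execute(
    "SELECT recno, md5sum FROM files ORDER BY md5sum, path, file_name")

# "Working" cursor: an ordinary client-side cursor used only for the updates.
worker = conn.cursor()

prev_md5 = None
try:
    for recno, md5sum in walker:
        if md5sum == prev_md5:
            # Same md5sum as the previous row: mark it for deletion.
            worker.execute(
                "UPDATE files SET del = TRUE WHERE recno = %s", (recno,))
        prev_md5 = md5sum
    conn.commit()      # one commit for the whole batch of updates
except Exception:
    conn.rollback()    # undo everything if anything goes wrong
    raise
finally:
    conn.close()
</CODE>

Because nothing is committed until the walk has finished, the named cursor never loses its place; withhold=True is only there so that an occasional intermediate commit (say, every 50,000 updates) would not close it either.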
Don't know if I understood what you want. I think you want to get rid of duplicate records. If that is so, then in SQL it is really simple:

select distinct * from TheTable t

That's all, folks.
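One caveat, though: SELECT DISTINCT * only collapses rows that are identical in every column, and Bill's duplicates share an md5sum while having different paths and file names. Here is a rough, untested sketch of marking those duplicates in a single statement instead, keeping (say) the row with the highest recno in each md5sum group; the connection string is made up and the table layout is the one from Bill's post.

<CODE>
import psycopg2

conn = psycopg2.connect("dbname=filedb user=bill")
cur = conn.cursor()

# Mark every row whose recno is not the highest one for its md5sum.
cur.execute("""
    UPDATE files
       SET del = TRUE
     WHERE recno NOT IN (SELECT max(recno) FROM files GROUP BY md5sum)
""")

conn.commit()   # one statement, one transaction -- no cursor bookkeeping needed
conn.close()
</CODE>

After checking the result, the marked rows can be removed with a plain DELETE FROM files WHERE del.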
