On Thursday, May 03, 2012 07:15:46 AM Ricardo Araoz wrote: > El 02/05/12 11:37, John Fabiani escribió: > > On Wednesday, May 02, 2012 09:13:35 AM Bill House wrote: > >> Hello all, > >> > >> This is a generic question, but I am trying to learn enough about Dabo > >> to use it and I need to understand how things work before I can > >> proceed. I am using postgresql 9.1 and psycopg2 2.4.5. and python 2.7 > >> on OpenSUSE 64 bit. > >> > >> The only language I consider myself proficient in is dbase (not even > >> foxpro). > >> > >> But python feels comfortable and there is a lot of 1:1 equivalency > >> between dbase server commands and SQL commands and it looks like Ed > >> has > >> already gone down this road many years ago. > >> > >> What I am unsure of is the actual technique to accomplish with > >> SQL/client tools what is so simple in dbase. For example: > >> > >> Given a dbase table called > >> > >> files > >> > >> with a structure of > >> > >> md5sum char 30 > >> path char 450 > >> file_name char 220 > >> > >> and indexed (or sorted) in the order of > >> > >> md5sum + path + file_name > >> > >> run the following script > >> > >> <CODE> > >> ********************** mark_del.prg ********************** > >> *mark all records with identical md5sums as deleted > >> *except for last one > >> > >> *initialize some variables > >> store "" to m_md5sum > >> store 0 to m_hit_cntr > >> store 0 to m_place_holder > >> > >> *open the table for reading and writing > >> use files > >> > >> *begin to process the table > >> *iterate until end of table > >> do while .not. eof() > >> > >> *save the md5sum for comparison > >> store md5sum to m_md5sum > >> > >> *save position in table if needed to come back here > >> *recno() is probably like oid in postgresql (unless one has > >> defined > >> > >> one's own sequence) > >> > >> store recno() to m_place_holder > >> > >> *initialize the counter for duplicate md5sums > >> store 0 to m_hit_cntr > >> > >> *look for instances of duplication by counting identical md5sums > >> *iterate until a change in md5sum > >> do while md5sum = m_md5sum .and. .not. eof() > >> > >> *increment the counter > >> store m_hit_cntr + 1 to m_hit_cntr > >> *scroll one record > >> skip > >> > >> enddo > >> > >> *Are there duplicates? > >> > >> if m_hit_cntr > 1 > >> > >> *go back to starting point > >> goto m_place_holder > >> > >> *decrement the counter by the one record to leave > >> unmarked > >> > >> store m_hit_cntr -1 to m_hit_cntr > >> > >> *go down the list again and mark records as deleted > >> *iterate until counter = 0 > >> do while m_hit_cntr > 0 .and. .not. eof() > >> > >> *mark record as deleted > >> delete > >> > >> *decrement the counter > >> store m_hit_cntr -1 to m_hit_cntr > >> > >> *scroll one record > >> skip > >> > >> enddo > >> > >> *scroll past the record to retain > >> skip > >> > >> endif > >> > >> *scroll to next record to begin the sequence again > >> skip > >> > >> enddo > >> > >> *close the table > >> use > >> > >> *all done now > >> > >> </CODE> > >> > >> The code above is untested, but those of you with dbase experience > >> should get the idea. > >> > >> This is a real task by the way (I'm cleaning up my hard disk). There > >> are 1,100,000 records in the table about 47% of them duplicates. > >> > >> My SQL table is identical to the table above with the addition of two > >> > >> more fields: > >> del boolian > >> recno long long int (sequence unique) > >> > >> I can mimic the above operations in postgresql/python except for one > >> thing. > >> > >> I can connect to the database and table > >> I can iterate through the table > >> I can scroll up and down the table > >> I can update the field del in the table > >> > >> Because of the size of the table, I think a named cursor is > >> appropriate. > >> > >> The problem I am having is that each update requires a commit > >> statement > >> and a commit statement closes the named cursor. So one loses his > >> place. > >> > >> I understand that psycopg2 named cursors now have an attribute > >> > >> withhold > >> > >> and if that is set to true, then a commit will not close the named > >> cursor (unless a rollback is done), but I have not been able to make > >> this work yet. > >> > >> While studying this, I thought of an approach to do what I am trying > >> to > >> do. This approach involves the use of two cursors that I call a > >> "walking" cursor and a "working" cursor. > >> > >> This seems so messy to me so I wanted to see how Dabo did it. > >> > >> I have been reading the Dabo code and I notice the use of what is > >> called an>> > >> AuxCursor > >> > >> Which is described at > >> > >> line 439 of dCursorMixin.py > >> > >> Where it says: > >> ...because it won't harm the contents of the main cursor.... > >> > >> So (finally) my questions are: > >> > >> Is the use of an AuxCursor the only way to do what I want to do > >> (unless > >> I can get the withhold attribute to behave)? > >> > >> If there is another way, where is the best description of it so I can > >> read about it? > >> > >> Thanks in advance, > >> > >> Bill House > > > > My first bit of advise is > > SQL <> dbaseIII > > > > There are many ways to skin this cat but you first need to review how > > SQL > > works and how it is different from dbase. > > > > But in general > > you DO NOT commit after each update instead for your project: > > pseudo code > > > > begin > > loop > > > > update a row (you are looping over all your rows) > > > > commit > > or > > rollback > > > > Note that you do everything you need to do and then you either commit or > > rollback. These are called "Transactions" > > > > http://www.postgresql.org/docs/8.3/static/tutorial-transactions.html > > > > Once you understand how the SQL works then I believe you will better > > understand what Dabo is doing. > > > > pseudo Dabo code > > > > for rec in DataSet: > > # set the value > > rec['fieldName'] = someVal > > > > bizobj.save() > > > > Johnf > > Don't know if I understood what you want. I think you want to get rid of > duplicate records. > It it is so, then in SQL is really simple : > > select distinct * from TheTable > > T t t t that's all folks.
He wants all the dups removed but the last one. Johnf _______________________________________________ Post Messages to: [email protected] Subscription Maintenance: http://leafe.com/mailman/listinfo/dabo-users Searchable Archives: http://leafe.com/archives/search/dabo-users This message: http://leafe.com/archives/byMID/2178129.uNiTdTEeE4@linux-12
