On Thursday, May 03, 2012 07:15:46 AM Ricardo Araoz wrote:
> On 02/05/12 11:37, John Fabiani wrote:
> > On Wednesday, May 02, 2012 09:13:35 AM Bill House wrote:
> >> Hello all,
> >> 
> >> This is a generic question, but I am trying to learn enough about Dabo
> >> to use it and I need to understand how things work before I can
> >> proceed.  I am using postgresql 9.1 and psycopg2 2.4.5. and python 2.7
> >> on OpenSUSE 64 bit.
> >> 
> >> The only language I consider myself proficient in is dbase (not even
> >> foxpro).
> >> 
> >> But python feels comfortable and there is a lot of 1:1 equivalency
> >> between dbase server commands and SQL commands and it looks like Ed
> >> has
> >> already gone down this road many years ago.
> >> 
> >> What I am unsure of is the actual technique to accomplish with
> >> SQL/client tools what is so simple in dbase.  For example:
> >> 
> >> Given a dbase table called
> >> 
> >>     files
> >> 
> >> with a structure of
> >> 
> >>     md5sum       char  30
> >>     path         char 450
> >>     file_name    char 220
> >> 
> >> and indexed (or sorted) in the order of
> >> 
> >>     md5sum + path + file_name
> >> 
> >> run the following script
> >> 
> >> <CODE>
> >> ********************** mark_del.prg **********************
> >> *mark all records with identical md5sums as deleted
> >> *except for last one
> >> 
> >> *initialize some variables
> >> store ""    to m_md5sum
> >> store 0    to m_hit_cntr
> >> store 0     to m_place_holder
> >> 
> >> *open the table for reading and writing
> >> use files
> >> 
> >> *begin to process the table
> >> *iterate until end of table
> >> do while .not. eof()
> >> 
> >>   *save the md5sum for comparison
> >>   store md5sum to m_md5sum
> >>   
> >>   *save position in table if needed to come back here
> >>   *recno() is probably like oid in postgresql (unless one has
> >>   *defined one's own sequence)
> >> 
> >>   store recno() to m_place_holder
> >>   
> >>   *initialize the counter for duplicate md5sums
> >>   store 0 to m_hit_cntr
> >>   
> >>   *look for instances of duplication by counting identical md5sums
> >>   *iterate until a change in md5sum
> >>   do while md5sum = m_md5sum .and. .not. eof()
> >> 
> >>      *increment the counter
> >>      store m_hit_cntr + 1 to m_hit_cntr
> >> 
> >>      *scroll one record
> >>      skip
> >> 
> >>   enddo
> >>   
> >>   *Are there duplicates?
> >>   
> >>   if m_hit_cntr > 1
> >> 
> >>      *go back to starting point
> >>      goto m_place_holder
> >> 
> >>      *decrement the counter by the one record to leave unmarked
> >>      store m_hit_cntr - 1 to m_hit_cntr
> >> 
> >>      *go down the list again and mark records as deleted
> >>      *iterate until counter = 0
> >>      do while m_hit_cntr > 0 .and. .not. eof()
> >> 
> >>         *mark record as deleted
> >>         delete
> >> 
> >>         *decrement the counter
> >>         store m_hit_cntr - 1 to m_hit_cntr
> >> 
> >>         *scroll one record
> >>         skip
> >> 
> >>      enddo
> >> 
> >>      *scroll past the one record we retain
> >>      skip
> >> 
> >>   endif
> >> 
> >>   *both branches leave the pointer on the first record of the next
> >>   *md5sum group, so the outer loop continues from here
> >> 
> >> enddo
> >> 
> >> *close the table
> >> use
> >> 
> >> *all done now
> >> 
> >> </CODE>
> >> 
> >> The code above is untested, but those of you with dbase experience
> >> should get the idea.
> >> 
> >> This is a real task by the way (I'm cleaning up my hard disk).  There
> >> are 1,100,000 records in the table, about 47% of them duplicates.
> >> 
> >> My SQL table is identical to the table above with the addition of two
> >> more fields:
> >> 
> >>     del      boolean
> >>     recno    bigint (unique, generated from a sequence)
> >> 
> >> I can mimic the above operations in postgresql/python except for one
> >> thing.
> >> 
> >> I can connect to the database and table
> >> I can iterate through the table
> >> I can scroll up and down the table
> >> I can update the field del in the table
> >> 
> >> Because of the size of the table, I think a named cursor is
> >> appropriate.
> >> 
> >> The problem I am having is that each update requires a commit
> >> statement, and a commit statement closes the named cursor, so one
> >> loses one's place.
> >> 
> >> I understand that psycopg2 named cursors now have an attribute
> >> 
> >>     withhold
> >> 
> >> and if that is set to true, then a commit will not close the named
> >> cursor (unless a rollback is done), but I have not been able to make
> >> this work yet.
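> >> 
> >> What I have been trying is roughly this (untested, and the DSN and
> >> cursor name are only placeholders):
> >> 
> >> <CODE>
> >> import psycopg2
> >> 
> >> conn = psycopg2.connect("dbname=mydb")
> >> 
> >> # named (server-side) cursor; withhold=True is supposed to keep it
> >> # open across commits (psycopg2 2.4.3 and later)
> >> walking = conn.cursor(name="walking", withhold=True)
> >> walking.execute("SELECT recno, md5sum FROM files "
> >>                 "ORDER BY md5sum, path, file_name")
> >> 
> >> for recno, md5sum in walking:
> >>     # ...examine the row, update as needed, then...
> >>     conn.commit()   # should NOT close the withheld cursor
> >> 
> >> walking.close()
> >> conn.close()
> >> </CODE>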
> >> 
> >> While studying this, I thought of an approach to do what I am trying
> >> to do.  This approach involves the use of two cursors that I call a
> >> "walking" cursor and a "working" cursor.
> >> 
> >> This seems messy to me, so I wanted to see how Dabo does it.
> >> 
> >> I have been reading the Dabo code and I notice the use of what is
> >> called an
> >> 
> >>     AuxCursor
> >> 
> >> Which is described at
> >> 
> >>     line 439 of dCursorMixin.py
> >> 
> >> Where it says:
> >>     ...because it won't harm the contents of the main cursor....
> >> 
> >> So (finally) my questions are:
> >> 
> >> Is the use of an AuxCursor the only way to do what I want to do
> >> (unless
> >> I can get the withhold attribute to behave)?
> >> 
> >> If there is another way, where is the best description of it so I can
> >> read about it?
> >> 
> >> Thanks in advance,
> >> 
> >> Bill House
> > 
> > My first bit of advice is:
> > SQL <> dbaseIII
> > 
> > There are many ways to skin this cat, but you first need to review how
> > SQL works and how it is different from dbase.
> > 
> > But in general you DO NOT commit after each update; instead, for your
> > project:
> > pseudo code
> > 
> > begin
> > loop
> > 
> >     update a row (you are looping over all your rows)
> > 
> > commit
> > or
> > rollback
> > 
> > Note that you do everything you need to do, and then you either commit
> > or roll back. These are called "transactions":
> > 
> > http://www.postgresql.org/docs/8.3/static/tutorial-transactions.html
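> > 
> > In psycopg2 that pattern looks roughly like this (a sketch only; the
> > DSN and the rows_to_mark list are placeholders for whatever your own
> > logic produces):
> > 
> > import psycopg2
> > 
> > conn = psycopg2.connect("dbname=mydb")
> > cur = conn.cursor()
> > rows_to_mark = []   # fill this with the recnos you want marked
> > 
> > try:
> >     # psycopg2 opens the transaction implicitly on the first execute()
> >     for recno in rows_to_mark:
> >         cur.execute("UPDATE files SET del = TRUE WHERE recno = %s",
> >                     (recno,))
> >     conn.commit()     # all of the updates become permanent at once
> > except Exception:
> >     conn.rollback()   # or none of them do
> >     raise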
> > 
> > Once you understand how the SQL works, I believe you will better
> > understand what Dabo is doing.
> > 
> > pseudo Dabo code
> > 
> > for rec in DataSet:
> >    # set the value
> >    rec['fieldName'] = someVal
> > 
> > bizobj.save()
> > 
> > Johnf
> 
> Don't know if I understood what you want. I think you want to get rid of
> duplicate records.
> If that is so, then in SQL it is really simple:
> 
> select distinct * from TheTable
> 
> T t t t that's all folks.


He wants all the dups removed but the last one.
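
In set-based SQL it can even be a single statement instead of a
row-by-row loop. Untested, and assuming "last" means the highest recno
within each md5sum group:

import psycopg2

conn = psycopg2.connect("dbname=mydb")   # placeholder DSN
cur = conn.cursor()

# mark every row that has a later (higher recno) twin with the same
# md5sum, so only the last row of each group is left unmarked
cur.execute("""
    UPDATE files AS a
       SET del = TRUE
      FROM files AS b
     WHERE a.md5sum = b.md5sum
       AND a.recno  < b.recno
""")
conn.commit()
conn.close()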

Johnf
