On 02/05/12 11:37, John Fabiani wrote:
> On Wednesday, May 02, 2012 09:13:35 AM Bill House wrote:
>> Hello all,
>>
>> This is a generic question, but I am trying to learn enough about Dabo
>> to use it and I need to understand how things work before I can
>> proceed.  I am using PostgreSQL 9.1, psycopg2 2.4.5, and Python 2.7
>> on 64-bit openSUSE.
>>
>> The only language I consider myself proficient in is dbase (not even
>> foxpro).
>>
>> But Python feels comfortable, there is a lot of 1:1 equivalency
>> between dbase server commands and SQL commands, and it looks like
>> Ed already went down this road many years ago.
>>
>> What I am unsure of is the actual technique to accomplish with
>> SQL/client tools what is so simple in dbase.  For example:
>>
>> Given a dbase table called
>>
>>     files
>>
>> with a structure of
>>
>>     md5sum       char  30
>>     path         char 450
>>     file_name    char 220
>>
>> and indexed (or sorted) in the order of
>>
>>     md5sum + path + file_name
>>
>> run the following script
>>
>> <CODE>
>> ********************** mark_del.prg **********************
>> *mark all records with identical md5sums as deleted
>> *except for last one
>>
>> *initialize some variables
>> store ""    to m_md5sum
>> store 0    to m_hit_cntr
>> store 0     to m_place_holder
>>
>> *open the table for reading and writing
>> use files
>>
>> *begin to process the table
>> *iterate until end of table
>> do while .not. eof()
>>   *save the md5sum for comparison
>>   store md5sum to m_md5sum
>>
>>   *save position in table if needed to come back here
>>   *recno() is probably like oid in postgresql
>>   *(unless one has defined one's own sequence)
>>   store recno() to m_place_holder
>>
>>   *initialize the counter for duplicate md5sums
>>   store 0 to m_hit_cntr
>>
>>   *look for instances of duplication by counting identical md5sums
>>   *iterate until a change in md5sum
>>   do while md5sum = m_md5sum .and. .not. eof()
>>      *increment the counter
>>      store m_hit_cntr + 1 to m_hit_cntr
>>      *scroll one record
>>      skip
>>   enddo
>>
>>   *Are there duplicates?
>>   if m_hit_cntr > 1
>>      *go back to starting point
>>      goto m_place_holder
>>
>>      *decrement the counter by the one record to leave unmarked
>>      store m_hit_cntr - 1 to m_hit_cntr
>>
>>      *go down the list again and mark records as deleted
>>      *iterate until counter = 0
>>      do while m_hit_cntr > 0 .and. .not. eof()
>>         *mark record as deleted
>>         delete
>>         *decrement the counter
>>         store m_hit_cntr - 1 to m_hit_cntr
>>         *scroll one record
>>         skip
>>      enddo
>>
>>      *scroll past the record to retain
>>      skip
>>   endif
>>
>>   *note: both branches leave the pointer on the next new md5sum,
>>   *so no extra skip here (it would step over a record)
>> enddo
>>
>> *close the table
>> use
>>
>> *all done now
>>
>> </CODE>
>>
>> The code above is untested, but those of you with dbase experience
>> should get the idea.
>>
>> This is a real task by the way (I'm cleaning up my hard disk).  There
>> are 1,100,000 records in the table, about 47% of them duplicates.
>>
>> My SQL table is identical to the table above with the addition of two
>> more fields:
>>
>>     del          boolean
>>     recno        bigint (unique, fed by a sequence)
>>
>> I can mimic the above operations in postgresql/python except for one thing.
>>
>> I can connect to the database and table
>> I can iterate through the table
>> I can scroll up and down the table
>> I can update the field del in the table
>>
>> Because of the size of the table, I think a named cursor is appropriate.
>>
>> The problem I am having is that each update requires a commit, and a
>> commit closes the named cursor, so one loses one's place.
>>
>> I understand that psycopg2 named cursors now have an attribute
>>
>>     withhold
>>
>> and if that is set to true, then a commit will not close the named
>> cursor (unless a rollback is done), but I have not been able to make
>> this work yet.
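>>
>> My reading of how it is supposed to work is roughly this (a minimal
>> sketch; the connection string is a placeholder, and I may simply be
>> holding it wrong):
>>
>> <CODE>
>> import psycopg2
>>
>> conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
>>
>> # withhold=True declares the server-side cursor WITH HOLD
>> # (available in psycopg2 2.4.3 and later)
>> cur = conn.cursor(name="walker", withhold=True)
>> cur.execute("SELECT recno, md5sum FROM files"
>>             " ORDER BY md5sum, path, file_name")
>>
>> row = cur.fetchone()
>> conn.commit()         # would close an ordinary named cursor...
>> row = cur.fetchone()  # ...but a WITH HOLD cursor keeps its place
>>
>> cur.close()           # WITH HOLD cursors must be closed explicitly
>> conn.close()
>> </CODE>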
>>
>> While studying this, I thought of an approach to do what I am trying to
>> do.  This approach involves the use of two cursors that I call a
>> "walking" cursor and a "working" cursor.
>>
>> This seems so messy to me, so I wanted to see how Dabo did it.
>>
>> I have been reading the Dabo code and I notice the use of what is called an
>>
>>     AuxCursor
>>
>> Which is described at
>>
>>     line 439 of dCursorMixin.py
>>
>> Where it says:
>>
>>     ...because it won't harm the contents of the main cursor....
>>
>> So (finally) my questions are:
>>
>> Is the use of an AuxCursor the only way to do what I want to do (unless
>> I can get the withhold attribute to behave)?
>>
>> If there is another way, where is the best description of it so I can
>> read about it?
>>
>> Thanks in advance,
>>
>> Bill House
> My first bit of advice is
> SQL <> dbaseIII
>
> There are many ways to skin this cat, but you first need to review how SQL
> works and how it differs from dbase.
>
> But in general
> you DO NOT commit after each update. Instead, for your project:
> pseudo code
>
> begin
> loop
>     update a row (you are looping over all your rows)
> commit
> or
> rollback
>
> Note that you do everything you need to do, and then you either commit or
> rollback. These are called "transactions".
>
> http://www.postgresql.org/docs/8.3/static/tutorial-transactions.html
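>
> In psycopg2 that pattern might look something like this (just a
> sketch; the query that finds the duplicates is mine and untested):
>
> <CODE>
> import psycopg2
>
> conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
> cur = conn.cursor()
> try:
>     # find every recno that is not the last of its md5sum group
>     cur.execute("""
>         SELECT recno FROM files f
>          WHERE recno <> (SELECT max(recno) FROM files d
>                           WHERE d.md5sum = f.md5sum)
>     """)
>     for (recno,) in cur.fetchall():
>         # update row after row -- but no commit inside the loop
>         cur.execute("UPDATE files SET del = TRUE WHERE recno = %s",
>                     (recno,))
>     conn.commit()    # all of the updates take effect together...
> except Exception:
>     conn.rollback()  # ...or none of them do
>     raise
> finally:
>     conn.close()
> </CODE>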
>
> Once you understand how the SQL works, I believe you will better
> understand what Dabo is doing.
>
> pseudo Dabo code
>
> for rec in DataSet:
>    # set the value
>    rec['fieldName'] = someVal
> bizobj.save()
>
> Johnf
>

I don't know if I understood what you want. I think you want to get rid
of duplicate records.
If so, then in SQL it is really simple:

select distinct * from TheTable
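
One caveat: with your recno column every full row is already unique, so
plain DISTINCT * will return them all; in PostgreSQL, DISTINCT ON is
probably closer to what you need. A sketch with psycopg2 (the connection
string and the new table name are placeholders):

<CODE>
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
cur = conn.cursor()
# DISTINCT ON is PostgreSQL-specific: keep one row per md5sum
cur.execute("""
    CREATE TABLE files_dedup AS
    SELECT DISTINCT ON (md5sum) *
      FROM files
     ORDER BY md5sum, path, file_name
""")
conn.commit()
conn.close()
</CODE>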

Th-th-th-that's all, folks.


