nikfio commented on issue #3102:
URL: https://github.com/apache/arrow-adbc/issues/3102#issuecomment-3288877196

   Hi all,
   
   Sorry for the late reply.
   I got the result I needed by explicitly deleting all rows that share the 
exact same timestamp, keeping only the first one encountered (the one with the 
lowest ROWID).
   
   Here is the code snippet:
   
   `
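   conn = self.connect()

   # Delete duplicates: keep the lowest-ROWID row for each timestamp.
   # The table and column names are interpolated as identifiers, so they
   # must come from trusted code, not from user input.
   query_clean = f'''DELETE FROM {target_table}
                     WHERE ROWID NOT IN (
                         SELECT MIN(ROWID)
                         FROM {target_table}
                         GROUP BY {BASE_DATA_COLUMN_NAME.TIMESTAMP}
                     );'''

   cur = conn.cursor()
   cur.execute(query_clean)

   # Close the cursor, commit the delete, then close the connection
   cur.close()
   conn.commit()
   conn.close()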
   `
   
   Let me know what you think of this solution.
   Of course, it assumes that a timestamp column is already in place, 
which may be a hard constraint.
   The timestamp column can be replaced by any column whose value identifies a 
set of duplicated rows, that is, a value shared by rows that match on all 
columns and by no other rows.
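   
   For anyone who wants to try the query in isolation, here is a minimal sketch 
of the same delete against an in-memory SQLite database. It uses Python's 
standard `sqlite3` module rather than ADBC so it runs without a driver; the 
helper `drop_duplicate_timestamps`, the table `readings`, and the column `ts` 
are hypothetical names chosen for illustration, not part of the original code.

```python
import sqlite3


def drop_duplicate_timestamps(conn, table, ts_column):
    """Keep the lowest-ROWID row per timestamp and delete the rest.

    `table` and `ts_column` are interpolated as identifiers, so they are
    assumed to come from trusted code. Returns the number of deleted rows.
    """
    query = f'''DELETE FROM "{table}"
                WHERE ROWID NOT IN (
                    SELECT MIN(ROWID)
                    FROM "{table}"
                    GROUP BY "{ts_column}"
                )'''
    cur = conn.cursor()
    cur.execute(query)
    deleted = cur.rowcount
    cur.close()
    conn.commit()
    return deleted


# Hypothetical sample data: two timestamps appear twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2024-01-01T00:00:00", 1.0),
        ("2024-01-01T00:00:00", 1.5),  # same timestamp, inserted later
        ("2024-01-01T00:01:00", 2.0),
        ("2024-01-01T00:01:00", 2.0),  # exact duplicate
        ("2024-01-01T00:02:00", 3.0),
    ],
)

deleted = drop_duplicate_timestamps(conn, "readings", "ts")
remaining = conn.execute("SELECT ts, value FROM readings ORDER BY ts").fetchall()
print(deleted, remaining)
```

   Note that rows are grouped by timestamp only, so two rows with the same 
timestamp but different values (like the first pair above) also collapse to the 
earlier-inserted one. `GROUP BY` also puts all NULL timestamps in a single 
group, so only one such row would survive.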
   
   Thanks,
   Nick

