Hello,
We are using Apache Iceberg with AWS Glue. Duplicate rows are being inserted 
into the table even though we have verified that the data being upserted 
contains no duplicates. We use a MERGE SQL statement to upsert data into the 
table.
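
As an illustration of the kind of source-side check described above, here is a 
minimal sketch in plain Python. It is not the actual job code: the `id` key 
column, the sample rows, and the helper name `find_duplicate_keys` are all 
hypothetical. In Spark the same check would typically be a GROUP BY on the 
merge key columns with HAVING COUNT(*) > 1.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def find_duplicate_keys(rows: Iterable[Mapping[str, object]],
                        key_columns: Sequence[str]) -> dict:
    """Return {key_tuple: count} for every merge key that appears more than once."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical CDC batch; "id" is an assumed merge key, not a real schema.
batch = [
    {"id": 1, "op": "I", "value": "a"},
    {"id": 2, "op": "U", "value": "b"},
    {"id": 2, "op": "U", "value": "c"},  # same key as the previous row
]

duplicates = find_duplicate_keys(batch, ["id"])
print(duplicates)  # {(2,): 2}
```

An empty result means every key occurs at most once in the batch, which is the 
condition we expect to hold before the MERGE runs.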

Duplicates also appear in the results of SELECT queries run through Spark SQL. 
But when we query the same table using Athena, we don’t see any duplicates.

We see this issue only with a few tables in our database and not all of them.

We followed the directions in this blog post for our setup: 
https://aws.amazon.com/blogs/big-data/implement-a-cdc-based-upsert-in-a-data-lake-using-apache-iceberg-and-aws-glue/.

We are currently using Spark 3.3, Scala 2.12, Glue 4.0, and Iceberg 1.0.0.

Any inputs are appreciated.

Thanks!

Shwetha Dharmarajan
Senior Staff Software Engineer

Edelman Financial Engines
Ranked #1 independent advisory firm by Barron’s

Visit. 3315 Scott Blvd, 4th Floor, Santa Clara, CA 95054
Click. EdelmanFinancialEngines.com<https://www.edelmanfinancialengines.com/>
Connect. Newsletter<https://www.edelmanfinancialengines.com/newsletter> | 
Podcast<https://www.edelmanfinancial.com/radio> | 
Radio<https://www.edelmanfinancial.com/radio> | 
TV<https://www.edelmanfinancial.com/tv> | 
Books<https://www.edelmanfinancial.com/books>
