cccs-eric commented on issue #5719:
URL: https://github.com/apache/iceberg/issues/5719#issuecomment-1248037880
@nastra I have been troubleshooting this issue and I now know more about
what is going on. Here are the latest facts:
1. `{catalog}.system.expire_snapshots` is the problematic call, not
`{catalog}.system.rewrite_data_files`. I can reproduce the problem by calling
only `expire_snapshots` in a loop, so the takeaway is that the manifest files
are not changing between runs. Out of roughly 10 runs, 1 or more fail, which
means the manifest files themselves are valid and the failure occurs
"randomly". (A sketch of this reproduction loop is shown after this list.)
2. My Iceberg tables are stored in Azure Data Lake Storage (Gen2, DFS) and
Spark uses the hadoop-azure package to read/write from the data lake. I had a
good look at the hadoop-azure code: it does prefetching (read-ahead) of files
using multiple threads. That looks suspicious to me, since a failure that
appears random could very well be a concurrency problem.
3. I was able to confirm that moving to Spark 3.3.0 (from 3.2.1) and Iceberg
0.14 introduced the problem: my test runs fine (85 consecutive successful
runs) with the previous build (Spark 3.2.1 / Iceberg 0.13.1) against the same
table and the same files. As soon as I switch to the new build, the error
occurs within a few runs.
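
For reference, the reproduction loop from point 1 is essentially the
following. This is a minimal sketch: the catalog, warehouse path, and table
names are placeholders rather than my actual configuration, and the
commented-out read-ahead setting is only an untested idea for checking the
prefetching hypothesis from point 2.

```java
import org.apache.spark.sql.SparkSession;

public class ExpireSnapshotsLoop {
  public static void main(String[] args) {
    // Placeholder catalog/warehouse/table names for an ADLS Gen2 setup.
    SparkSession spark = SparkSession.builder()
        .appName("expire-snapshots-repro")
        .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.my_catalog.type", "hadoop")
        .config("spark.sql.catalog.my_catalog.warehouse",
            "abfss://container@account.dfs.core.windows.net/warehouse")
        // Untested hypothesis: setting the ABFS read-ahead queue depth to 0
        // should disable prefetching and help confirm/refute point 2.
        // .config("spark.hadoop.fs.azure.readaheadqueue.depth", "0")
        .getOrCreate();

    // Out of ~10 iterations, one or more fail against the same, unchanged table.
    for (int i = 0; i < 10; i++) {
      spark.sql(
          "CALL my_catalog.system.expire_snapshots(table => 'db.my_table', "
              + "older_than => TIMESTAMP '2022-09-01 00:00:00')").show();
    }
  }
}
```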
I think the problem is related to hadoop-azure and NOT Iceberg, but I have
yet to prove it. The Spark 3.3.0 upgrade includes an upgrade of the
hadoop-azure library (from 3.3.1 to 3.3.2), and some modifications were made
to the prefetch code in the 3.3.2 release. I have not yet gone through all of
those changes; I was hoping I could isolate and reproduce the problem without
Spark, using only the pure Java API. Which brings me to the following
question:
- When calling
[Table.expireSnapshots()](https://github.com/apache/iceberg/blob/0f0b1af2ebbc5693ff6dc8049a1e3490540311f8/core/src/main/java/org/apache/iceberg/BaseTable.java#L213),
I have noticed that not a single Avro file is read from the data lake. But
when the equivalent [Spark
action](https://github.com/apache/iceberg/blob/0f0b1af2ebbc5693ff6dc8049a1e3490540311f8/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/ExpireSnapshotsSparkAction.java#L49-L63)
is called, all of the Avro manifest files are read, and this is where the
problem happens. Don't get me wrong, reading the manifests should not trigger
a problem, but I'd like to understand why the Spark implementation differs
from the pure Java one. Both call paths are sketched below.
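
To make the comparison concrete, here is a minimal sketch of the two call
paths I am comparing. The HadoopCatalog setup, warehouse path, and table
names are purely illustrative assumptions, not my real configuration.

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopCatalog;
import org.apache.iceberg.spark.actions.SparkActions;
import org.apache.spark.sql.SparkSession;

public class ExpireSnapshotsComparison {
  public static void main(String[] args) {
    // Placeholder catalog/table setup; my real tables live in ADLS Gen2.
    SparkSession spark = SparkSession.builder().appName("expire-comparison").getOrCreate();
    HadoopCatalog catalog = new HadoopCatalog(
        spark.sparkContext().hadoopConfiguration(),
        "abfss://container@account.dfs.core.windows.net/warehouse");
    Table table = catalog.loadTable(TableIdentifier.of("db", "my_table"));
    long cutoff = System.currentTimeMillis() - 24L * 60 * 60 * 1000;

    // Pure Java API path: no Avro manifest reads observed from the data lake.
    table.expireSnapshots()
        .expireOlderThan(cutoff)
        .commit();

    // Spark action path (what the stored procedure ultimately runs):
    // all Avro manifests are read, and this is where the failure shows up.
    SparkActions.get(spark)
        .expireSnapshots(table)
        .expireOlderThan(cutoff)
        .execute();
  }
}
```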