cccs-eric commented on issue #5719:
URL: https://github.com/apache/iceberg/issues/5719#issuecomment-1248037880

   @nastra I have been troubleshooting this issue and I now know more about 
what is going on.  Here are the latest facts:
   
   1. `{catalog}.system.expire_snapshots` is the problematic call, not `{catalog}.system.rewrite_data_files`.  I can reproduce the problem by calling only `expire_snapshots` in a loop, so the takeaway is that the manifest files are not changing between runs.  My test calls `expire_snapshots` in a loop and, out of say 10 runs, 1 or more will fail, which means the manifest files are valid and the problem occurs "randomly" (a minimal sketch of the test loop follows this list).
   2. My Iceberg tables are stored in Azure Data Lake Storage Gen2 (DFS endpoint), and Spark uses the hadoop-azure package to read from and write to the data lake.  I had a good look at the hadoop-azure code: it pre-fetches (reads ahead) file data using multiple threads.  To me this looks suspicious, and what appears to be a random issue could very well be a concurrency problem.
   3. I was able to confirm that moving to Spark 3.3.0 (from 3.2.1) and Iceberg 0.14 introduced the problem: I can run my test just fine (85 consecutive successful runs) using the previous build (Spark 3.2.1 / Iceberg 0.13.1) against the same table and the same files.  As soon as I switch to the new build, the error occurs within a few runs.
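   To make the repro concrete, here is a minimal sketch of the test loop, assuming a Spark session that already has an Iceberg catalog named `my_catalog` and a table `db.events` configured (both names are placeholders for my real setup):

```java
// Repro sketch: call the expire_snapshots procedure repeatedly against the same table.
// Assumes the Iceberg SQL extensions and the "my_catalog" catalog are configured in
// spark-defaults.conf; catalog, table and cutoff timestamp are placeholders.
import org.apache.spark.sql.SparkSession;

public class ExpireSnapshotsLoop {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("expire-snapshots-loop")
        .getOrCreate();

    for (int i = 0; i < 10; i++) {
      // The table and its manifests do not change between iterations, yet roughly
      // one run out of ten fails with the corruption error.
      spark.sql(
          "CALL my_catalog.system.expire_snapshots("
              + "table => 'db.events', "
              + "older_than => TIMESTAMP '2022-09-01 00:00:00')");
    }
  }
}
```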
   
   I think the problem is related to hadoop-azure and NOT Iceberg, but I have yet to prove it.  The Spark 3.3.0 upgrade includes an upgrade of the hadoop-azure library (from 3.3.1 to 3.3.2), and some modifications were made to the prefetch code in the 3.3.2 release.  I have not yet gone through all those changes; I was hoping to isolate and reproduce the problem without Spark, using only the pure Java API.  Which brings me to the following question:
   
   - When calling [Table.expireSnapshots()](https://github.com/apache/iceberg/blob/0f0b1af2ebbc5693ff6dc8049a1e3490540311f8/core/src/main/java/org/apache/iceberg/BaseTable.java#L213), I noticed that not a single Avro file is read from the data lake.  But when the same [Spark action](https://github.com/apache/iceberg/blob/0f0b1af2ebbc5693ff6dc8049a1e3490540311f8/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/ExpireSnapshotsSparkAction.java#L49-L63) is called, all the Avro manifest files are read, and this is where the problem happens.  Don't get me wrong, reading the manifests should not trigger a problem, but I'd like to understand why the Spark implementation differs from the pure Java one.  A sketch of the pure Java call I am testing with is below.
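   For reference, this is roughly the pure Java call I am testing with (a sketch only; the `HadoopCatalog` setup, warehouse path and table name are placeholders for my real ADLS Gen2 configuration):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopCatalog;

public class ExpireSnapshotsJavaApi {
  public static void main(String[] args) {
    // Placeholder warehouse location; the real one points at ADLS Gen2 via abfss://.
    Configuration conf = new Configuration();
    HadoopCatalog catalog = new HadoopCatalog(
        conf, "abfss://container@account.dfs.core.windows.net/warehouse");

    Table table = catalog.loadTable(TableIdentifier.of("db", "events"));

    // Expire snapshots older than 7 days, keeping at least one.
    long cutoffMillis = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000;

    // Core API path: this prunes snapshot metadata and deletes unreferenced files,
    // but in my tests it never reads the Avro manifest files themselves.
    table.expireSnapshots()
        .expireOlderThan(cutoffMillis)
        .retainLast(1)
        .commit();
  }
}
```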

