RazoEtsy commented on issue #15119:
URL: https://github.com/apache/iceberg/issues/15119#issuecomment-3801941196

   Hey :wave-ralph:, similar issue here.
   
   I'm trying to merge 3 datasets and it fails with a similar error:
   
   `java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions: List(30000, 48882)`
   
   All 3 datasets are bucketed by the same key and have the same number of files.
   
   I noticed that `SparkPartitioningAwareScan` reports more partitions for one of the tables than for the other two:
   
   `SparkPartitioningAwareScan: Reporting KeyGroupedPartitioning by [identity(foo), identity(bar)] with 30000 partition(s) for table  baz`
   `SparkPartitioningAwareScan: Reporting KeyGroupedPartitioning by [identity(foo), identity(bar)] with 30000 partition(s) for table  quux`
   `SparkPartitioningAwareScan: Reporting KeyGroupedPartitioning by [identity(foo), identity(bar)] with 48882 partition(s) for table  garply`
   
   On disk, all three datasets have the same number of files, but one of them contains significantly more data. My hunch is that the logical partitioning is the source of the problem, but I don't know how to prove it :melting_face:
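One way to test the hunch above is to compare how many distinct partition keys each table has. This assumes a key-grouped scan reports one partition per distinct key rather than one per file. The Python below is a toy model under that assumption, not Iceberg's or Spark's actual code. `key_grouped_partitions`, `check_zippable`, and the sample data are hypothetical, and `check_zippable` only mimics the equal-partition-count precondition behind the error above. Against the real tables, the per-table key count could presumably be read from each table's Iceberg `partitions` metadata table and compared with the counts in the scan logs.

```python
from collections import defaultdict


def key_grouped_partitions(files):
    """Group (partition_key, file_path) pairs into one partition per distinct key.

    Simplified model (an assumption, not Iceberg's implementation): all files
    sharing a partition key land in the same Spark partition, so the partition
    count equals the number of distinct keys, not the number of files.
    """
    groups = defaultdict(list)
    for key, path in files:
        groups[key].append(path)
    return dict(groups)


def check_zippable(*tables):
    """Raise if the inputs disagree on partition count (mimics Spark's zip precondition)."""
    counts = [len(t) for t in tables]
    if len(set(counts)) > 1:
        listed = ", ".join(str(c) for c in counts)
        raise ValueError(f"Can't zip RDDs with unequal numbers of partitions: List({listed})")
    return counts[0]


# Hypothetical data: every table has 4 files, but garply's files span more (foo, bar) keys.
baz = [(("a", 1), "baz-1"), (("a", 1), "baz-2"), (("b", 2), "baz-3"), (("b", 2), "baz-4")]
quux = [(("a", 1), "quux-1"), (("a", 1), "quux-2"), (("b", 2), "quux-3"), (("b", 2), "quux-4")]
garply = [(("a", 1), "g-1"), (("b", 2), "g-2"), (("c", 3), "g-3"), (("d", 4), "g-4")]

parts = {name: key_grouped_partitions(files)
         for name, files in [("baz", baz), ("quux", quux), ("garply", garply)]}
for name, p in parts.items():
    print(name, "files:", sum(len(v) for v in p.values()), "partitions:", len(p))

# Keys present in garply but missing from baz account for the extra partitions.
print("extra keys in garply:", sorted(set(parts["garply"]) - set(parts["baz"])))

try:
    check_zippable(parts["baz"], parts["quux"], parts["garply"])
except ValueError as e:
    print(e)
```

In this toy model every table has 4 files, yet `garply` yields 4 partitions versus 2 for the others, and the zip check fails with `List(2, 2, 4)`. That mirrors the pattern of the reported `List(30000, 48882)`.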
   
   In case it helps with debugging: I successfully ran a 3-dataset SPJ (storage-partitioned join) with a similar data size.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

