Hey Steve,

Thanks for the clear reproduction test case; it's very helpful. I did some debugging locally, and my suspicion is that it's incorrect or unexpected for a NullVectorReader to be used to read the new optional column. I could be wrong, but it seems we should be allocating a type-specific reader (for the example in the test case, an IntVectorReader). I'll try to look into this further sometime this week. From my current understanding, the next step is to debug how we end up in a state where the reader for the new column is a NullVectorReader, and to confirm whether that's expected.

Thanks,
Amogh Jahagirdar

On Wed, Jun 26, 2024 at 6:05 PM Lessard, Steve <steve.less...@teradata.com.invalid> wrote:

> I have found unexpected behavior in iceberg-arrow’s vectorized read
> support. After quite a bit of digging and collaboration with Eduard
> Tudenhoefner we have determined that there is a bug in iceberg-arrow, but
> we have not been able to determine exactly what the bug is. Can you please
> help identify the root cause of the issue I originally reported as issue
> 10275 <https://github.com/apache/iceberg/issues/10275>?
>
> Since I opened that issue I’ve learned a bit more about the issue and now
> have a clear reproduction case. The steps to reproduce the bug are:
>
> 1. Create a table
> 2. Add one row to the table
> 3. Alter the table’s schema by adding a new, optional column with no
>    default value
> 4. Read all rows, all columns from the table
> 5. Blamo! The code currently in apache/iceberg will throw a
>    NullPointerException
>
> I have written a unit test that reproduces this bug. You can view the test
> at
> https://github.com/apache/iceberg/pull/10284/files#diff-c3da34dcdb02c2db690c86a2b8356a405c899dec410bdb0b9bcee79fd8c63dc7
>
> Initially I tried to fix the bug by preventing the NullPointerException,
> but all the while I suspected that the NPE is just a symptom of a larger
> bug. When I submitted a pull request containing my fix for the NPE Eduard
> Tudenhoefner reviewed the PR and came to the same conclusion, the NPE is a
> symptom of a larger bug within iceberg-arrow. The problem is neither of us
> can identify the actual bug.
>
> Again, I ask, can you please help identify the root cause of the issue I
> originally reported as issue 10275
> <https://github.com/apache/iceberg/issues/10275>?
>
> -Steve Lessard, Teradata
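
To make the hypothesis in this thread concrete, here is a toy model in plain Java. It is not iceberg-arrow code, and every name in it (Holder, NullOnlyReader, TypedIntReader, countNulls) is hypothetical. It only assumes that a null-only reader might hand back a holder with no backing vector, while a typed reader would allocate a vector filled with nulls, and it shows how a consumer that expects a vector would then hit a NullPointerException for the newly added column.

```java
public class MissingColumnSketch {
    // Hypothetical stand-in for an object that wraps one batch of column values.
    static final class Holder {
        final Integer[] values; // null when no vector was allocated

        Holder(Integer[] values) {
            this.values = values;
        }
    }

    interface ColumnReader {
        Holder read(int numRows);
    }

    // Models a reader that allocates nothing for a column absent from the data file.
    static final class NullOnlyReader implements ColumnReader {
        @Override
        public Holder read(int numRows) {
            return new Holder(null);
        }
    }

    // Models a typed reader that allocates a vector whose entries are all null.
    static final class TypedIntReader implements ColumnReader {
        @Override
        public Holder read(int numRows) {
            return new Holder(new Integer[numRows]);
        }
    }

    // A consumer that assumes every holder has a backing vector.
    static int countNulls(Holder holder) {
        int nulls = 0;
        for (Integer value : holder.values) { // throws NPE if values is null
            if (value == null) {
                nulls++;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        // One row written before the optional column was added.
        System.out.println(countNulls(new TypedIntReader().read(1)));
        try {
            countNulls(new NullOnlyReader().read(1));
        } catch (NullPointerException e) {
            System.out.println("NPE from holder without a backing vector");
        }
    }
}
```

Whether the real fix belongs in the reader selection or in the consumer is exactly the open question above; the sketch only illustrates why the choice of reader matters.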