Hey Steve,

Thanks for the clear reproduction test case; it's very helpful. I did some
debugging locally, and my suspicion is that it's incorrect, or at least
unexpected, for a NullVectorReader to be used when reading the new optional
column. I could be wrong, but it seems like we should be allocating a
specific typed reader (an IntVectorReader for the example in the test
case). I'll try to look into this further sometime this week. For now, the
next step as I understand it is to debug how we end up in a state where the
reader for the new column is a NullVectorReader, and to confirm whether
that's expected.
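
For reference, here is a minimal sketch of the result a read in step 4 of
the reproduction should produce: a row written before the column was added
comes back with null in the new column. This is a toy model in plain Java,
not Iceberg or Arrow code; the class name, the project helper, and the
column names "id" and "z" are all hypothetical and chosen only for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from iceberg-arrow.
public class SchemaEvolutionSketch {

    // Project a stored row onto the table's current schema. A column that
    // was added after the row was written is absent from the stored row,
    // so Map.get returns null for it and the value reads back as null.
    static List<Object> project(Map<String, Object> storedRow, List<String> currentSchema) {
        List<Object> values = new ArrayList<>();
        for (String column : currentSchema) {
            values.add(storedRow.get(column));
        }
        return values;
    }

    public static void main(String[] args) {
        // Steps 1-2: a row written while the schema was just (id).
        Map<String, Object> storedRow = new HashMap<>();
        storedRow.put("id", 1);

        // Step 3: the schema gains an optional column "z" with no default.
        List<String> currentSchema = Arrays.asList("id", "z");

        // Step 4: reading all columns should give a null for "z", not an NPE.
        System.out.println(project(storedRow, currentSchema)); // prints [1, null]
    }
}
```

The sketch says nothing about which reader class should produce those
nulls; it only pins down the expected output of the read.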

Thanks,

Amogh Jahagirdar

On Wed, Jun 26, 2024 at 6:05 PM Lessard, Steve
<steve.less...@teradata.com.invalid> wrote:

> I have found unexpected behavior in iceberg-arrow’s vectorized read
> support. After quite a bit of digging and collaboration with Eduard
> Tudenhoefner we have determined that there is a bug in iceberg-arrow, but
> we have not been able to determine exactly what the bug is. Can you please
> help identify the root cause of the issue I originally reported as issue
> 10275 <https://github.com/apache/iceberg/issues/10275>?
>
>
>
> Since I opened that issue I’ve learned a bit more about the issue and now
> have a clear reproduction case. The steps to reproduce the bug are:
>
>    1. Create a table
>    2. Add one row to the table
>    3. Alter the table’s schema by adding a new, optional column with no
>    default value
>    4. Read all rows, all columns from the table
>    5. Blamo! The code currently in apache/iceberg will throw a
>    NullPointerException
>
>
>
> I have written a unit test that reproduces this bug. You can view the test
> at
> https://github.com/apache/iceberg/pull/10284/files#diff-c3da34dcdb02c2db690c86a2b8356a405c899dec410bdb0b9bcee79fd8c63dc7
>
>
>
> Initially I tried to fix the bug by preventing the NullPointerException,
> but all the while I suspected that the NPE is just a symptom of a larger
> bug. When I submitted a pull request containing my fix for the NPE, Eduard
> Tudenhoefner reviewed the PR and came to the same conclusion: the NPE is a
> symptom of a larger bug within iceberg-arrow. The problem is that neither
> of us can identify the actual bug.
>
>
>
> Again, I ask, can you please help identify the root cause of the issue I
> originally reported as issue 10275
> <https://github.com/apache/iceberg/issues/10275>?
>
>
>
> -Steve Lessard, Teradata
>
>
>
>
>
