[
https://issues.apache.org/jira/browse/ARROW-17005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Swenson updated ARROW-17005:
-------------------------------------
Summary: [Java] Incorrect results from JDBC Adapter from Postgres of
non-nullable column through left join (was: [Java] Provide way to override
nullability of JDBC Column in JDBC to Arrow Adapter)
> [Java] Incorrect results from JDBC Adapter from Postgres of non-nullable
> column through left join
> -------------------------------------------------------------------------------------------------
>
> Key: ARROW-17005
> URL: https://issues.apache.org/jira/browse/ARROW-17005
> Project: Apache Arrow
> Issue Type: Wish
> Components: Java
> Reporter: Jonathan Swenson
> Priority: Major
>
> Unsure to consider this a bug or wish, but the JDBC to Arrow Adapter produces
> incorrect results when wrapping the postgres driver in certain cases.
> If you left join a non-nullable column, the column becomes nullable (if the
> join condition does not match any columns). However the postgres
> ResultSetMetaData lies to you and still indicates that the column is still
> non-nullable.
> When iterating through the data, results come back as null (isNull will
> return true).
> However, because of the way that the JDBCConsumer is created, it creates a
> non-nullable consumer and will not check the nullability of these results.
> Unfortunately, this results in incorrect data or errors depending on the data
> types returned.
> The postgres JDBC team has closed a ticket about this indicating that it
> would be impossible for them to return the correct data nullability data to
> the JDBC driver. see: [https://github.com/pgjdbc/pgjdbc/issues/2079]
> An example:
> Table:
> ||t1.id||
> |2|
> |3|
> {code:java}
> CREATE TABLE t1 (id integer NOT NULL);
> INSERT INTO t1 VALUES (2), (3);
> {code}
> Query
> {code:java}
> WITH t2 AS (SELECT 1 AS id UNION SELECT 2)
> SELECT
> t1.id
> FROM t2
> LEFT JOIN t1 on t1.id = t2.id;{code}
> This returns the result set:
> ||id||
> |2|
> |null|
> The ResultSetMetaData indicates that the column is non-nullable (as t1.id is
> non-nullable) but there is null data in the result.
> The Arrow Vector that is present after the result set is consumed, looks like
> this:
> ||id||
> |2|
> |0|
> ResultSet.getInt(1) will return 0 when the source data is null, with an
> expectation that you check isNull.
> The data is incorrect and silently fails potentially leading to clients /
> consumers getting bad data.
>
> In other cases, such as UUID (mapped to UTF-8 vectors) the value will fail to
> load into arrow due to expecting null data and throwing a NPE when
> deserializing / converting to bytearrays.
>
> I was able to work around this problem by wrapping the postgres JDBC
> ResultSetMetadata and always forcing the nullability to nullable (or
> nullability unknown).
> Unfortunately I don't think there is a great way to solve this, but perhaps
> some way to configure / override the JDBCConsumer creation would allow for
> users of this library to override this behavior, however the silent failure
> and incorrect data might lead to users not noticing.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)