[https://issues.apache.org/jira/browse/ARROW-17005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563896#comment-17563896]
Jonathan Swenson commented on ARROW-17005:
------------------------------------------
It's possible to override the Arrow output type currently (using
setJdbcToArrowTypeConverter), but it seems like what you are describing would
be a generalization that would allow for writing a custom JDBCConsumer (or am
I misinterpreting "custom conversion for a custom type"?).
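For reference, here's roughly what that existing hook looks like in use — a
sketch assuming the JdbcToArrowConfigBuilder API; the NUMERIC-to-Float8
mapping is just an illustrative override, not a recommendation:
{code:java}
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

import org.apache.arrow.adapter.jdbc.JdbcToArrowConfig;
import org.apache.arrow.adapter.jdbc.JdbcToArrowConfigBuilder;
import org.apache.arrow.adapter.jdbc.JdbcToArrowUtils;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.types.FloatingPointPrecision;
import org.apache.arrow.vector.types.pojo.ArrowType;

public class TypeConverterExample {
  public static JdbcToArrowConfig buildConfig() {
    Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.ROOT);
    return new JdbcToArrowConfigBuilder(new RootAllocator(), calendar)
        .setJdbcToArrowTypeConverter(fieldInfo -> {
          // Illustrative override: read NUMERIC columns as Float8 instead of Decimal.
          if (fieldInfo.getJdbcType() == java.sql.Types.NUMERIC) {
            return new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE);
          }
          // Everything else falls back to the default JDBC-to-Arrow mapping.
          return JdbcToArrowUtils.getArrowTypeFromJdbcType(fieldInfo, calendar);
        })
        .build();
  }
}
{code}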
I think that would also generalize some of the work that [~toddfarmer] has done
to help with the BigDecimal issues I've run into:
https://issues.apache.org/jira/browse/ARROW-16600
https://issues.apache.org/jira/browse/ARROW-16427
The postgres driver has really been a bit of a pain for me 🤦🏼‍♂️
> [Java] Incorrect results from JDBC Adapter from Postgres of non-nullable
> column through left join
> -------------------------------------------------------------------------------------------------
>
> Key: ARROW-17005
> URL: https://issues.apache.org/jira/browse/ARROW-17005
> Project: Apache Arrow
> Issue Type: Wish
> Components: Java
> Reporter: Jonathan Swenson
> Priority: Major
>
> Unsure whether to consider this a bug or a wish, but the JDBC to Arrow
> Adapter produces incorrect results when wrapping the postgres driver in
> certain cases.
> If you left join on a non-nullable column, the column becomes nullable (when
> the join condition does not match any rows). However, the postgres
> ResultSetMetaData lies to you and still indicates that the column is
> non-nullable.
> When iterating through the data, results come back as null (wasNull() will
> return true).
> However, because of the way the JDBCConsumer is created, a non-nullable
> consumer is used, and it never checks the nullability of these results.
> Unfortunately, this results in incorrect data or errors depending on the data
> types returned.
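> To make the failure mode concrete, here is a hypothetical sketch (not
> Arrow's actual classes) of what a non-nullable int consumer effectively
> does, assuming an IntVector target:
> {code:java}
> // Hypothetical sketch of a non-nullable consumer's inner loop.
> // Because rs.wasNull() is never consulted, a SQL NULL silently
> // becomes 0 in the vector.
> void consume(java.sql.ResultSet rs, org.apache.arrow.vector.IntVector vector,
>     int rowIndex, int columnIndex) throws java.sql.SQLException {
>   vector.setSafe(rowIndex, rs.getInt(columnIndex));
> }
> {code}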
> The postgres JDBC team has closed a ticket about this, indicating that it
> would be impossible for them to return the correct nullability information
> through the JDBC API. See: [https://github.com/pgjdbc/pgjdbc/issues/2079]
> An example:
> Table:
> ||t1.id||
> |2|
> |3|
> {code:sql}
> CREATE TABLE t1 (id integer NOT NULL);
> INSERT INTO t1 VALUES (2), (3);
> {code}
> Query:
> {code:sql}
> WITH t2 AS (SELECT 1 AS id UNION SELECT 2)
> SELECT
>   t1.id
> FROM t2
> LEFT JOIN t1 ON t1.id = t2.id;
> {code}
> This returns the result set:
> ||id||
> |2|
> |null|
> The ResultSetMetaData indicates that the column is non-nullable (as t1.id is
> non-nullable) but there is null data in the result.
> The Arrow Vector that is present after the result set is consumed looks like
> this:
> ||id||
> |2|
> |0|
> ResultSet.getInt(1) will return 0 when the source data is null, with an
> expectation that you check wasNull().
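> For reference, the standard JDBC pattern for reading a possibly-null int is:
> {code:java}
> int value = rs.getInt(1);  // returns 0 for SQL NULL
> if (rs.wasNull()) {
>   // the 0 above actually represented NULL in the result set
> }
> {code}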
> The data is incorrect and the failure is silent, potentially leading to
> clients / consumers getting bad data.
>
> In other cases, such as UUID (mapped to UTF-8 vectors), the value will fail
> to load into Arrow: the consumer does not expect null data and throws an NPE
> when deserializing / converting to byte arrays.
>
> I was able to work around this problem by wrapping the postgres JDBC
> ResultSetMetaData and always forcing the nullability to nullable (or
> nullability unknown); a sketch of that wrapper follows below.
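> A minimal sketch of that workaround, assuming a java.lang.reflect.Proxy-based
> wrapper (the helper name forceNullableUnknown is hypothetical):
> {code:java}
> import java.lang.reflect.Proxy;
> import java.sql.ResultSetMetaData;
>
> public class NullabilityWorkaround {
>   // Wrap the driver's ResultSetMetaData so isNullable always reports
>   // columnNullableUnknown; all other calls delegate to the real metadata.
>   public static ResultSetMetaData forceNullableUnknown(ResultSetMetaData delegate) {
>     return (ResultSetMetaData) Proxy.newProxyInstance(
>         ResultSetMetaData.class.getClassLoader(),
>         new Class<?>[] {ResultSetMetaData.class},
>         (proxy, method, args) -> {
>           if ("isNullable".equals(method.getName())) {
>             return ResultSetMetaData.columnNullableUnknown;
>           }
>           return method.invoke(delegate, args);
>         });
>   }
> }
> {code}
> In practice the ResultSet itself has to be wrapped the same way so that
> getMetaData() returns this proxy, since the adapter reads the metadata from
> the ResultSet.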
> Unfortunately I don't think there is a great way to solve this, but perhaps
> some way to configure / override the JDBCConsumer creation would allow users
> of this library to override this behavior. However, the silent failure and
> incorrect data might lead to users not noticing.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)