Jonathan Swenson created ARROW-16427:
----------------------------------------
Summary: [Java] jdbcToArrowVectors / sqlToArrowVectorIterator
fails to handle variable decimal precision / scale
Key: ARROW-16427
URL: https://issues.apache.org/jira/browse/ARROW-16427
Project: Apache Arrow
Issue Type: Bug
Reporter: Jonathan Swenson
Against postgres, running a simple SQL query that produces numeric types can
lead to a JDBC result set with BigDecimal values with variable decimal
precision/scale.
{code:java}
SELECT value FROM (
SELECT 1000000000000000.01 AS "value"
UNION SELECT 1000000000300.0000001
) a {code}
The postgres JDBC adapter produces a result set that looks like the following:
|| ||value||precision||scale||
|metadata|N/A|0|0|
|row 1|1000000000000000.01|18|2|
|row 2|1000000000300.0000001|20|7|
Even a result set that returns a single value may Numeric values with precision
/ scale that do not match the precision / scale in the ResultSetMetadata.
{code:java}
SELECT AVG(one) from (
SELECT 1000000000000000.01 as "one"
UNION select 1000000000300.0000001
) a {code}
produces a result set that looks like this
|| ||value||precision||scale||
|metadata|N/A|0|0|
|row 1|500500000000150.0050001|22|7|
When processing the result set using the simple jdbcToArrowVectors (or
sqlToArrowVectorIterator) this fails to set the values extracted from the
result set into the the DecimalVector
{code:java}
val calendar = JdbcToArrowUtils.getUtcCalendar()
val schema = JdbcToArrowUtils.jdbcToArrowSchema(rs.metaData, calendar)
val root = VectorSchemaRoot.create(schema, RootAllocator())
val vectors = JdbcToArrowUtils.jdbcToArrowVectors(rs, root, calendar) {code}
Error:
{code:java}
Exception in thread "main" java.lang.IndexOutOfBoundsException: index: 0,
length: 1 (expected: range(0, 0))
at org.apache.arrow.memory.ArrowBuf.checkIndexD(ArrowBuf.java:318)
at org.apache.arrow.memory.ArrowBuf.chk(ArrowBuf.java:305)
at org.apache.arrow.memory.ArrowBuf.getByte(ArrowBuf.java:507)
at org.apache.arrow.vector.BitVectorHelper.setBit(BitVectorHelper.java:85)
at org.apache.arrow.vector.DecimalVector.set(DecimalVector.java:354)
at
org.apache.arrow.adapter.jdbc.consumer.DecimalConsumer$NullableDecimalConsumer.consume(DecimalConsumer.java:61)
at
org.apache.arrow.adapter.jdbc.consumer.CompositeJdbcConsumer.consume(CompositeJdbcConsumer.java:46)
at
org.apache.arrow.adapter.jdbc.JdbcToArrowUtils.jdbcToArrowVectors(JdbcToArrowUtils.java:369)
at
org.apache.arrow.adapter.jdbc.JdbcToArrowUtils.jdbcToArrowVectors(JdbcToArrowUtils.java:321)
{code}
using `sqlToArrowVectorIterator` also fails with an error trying to set data
into the vector: (requires a little bit of trickery to force creation of the
package private configuration)
{code:java}
Exception in thread "main" java.lang.RuntimeException: Error occurred while
getting next schema root.
at
org.apache.arrow.adapter.jdbc.ArrowVectorIterator.next(ArrowVectorIterator.java:179)
at
com.acme.dataformat.ArrowResultSetProcessor.processResultSet(ArrowResultSetProcessor.kt:31)
at com.acme.AppKt.main(App.kt:54)
at com.acme.AppKt.main(App.kt)
Caused by: java.lang.RuntimeException: Error occurred while consuming data.
at
org.apache.arrow.adapter.jdbc.ArrowVectorIterator.consumeData(ArrowVectorIterator.java:121)
at
org.apache.arrow.adapter.jdbc.ArrowVectorIterator.load(ArrowVectorIterator.java:153)
at
org.apache.arrow.adapter.jdbc.ArrowVectorIterator.next(ArrowVectorIterator.java:175)
... 3 more
Caused by: java.lang.UnsupportedOperationException: BigDecimal scale must equal
that in the Arrow vector: 7 != 0
at
org.apache.arrow.vector.util.DecimalUtility.checkPrecisionAndScale(DecimalUtility.java:95)
at org.apache.arrow.vector.DecimalVector.set(DecimalVector.java:355)
at
org.apache.arrow.adapter.jdbc.consumer.DecimalConsumer$NullableDecimalConsumer.consume(DecimalConsumer.java:61)
at
org.apache.arrow.adapter.jdbc.consumer.CompositeJdbcConsumer.consume(CompositeJdbcConsumer.java:46)
at
org.apache.arrow.adapter.jdbc.ArrowVectorIterator.consumeData(ArrowVectorIterator.java:113)
... 5 more {code}
Is there a recommended course of action to represent a variable precision /
scale decimal vector? In any case it does not seem possible to convert JDBC
data that uses these numeric types when they come in this form.
It seems like Oracle's JDBC driver also returns metadata with a 0,0 precision /
scale and other dialects might do the same.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)