paleolimbot commented on issue #1950:
URL: https://github.com/apache/arrow-adbc/issues/1950#issuecomment-2346722372
> I would recommend upcasting to the smallest integer type that can hold all of the values:
> uint8 -> int16
> uint16 -> int32
> uint32 -> int64

This is definitely the safest way to go, and it had already been baked into
`PostgresType::FromSchema()`. In #2153 we updated the ingestion behaviour to
actually use `PostgresType::FromSchema()`, so for a "create" or an "append to
something created by adbc" this should now work:
``` r
library(adbcdrivermanager)
library(nanoarrow)

con <- adbc_database_init(
  adbcpostgresql::adbcpostgresql(),
  uri = "postgresql://localhost:5432/postgres?user=postgres&password=password"
) |>
  adbc_connection_init()

df <- tibble::tibble(
  uint8_col = 246:255,
  uint16_col = 65526:65535,
  uint32_col = (.Machine$integer.max + 1):(.Machine$integer.max + 10)
)

array <- df |>
  nanoarrow::as_nanoarrow_array(
    schema = na_struct(
      list(
        uint8_col = na_uint8(),
        uint16_col = na_uint16(),
        uint32_col = na_uint32()
      )
    )
  )

con |>
  execute_adbc("DROP TABLE IF EXISTS adbc_test")

array |>
  write_adbc(con, "adbc_test")

con |>
  read_adbc("select * from adbc_test") |>
  tibble::as_tibble()
#> # A tibble: 10 × 3
#>    uint8_col uint16_col uint32_col
#>        <int>      <int>      <dbl>
#>  1       246      65526 2147483648
#>  2       247      65527 2147483649
#>  3       248      65528 2147483650
#>  4       249      65529 2147483651
#>  5       250      65530 2147483652
#>  6       251      65531 2147483653
#>  7       252      65532 2147483654
#>  8       253      65533 2147483655
#>  9       254      65534 2147483656
#> 10       255      65535 2147483657
```
Unfortunately, the method we're using to insert efficiently (generating COPY
data) requires that the types match exactly, so appending Arrow data that
happens to have an unsigned integer column to an existing table will fail:
```r
con |>
  execute_adbc("DROP TABLE IF EXISTS adbc_test")

con |>
  execute_adbc("CREATE TABLE adbc_test (uint8_col int2, uint16_col int2, uint32_col int4)")

array |>
  write_adbc(con, "adbc_test", mode = "append")
#> Error in adbc_statement_execute_query(stmt): INVALID_ARGUMENT: [libpq] Failed to execute COPY statement: PGRES_FATAL_ERROR ERROR: incorrect binary data format
#> CONTEXT: COPY adbc_test, line 1, column uint16_col
```
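Until the COPY writer can do that widening itself, a minimal workaround sketch
(assuming the exact 1:1 mapping pairs int16/int32/int64 with int2/int4/int8) is
to create the table with the widened column types and rebuild the array with the
matching signed Arrow types before appending:

```r
# Minimal workaround sketch (assumes int16/int32/int64 map 1:1 to int2/int4/int8):
# widen the table's column types and rebuild the array with the matching signed
# Arrow types so the COPY writer's exact type match is satisfied.
con |>
  execute_adbc("DROP TABLE IF EXISTS adbc_test")

con |>
  execute_adbc("CREATE TABLE adbc_test (uint8_col int2, uint16_col int4, uint32_col int8)")

array_signed <- df |>
  nanoarrow::as_nanoarrow_array(
    schema = na_struct(
      list(
        uint8_col = na_int16(),
        uint16_col = na_int32(),
        uint32_col = na_int64()
      )
    )
  )

array_signed |>
  write_adbc(con, "adbc_test", mode = "append")
```

This is effectively the same upcast that `PostgresType::FromSchema()` applies on
the create path, just done explicitly by the caller.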
> uint64 -> ???

For a fresh insert of Arrow data (i.e., when we are forced to generate a
CREATE TABLE statement), this should probably be inferred as `bigint`, because
that preserves the "integerness" even though it will error at runtime if passed
very large values. The workaround here would be to issue your own `CREATE TABLE
(bigint_col BYTEA)`, and we will need to support generating COPY for various
postgres types outside the 1:1 mapping that the COPY writer currently uses.
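In the meantime, a minimal sketch of what a caller can do today for uint64 data
that is known to fit in a signed 64-bit range (the `df64` and `adbc_test_uint64`
names here are just illustrative) is to write it as Arrow int64 so it lands in a
`bigint` column via the existing 1:1 mapping:

```r
# Minimal sketch (assumes all values fit in int64; anything above 2^63 - 1
# would still fail): convert uint64-ish data to Arrow int64 before writing so
# it maps to a postgres bigint column.
df64 <- tibble::tibble(uint64_col = c(1, 2^40, 2^62))

array64 <- df64 |>
  nanoarrow::as_nanoarrow_array(
    schema = na_struct(list(uint64_col = na_int64()))
  )

con |>
  execute_adbc("DROP TABLE IF EXISTS adbc_test_uint64")

array64 |>
  write_adbc(con, "adbc_test_uint64")
```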