eest opened a new issue, #38616:
URL: https://github.com/apache/arrow/issues/38616

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hello,
   
   When writing out a parquet file containing a uint16 column, the file is written without complaints, but [DuckDB](https://duckdb.org) is unable to read it. The error looks like this:
   ```
   $ duckdb -c 'select * from test.parquet'
   Error: Invalid Input Error: Failed to cast value: Type UINT32 with value 4294967295 can't be cast because the value is out of range for the destination type UINT16
   ```
   
   The file can be generated with this code:
   ```
   package main
   
   import (
        "log"
        "os"
   
        "github.com/apache/arrow/go/v14/arrow"
        "github.com/apache/arrow/go/v14/arrow/array"
        "github.com/apache/arrow/go/v14/arrow/memory"
        "github.com/apache/arrow/go/v14/parquet/pqarrow"
   )
   
   func main() {
        schema := arrow.NewSchema(
                []arrow.Field{{Name: "port", Type: arrow.PrimitiveTypes.Uint16, Nullable: true}},
                nil,
        )
   
        pool := memory.NewGoAllocator()
   
        rb := array.NewRecordBuilder(pool, schema)
        defer rb.Release()
   
        port := rb.Field(0).(*array.Uint16Builder)
        defer port.Release()
   
        port.Append(12)
   
        record := rb.NewRecord()
   
        outFile, err := os.Create("test.parquet")
        if err != nil {
                log.Fatalf("unable to open session file: %s", err)
        }
   
     parquetWriter, err := pqarrow.NewFileWriter(schema, outFile, nil, pqarrow.DefaultWriterProps())
        if err != nil {
                log.Fatalf("unable to create parquet writer: %s", err)
        }
   
        err = parquetWriter.Write(record)
        if err != nil {
                log.Fatalf("unable to write parquet file: %s", err)
        }
   
        err = parquetWriter.Close()
        if err != nil {
                log.Fatalf("unable to close parquet file: %s", err)
        }
   }
   ```
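
   For reference, the file can also be read back with the Go library itself. Below is a minimal sketch of such a check, assuming the v14 `pqarrow.ReadTable` API; the data value itself, not the statistics, is what this prints:
   ```
   package main

   import (
       "context"
       "fmt"
       "log"
       "os"

       "github.com/apache/arrow/go/v14/arrow/memory"
       "github.com/apache/arrow/go/v14/parquet"
       "github.com/apache/arrow/go/v14/parquet/pqarrow"
   )

   func main() {
       f, err := os.Open("test.parquet")
       if err != nil {
           log.Fatalf("unable to open parquet file: %s", err)
       }
       defer f.Close()

       pool := memory.NewGoAllocator()

       // Read the whole file back into an Arrow table.
       tbl, err := pqarrow.ReadTable(context.Background(), f,
           parquet.NewReaderProperties(pool), pqarrow.ArrowReadProperties{}, pool)
       if err != nil {
           log.Fatalf("unable to read parquet file: %s", err)
       }
       defer tbl.Release()

       // Print the "port" column chunks; only the value 12 was appended above.
       fmt.Println(tbl.Column(0).Data().Chunks())
   }
   ```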
   
   As I was unsure where that large number came from (as can be seen above, the test code appends only the number `12`), I inspected the file using the [parquet-cli](https://formulae.brew.sh/formula/parquet-cli) tool, with the following results:
   ```
   $ parquet pages test.parquet
   
   Column: port
   
--------------------------------------------------------------------------------
     page   type  enc  count   avg size   size       rows     nulls   min / max
     0-D    dict  _ _  1       4,00 B     4 B
     0-1    data  _ R  1       9,00 B     9 B                 0       "4294967295" / "12"
   ```
   While `max` reflects the number actually appended to the column, the `min` value is, for some reason, that large number which DuckDB fails to cast to a uint16. The number is also visible in the output of the `meta` command:
   ```
   $ parquet meta test.parquet
   
   File path:  test.parquet
   Created by: parquet-go version 14.0.0
   Properties: (none)
   Schema:
   message schema {
     optional int32 port (INTEGER(16,false));
   }
   
   
   Row group 0:  count: 1  62,00 B records  start: 4  total(compressed): 62 B total(uncompressed):62 B
   
--------------------------------------------------------------------------------
         type      encodings count     avg size   nulls   min / max
   port  INT32     _ _ R     1         62,00 B    0       "4294967295" / "12"
   ```
   
   It seems strange to me that `min` is not also `12`, since a `count` of `1` indicates there is only one value present.
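
   The same bogus `min` should also be observable programmatically via the row-group column-chunk statistics. Below is a minimal sketch of dumping them with the low-level reader, under the assumption that the v14 `parquet/file` and `parquet/metadata` packages expose the metadata roughly as shown here:
   ```
   package main

   import (
       "fmt"
       "log"

       "github.com/apache/arrow/go/v14/parquet/file"
       "github.com/apache/arrow/go/v14/parquet/metadata"
   )

   func main() {
       // Open the file with the low-level parquet reader (no memory mapping).
       rdr, err := file.OpenParquetFile("test.parquet", false)
       if err != nil {
           log.Fatalf("unable to open parquet file: %s", err)
       }
       defer rdr.Close()

       // Metadata for row group 0, column 0 ("port").
       chunk, err := rdr.MetaData().RowGroup(0).ColumnChunk(0)
       if err != nil {
           log.Fatalf("unable to get column chunk metadata: %s", err)
       }

       stats, err := chunk.Statistics()
       if err != nil {
           log.Fatalf("unable to get column statistics: %s", err)
       }

       // The uint16 column is stored as physical INT32 (see the schema in the
       // `meta` output above), so int32 statistics are expected here.
       if s, ok := stats.(*metadata.Int32Statistics); ok && s.HasMinMax() {
           fmt.Printf("min=%d max=%d\n", s.Min(), s.Max())
       }
   }
   ```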
   
   ### Component(s)
   
   Go

