candiduslynx commented on issue #41284:
URL: https://github.com/apache/arrow/issues/41284#issuecomment-2100987572

   > If I'm reading this right, you're slicing the record into a slice of 
records of exactly 1 row each? Why?
   
   It's a test requirement and will be removed soon: we need to sort rows in 
tests, so we slice the record into single-row records.
   
   > But I'm more confused by this `reverseTransformArray` function. 
`reverseTransformRecord` appears to loop through the columns and call 
`reverseTransformArray` with each column's type and the column itself. 
   
   The schema passed to the `reverseTransformRecord` function doesn't 
necessarily match the schema of the record that was read.
   We have to convert some columns to other types so that they are better 
represented in Parquet (maybe we should revisit this, actually). We have 
[similar code](https://github.com/cloudquery/cloudquery/blob/main/plugins/destination/duckdb/client/transform.go) 
in our `duckdb` plugin, where we use the Parquet format to load the data into 
the tables. Unfortunately, DuckDB doesn't support all of the types 1-to-1, so 
we convert some values. That also means that, to reconstruct the original 
record from the one read back in tests, we need to perform the reverse 
transformation.
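
   A rough sketch of that round trip, under stated assumptions: the `column` type, the `transform` and `reverseTransform` helpers, and the `uint64`-to-string rule are all hypothetical and are not taken from the plugin code. The point it illustrates is that the target type comes from the expected schema, not from the column that was read.

```go
package main

import (
	"fmt"
	"strconv"
)

// column is a toy stand-in for a typed column; it is NOT Arrow's API.
// Nulls are ignored to keep the sketch short.
type column struct {
	typ    string // "uint64" or "string" in this sketch
	values []any
}

// transform rewrites types the destination is assumed not to support.
// Hypothetical rule: uint64 is stored as its decimal string.
func transform(c column) column {
	if c.typ != "uint64" {
		return c
	}
	out := column{typ: "string", values: make([]any, len(c.values))}
	for i, v := range c.values {
		out.values[i] = strconv.FormatUint(v.(uint64), 10)
	}
	return out
}

// reverseTransform converts c back to the type named by want, which comes
// from the expected schema rather than from the column that was read.
func reverseTransform(want string, c column) (column, error) {
	if c.typ == want {
		return c, nil
	}
	if want != "uint64" || c.typ != "string" {
		return column{}, fmt.Errorf("no reverse transform from %s to %s", c.typ, want)
	}
	out := column{typ: want, values: make([]any, len(c.values))}
	for i, v := range c.values {
		n, err := strconv.ParseUint(v.(string), 10, 64)
		if err != nil {
			return column{}, err
		}
		out.values[i] = n
	}
	return out, nil
}
```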
   
   I'll revisit the code in the `filetypes` package, as there seems to be some 
discrepancy, but overall that is how it currently works.
   
   > > offset should also be used (as the passed in record/array may be 
sliced), but not for struct arrays (they are special & I don't know why).
   > 
   > What do you mean "special"? The offset handling for struct arrays should 
work precisely the same as any other type. Can you elaborate on what the issue 
there is?
   
   https://github.com/cloudquery/filetypes/pull/279
   I noticed that when working with sliced struct arrays, the value returned 
by `arr.Data().Offset()` can't be used as-is. That's because the underlying 
child arrays are already sliced, too, so constructing the struct array with 
that offset applies it a second time and fails (you have to use 0).
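
   A toy model of that failure mode, assuming the children handed to the constructor have already been sliced to the struct's window. The `prim` and `structArr` types and the `rebuild` helper are hypothetical and only model how offsets compose; they are not Arrow's API.

```go
package main

// prim is a toy primitive array: a window over a shared buffer.
// It models the idea of an offset only; it is NOT Arrow's API.
type prim struct {
	buf    []int64
	offset int
	length int
}

// at returns logical element i, honoring the array's own offset.
func (p prim) at(i int) int64 { return p.buf[p.offset+i] }

// slice returns rows [i, j) as a new window over the same buffer.
func (p prim) slice(i, j int) prim {
	return prim{buf: p.buf, offset: p.offset + i, length: j - i}
}

// structArr is a toy struct array whose offset applies to every child.
type structArr struct {
	children []prim
	offset   int
	length   int
}

// slice returns rows [i, j) by moving only the struct's own window.
func (s structArr) slice(i, j int) structArr {
	return structArr{children: s.children, offset: s.offset + i, length: j - i}
}

// fields returns the children with the struct's window already applied.
func (s structArr) fields() []prim {
	out := make([]prim, len(s.children))
	for k, c := range s.children {
		out[k] = c.slice(s.offset, s.offset+s.length)
	}
	return out
}

// row reads row i, applying the struct offset and then each child's offset.
func (s structArr) row(i int) []int64 {
	out := make([]int64, len(s.children))
	for k, c := range s.children {
		out[k] = c.at(s.offset + i)
	}
	return out
}

// rebuild constructs a struct array from the already-sliced fields of s.
// Passing s.offset here shifts the rows twice; passing 0 is correct.
func rebuild(s structArr, offset int) structArr {
	return structArr{children: s.fields(), offset: offset, length: s.length}
}
```

   In this model, rebuilding a slice that starts at row 2 with offset 0 reads the same rows as the original slice, while reusing the slice's own offset reads from row 4 of the backing buffer instead.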
   


