candiduslynx commented on issue #41284: URL: https://github.com/apache/arrow/issues/41284#issuecomment-2100987572
> If I'm reading this right, you're slicing the record into a slice of records of exactly 1 row each? Why?

It's a test requirement and will be removed soon (we need to sort rows in tests, so we slice into single-row records).

> But I'm more confused by this `reverseTransformArray` function. `reverseTransformRecord` appears to loop through the columns and call `reverseTransformArray` with each column's type and the column itself.

The schema passed to the `reverseTransformRecord` function doesn't necessarily match the schema of the record that was read. We have to convert some columns to other types so they are better represented in parquet (maybe we should revisit this, actually).

We have [similar code](https://github.com/cloudquery/cloudquery/blob/main/plugins/destination/duckdb/client/transform.go) in our `duckdb` plugin, where we use the parquet format to load the data into the tables. Unfortunately, DuckDB doesn't support all of the types 1-to-1, so we convert some values. That also means that to reconstruct the record read for tests we need to perform the reverse transformation.

I'll revisit the code in the `filetypes` package, as there seems to be some discrepancy, but overall it is what it is.

> > offset should also be used (as the passed in record/array may be sliced), but not for struct arrays (they are special & I don't know why).
>
> What do you mean "special"? The offset handling for struct arrays should work precisely the same as any other type. Can you elaborate on what the issue there is?

https://github.com/cloudquery/filetypes/pull/279

I noticed that when working with sliced struct arrays, `arr.Data().Offset()` returns a value that can't be used as-is. That's because the underlying field arrays are already sliced, so constructing the struct array with that offset fails (you have to use 0).