tustvold commented on issue #6736: URL: https://github.com/apache/arrow-rs/issues/6736#issuecomment-2517978026
> From arrow perspective, would that be a new DataType, or rather a convention of using DataType::Struct with two Binary fields?

I don't know, I've not really been following the variant proposal closely enough to weigh in here. However, my understanding is that shredding is one of the major motivators for this getting added to parquet, as without it you might as well just embed any record format, e.g. Avro. I therefore suspect most use-cases will be at least partially shredded, and the reader will need to handle this case. This is especially true given that variant_value is NULL when the data is shredded, as opposed to, say, duplicating the content (which would have its own issues TBC), and so we can't just ignore the shredded data.

Unfortunately I can't see an obvious way to represent this sort of semi-structured data within the arrow format without introducing a new DataType that is able to accommodate arrays having the same type, but different child layouts...

TL;DR: I suspect actioning this will require arrow defining a way to represent semi-structured data...
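
To make the "same type, but different child layouts" concern concrete, here is a minimal sketch using a toy type model. It is an illustration under stated assumptions, not arrow-rs code: `ToyType` is a hypothetical stand-in for a structurally compared data type, and the child names `metadata` and `typed_value` are assumed for illustration (only `variant_value` appears in the comment above).

```rust
// Toy stand-in for a structurally compared type system.
// This is NOT arrow-rs's `DataType`; it only mimics the idea that a
// struct type is defined by the names and types of its children.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyType {
    Binary,
    Int64,
    Struct(Vec<(String, ToyType)>),
}

/// Unshredded variant modelled as a struct with two binary children.
/// The child name `metadata` is an assumption made for this sketch.
pub fn unshredded_variant() -> ToyType {
    ToyType::Struct(vec![
        ("metadata".to_string(), ToyType::Binary),
        ("variant_value".to_string(), ToyType::Binary),
    ])
}

/// Shredded variant: the same logical column, plus a typed child that
/// holds the values pulled out of `variant_value`.
/// The child name `typed_value` is an assumption made for this sketch.
pub fn shredded_variant(typed: ToyType) -> ToyType {
    ToyType::Struct(vec![
        ("metadata".to_string(), ToyType::Binary),
        ("variant_value".to_string(), ToyType::Binary),
        ("typed_value".to_string(), typed),
    ])
}

/// Structural equality: two struct types match only if their children match.
pub fn same_type(a: &ToyType, b: &ToyType) -> bool {
    a == b
}
```

Under this model, `same_type(&unshredded_variant(), &shredded_variant(ToyType::Int64))` is false, and two variants shredded to different child types also compare unequal, even though all of them are logically "a variant column". A struct-based convention would therefore give each shredding layout its own type, which appears to be the difficulty the comment describes.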
