tustvold commented on issue #6736:
URL: https://github.com/apache/arrow-rs/issues/6736#issuecomment-2517978026

   > From arrow perspective, would that be a new DataType, or rather a 
convention of using DataType::Struct with two Binary fields?
   
   I don't know; I've not really been following the variant proposal closely 
enough to weigh in here. However, my understanding is that shredding is one of 
the major motivations for adding this to parquet, as without it you might 
as well just embed any record format, e.g. Avro. I therefore suspect most 
use-cases will be at least partially shredded, and the reader will need to 
handle this case. This is especially true given that the variant_value is NULL when 
the data is shredded, as opposed to, say, duplicating the content (which would 
have its own issues TBC), and so we can't just ignore the shredded data.
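   To make the shredding concern concrete, here is a minimal, simplified sketch in plain Rust (no arrow-rs types). The `variant_value`/`typed_value` field names, the single `i64` typed column, and the rule that `variant_value` is NULL exactly when a row was shredded are illustrative assumptions, not the Parquet spec. It shows that a reader which only looks at the binary value silently loses the shredded rows.

```rust
/// Decoded result of reading one row (illustrative only).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Encoded(Vec<u8>), // an unshredded, binary-encoded variant value
}

/// Hypothetical in-memory model of one partially shredded Variant column.
/// Assumption: variant_value is None exactly when the row was shredded.
struct ShreddedColumn {
    variant_value: Vec<Option<Vec<u8>>>,
    // Shredded values for rows that matched the typed schema (here: i64).
    typed_value: Vec<Option<i64>>,
}

impl ShreddedColumn {
    /// Naive reader that ignores shredding: shredded rows come back as None.
    fn read_ignoring_shredding(&self, row: usize) -> Option<Value> {
        self.variant_value[row].clone().map(Value::Encoded)
    }

    /// Reader that handles shredding by falling back to typed_value.
    fn read(&self, row: usize) -> Option<Value> {
        match (&self.variant_value[row], self.typed_value[row]) {
            (Some(bytes), _) => Some(Value::Encoded(bytes.clone())),
            (None, Some(v)) => Some(Value::Int(v)),
            (None, None) => None, // genuinely missing
        }
    }
}

fn main() {
    // Row 0 is unshredded; row 1 was shredded into typed_value.
    let col = ShreddedColumn {
        variant_value: vec![Some(vec![0x01, 0x02]), None],
        typed_value: vec![None, Some(42)],
    };
    // The naive reader loses the shredded row entirely.
    assert_eq!(col.read_ignoring_shredding(1), None);
    assert_eq!(col.read(1), Some(Value::Int(42)));
    println!("{:?}", col.read(1));
}
```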
   
   Unfortunately I can't see an obvious way to represent this sort 
of semi-structured data within the arrow format without introducing a new 
DataType that can accommodate arrays that share the same type but have 
different child layouts...
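   As a rough illustration of the layout problem, the sketch below uses a hypothetical `ToyDataType` enum (a stand-in, not arrow-rs's `DataType`) and assumes a "Struct of Binary fields plus an optional `typed_value` child" convention. Each distinct shredding yields a different struct type, even though all of them are logically the same Variant type.

```rust
/// Hypothetical, simplified stand-in for a columnar type system.
#[derive(Debug, Clone, PartialEq)]
enum ToyDataType {
    Binary,
    Int64,
    Utf8,
    Struct(Vec<(String, ToyDataType)>),
}

/// Builds the physical layout of a Variant column under the assumed
/// struct convention, optionally adding a shredded `typed_value` child.
fn variant_layout(shredded: Option<ToyDataType>) -> ToyDataType {
    let mut fields = vec![
        ("metadata".to_string(), ToyDataType::Binary),
        ("variant_value".to_string(), ToyDataType::Binary),
    ];
    if let Some(t) = shredded {
        fields.push(("typed_value".to_string(), t));
    }
    ToyDataType::Struct(fields)
}

fn main() {
    let unshredded = variant_layout(None);
    let as_int = variant_layout(Some(ToyDataType::Int64));
    let as_str = variant_layout(Some(ToyDataType::Utf8));
    // Logically all three are "Variant", but their physical types differ,
    // so arrays using them do not share a single struct type.
    assert_ne!(unshredded, as_int);
    assert_ne!(as_int, as_str);
    println!("{:?}", as_int);
}
```

Under this convention, batches shredded differently would not share a schema, which is the gap a dedicated type would need to close.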
   
   TL;DR: I suspect actioning this will require arrow to define a way to represent 
semi-structured data...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
