westonpace commented on issue #34484: URL: https://github.com/apache/arrow/issues/34484#issuecomment-2107659233
> It feels to me that augmented fields should never leave the read rel. If the contents of the augmented fields are interesting for the rest of the plan then an expression could be used in the read relation to reference those fields (thus preserving them for future consumption). With that design the processing schema additionally includes the augmented fields but the output schema includes only the normal fields.

I'm not entirely sure I follow. What is the "processing schema"? Mentally, I think of the augmented fields as fields that are always present in every file. In other words, if you create a file with one column X then you have 5 fields (X plus the four augmented fields). The base schema of the file has 5 fields and you can pick which ones to include or not in the projection.

I think the issue at hand here might be better described as the fact that the producer and consumer disagree about what fields are present in the file.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
