Yicong-Huang opened a new issue, #4762:
URL: https://github.com/apache/texera/issues/4762

   ### What happened?
   
   
`common/workflow-core/src/main/scala/org/apache/texera/amber/util/ArrowUtils.scala::fromAttributeType`
 maps `STRING`, `LARGE_BINARY`, and `ANY` all to `ArrowType.Utf8.INSTANCE`. 
`LARGE_BINARY` is recovered via field metadata (`texera_type=LARGE_BINARY`) by 
`toTexeraSchema`, but `ANY` carries no metadata, so a schema round-trip 
(`toTexeraSchema(fromTexeraSchema(schema))`) silently turns every `ANY` 
attribute into `STRING`. The cross-language schema bridge therefore loses the 
`ANY` distinction entirely.
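
One possible direction, sketched here as an assumption rather than a confirmed fix, is to tag `ANY` with the same `texera_type` metadata key that already protects `LARGE_BINARY`. The model below uses simplified stand-in types (`AttrType`, `ArrowField`, `RoundTripSketch` are hypothetical names, not the real Texera or Arrow classes) so that it runs without any dependencies. It only illustrates why the untagged mapping loses `ANY` and how a tag would let the reverse mapping recover it.

```scala
// Stand-in types: a simplified model, NOT the real Texera or Arrow classes.
sealed trait AttrType
case object STRING extends AttrType
case object LARGE_BINARY extends AttrType
case object ANY extends AttrType

// Models an Arrow field as a physical type name plus string metadata.
final case class ArrowField(name: String, arrowType: String, metadata: Map[String, String])

object RoundTripSketch {
  // Metadata key named in the issue text.
  val TypeKey = "texera_type"

  // Behavior as described in the issue: all three types become Utf8,
  // and only LARGE_BINARY is tagged.
  def toArrowCurrent(name: String, t: AttrType): ArrowField = t match {
    case LARGE_BINARY => ArrowField(name, "Utf8", Map(TypeKey -> "LARGE_BINARY"))
    case STRING | ANY => ArrowField(name, "Utf8", Map.empty)
  }

  // Hypothetical variant: ANY is tagged as well (an assumption, not the project's code).
  def toArrowTagged(name: String, t: AttrType): ArrowField = t match {
    case LARGE_BINARY => ArrowField(name, "Utf8", Map(TypeKey -> "LARGE_BINARY"))
    case ANY          => ArrowField(name, "Utf8", Map(TypeKey -> "ANY"))
    case STRING       => ArrowField(name, "Utf8", Map.empty)
  }

  // Reverse mapping: consult the tag first, fall back to STRING for untagged Utf8.
  def fromArrow(f: ArrowField): AttrType = f.metadata.get(TypeKey) match {
    case Some("LARGE_BINARY") => LARGE_BINARY
    case Some("ANY")          => ANY
    case _                    => STRING
  }
}
```

In this model, `fromArrow(toArrowCurrent("v", ANY))` yields `STRING`, while `fromArrow(toArrowTagged("v", ANY))` yields `ANY`. Whether a metadata tag or a different Arrow type is the right choice for the real `ArrowUtils` is left open here.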
   
   ### How to reproduce?
   
   ```scala
   import org.apache.texera.amber.core.tuple.{Attribute, AttributeType, Schema}
   import org.apache.texera.amber.util.ArrowUtils
   
   val original = Schema(List(new Attribute("v", AttributeType.ANY)))
   val recovered = ArrowUtils.toTexeraSchema(ArrowUtils.fromTexeraSchema(original))
   // recovered.getAttributes.head.getType == AttributeType.STRING (information lost)
   ```
   
   ### Version
   
   1.1.0-incubating (Pre-release/Master)


