[PR] [VL] Support read old ORC file without column names [incubator-gluten]

via GitHub Fri, 24 Oct 2025 03:58:44 -0700


ccat3z opened a new pull request, #8862:
URL: https://github.com/apache/incubator-gluten/pull/8862


   ## What changes were proposed in this pull request?
   
   ORC files written by older versions carry no field names in their physical 
schema. To read such a file, the table schema must be mapped to the file schema by column index.
   
   1. Pass `ScanTransformer#getDataColumns` as table schema to Velox.
   2. Enable k{Parquet,Orc}UseColumnNames in Velox to match Spark's default 
behavior, which always maps the table schema to the physical file schema by name.
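
For reviewers, the difference between the two mapping modes can be sketched roughly as follows. This is an illustration only, not the Velox or Gluten implementation: `map_columns`, its parameters, and the placeholder names `_col0`/`_col1` are hypothetical, and case sensitivity is not modeled.

```python
from typing import List, Optional


def map_columns(
    table_cols: List[str],
    file_cols: List[str],
    use_column_names: bool,
) -> List[Optional[int]]:
    """For each table column, return the index of the file column it reads
    from, or None when the file has no matching column.

    Hypothetical helper for illustration; not a Velox or Gluten API.
    """
    if use_column_names:
        # Name-based mapping: look each table column up by name in the file.
        index_by_name = {name: i for i, name in enumerate(file_cols)}
        return [index_by_name.get(name) for name in table_cols]
    # Index-based mapping: the i-th table column reads the i-th file column.
    return [i if i < len(file_cols) else None for i in range(len(table_cols))]


# File with real field names: name mapping handles reordered columns.
by_name = map_columns(["id", "name"], ["name", "id"], use_column_names=True)

# File without usable field names (placeholders assumed here): name mapping
# finds nothing, while index mapping still resolves every column.
old_by_name = map_columns(["id", "name"], ["_col0", "_col1"], use_column_names=True)
old_by_index = map_columns(["id", "name"], ["_col0", "_col1"], use_column_names=False)
```

With name mapping, a file that lacks real field names yields no matches at all, which is why index mapping is needed for such files.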
   
   This PR depends on https://github.com/facebookincubator/velox/pull/12489 
(old ORC files) and https://github.com/facebookincubator/velox/pull/12490 
(match Spark's index-mapping behavior).
   
   Fixed https://github.com/apache/incubator-gluten/issues/5638.
   
   ## How was this patch tested?
   
   Unit tests.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

