leobiscassi opened a new pull request, #6391:
URL: https://github.com/apache/hudi/pull/6391

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   While doing a PoC with Hudi and Presto, I came across the following error:
   
   [16777224] Query failed (#20220727_185609_00434_4n5pr): The column 
my_column_name_here of table my_tablename_here is declared as type string, but 
the Parquet file 
(s3a://bucket/prefix/befb27ee-ee21-4791-95bb-d8aeb521aff9-0_15-22-5118_20220629223504.parquet)
 declares the column as type INT32 com.facebook.presto.spi.PrestoException: The 
column my_column_name_here of table my_tablename_here is declared as type 
string, but the Parquet file 
(s3a://bucket/prefix/befb27ee-ee21-4791-95bb-d8aeb521aff9-0_15-22-5118_20220629223504.parquet)
 declares the column as type INT32
   
   After setting `hive.parquet.use-column-names=true` in the `hive.properties` 
file, the error stopped. The error occurs because the order of the fields in 
the Parquet file is sometimes not the same as the order used in the DDL command 
that created the table. Presto and Trino map columns by field order by default, 
which can lead to type mismatch errors. This config makes Trino and Presto 
match columns by name instead of by position, which fixes the issue.
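
The difference between position-based and name-based column resolution can be sketched in a few lines of Python. This is only an illustration of the idea, not Presto or Trino code: the schemas and the helper functions `resolve_by_position`, `resolve_by_name`, and `mismatches` are hypothetical and do not come from this PR.

```python
# Hypothetical schemas, for illustration only (not taken from the PR).
table_ddl = [("id", "int"), ("name", "string")]      # column order declared in the DDL
parquet_file = [("name", "string"), ("id", "int")]   # field order written in the Parquet file


def resolve_by_position(table, file_schema):
    """Pair the i-th table column with the i-th Parquet field."""
    return [(col, col_type, file_type)
            for (col, col_type), (_, file_type) in zip(table, file_schema)]


def resolve_by_name(table, file_schema):
    """Pair each table column with the Parquet field of the same name."""
    file_types = dict(file_schema)
    return [(col, col_type, file_types.get(col)) for col, col_type in table]


def mismatches(resolved):
    """Return the columns whose declared type differs from the file's type."""
    return [col for col, declared, actual in resolved if declared != actual]


by_position = mismatches(resolve_by_position(table_ddl, parquet_file))
by_name = mismatches(resolve_by_name(table_ddl, parquet_file))
print(by_position)  # ['id', 'name']
print(by_name)      # []
```

With the same two schemas, resolving by position reports a type mismatch for both columns, while resolving by name reports none.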
   
   The Jira ticket [HUDI-4522](https://issues.apache.org/jira/browse/HUDI-4522) 
came out of a discussion on [this](https://github.com/apache/hudi/issues/6142) 
issue. This PR implements the request in that ticket.
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   Users will now know in advance that they need to set this config.
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to 
mitigate the risks._
   
   I don't think this poses any risk to the Hudi binaries.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [x] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
