[GitHub] [hudi] bhasudha commented on pull request #4097: [WIP] - [HUDI-2806] - Docs for Transformer Utilities

GitBox Wed, 24 Nov 2021 15:01:35 -0800


bhasudha commented on pull request #4097:
URL: https://github.com/apache/hudi/pull/4097#issuecomment-978410027



   For the flattening transformer, no special configs are required. The class just has to be set like this: `--transformer-class org.apache.hudi.utilities.transform.FlatteningTransformer`
   
   It flattens the nested fields in the incoming records by prefixing each inner field name with its outer field name and `_`, applied recursively for deeper levels of nesting. Flattening of arrays is currently not supported. The flattened schema is then plugged into a `sparkSession.sql("select " + flattenedSchema + " from table")` command, the same way the `SqlQueryBasedTransformer` executes its query. For example, flattenedSchema could look something like `age as intColumn, address as stringColumn, name.first as name_first, name.last as name_last, name.middle as name_middle`, where `name` is a nested field of StructType in the original source.
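   A minimal, Spark-free sketch of the flattening idea described above, assuming a nested schema is modeled as a plain Python dict (a hypothetical stand-in for Spark's `StructType`). The helpers `flatten_schema` and `build_query` are illustrative names, not Hudi's API; the actual transformer is written in Java against Spark. In this sketch each leaf column is aliased with its dotted path converted to underscores, so a top-level column simply keeps its own name, and array-typed fields are passed through as ordinary leaves rather than expanded.

```python
from typing import Dict, Optional, Union

# Hypothetical schema model: field name -> type name (leaf) or nested dict (struct).
Schema = Dict[str, Union[str, "Schema"]]


def flatten_schema(schema: Schema, prefix: Optional[str] = None) -> str:
    """Build a comma-separated SELECT list that flattens nested struct fields.

    Each leaf is selected by its dotted path and aliased with dots replaced
    by underscores, e.g. name.first -> name_first.
    """
    parts = []
    for field_name, field_type in schema.items():
        col_name = field_name if prefix is None else f"{prefix}.{field_name}"
        if isinstance(field_type, dict):
            # Nested struct: recurse, carrying the dotted path as the new prefix.
            nested = flatten_schema(field_type, col_name)
            if nested:  # skip empty structs so no stray comma is produced
                parts.append(nested)
        else:
            # Leaf column; arrays such as "array<string>" are not expanded.
            parts.append(f"{col_name} as {col_name.replace('.', '_')}")
    return ",".join(parts)


def build_query(schema: Schema, table: str) -> str:
    # Mirrors the "select " + flattenedSchema + " from table" pattern.
    return "select " + flatten_schema(schema) + " from " + table


if __name__ == "__main__":
    source_schema = {
        "age": "int",
        "address": "string",
        "name": {"first": "string", "last": "string", "middle": "string"},
    }
    print(build_query(source_schema, "source_table"))
```

   Running it prints `select age as age,address as address,name.first as name_first,name.last as name_last,name.middle as name_middle from source_table`. Recursion keeps the prefixing uniform at any depth, so `a.b.c` becomes `a_b_c`.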


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
