[ 
https://issues.apache.org/jira/browse/HUDI-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019249#comment-17019249
 ] 

Vinoth Chandar commented on HUDI-538:
-------------------------------------

>Do you mean the logic in {{DeltaSync#readFromSource}}? To be a little more 
>specific, do you mean {{KeyGenerator}}?

Sort of. We have logic there that constructs a `HoodieRecord` from a Spark 
`Row` or an Avro `GenericRecord`. I am saying we should push this further down 
the stack and do it lazily at write/index time, as needed. The alternative is 
to work with a Spark `Dataset<HoodieRecord>` or Flink 
`DataStream<HoodieRecord>`, similar to `JavaRDD<HoodieRecord>` now. At least 
for Spark, I am not sure anyone uses anything other than `Row` with `Dataset`. 
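
To make the "lazy at write/index time" idea concrete, here is a minimal sketch, 
not actual Hudi code. `LazyRecordSketch`, `LazyRecord`, `Key` and the `row` 
helper are all hypothetical names, and a plain `Map` stands in for an 
engine-native `Row` or `GenericRecord`. The point is only that reading from the 
source wraps the row as-is, and the key is derived the first time the 
write/index path asks for it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types are actual Hudi classes.
final class LazyRecordSketch {

    // Stand-in for a record's identity (record key plus partition path).
    static final class Key {
        final String recordKey;
        final String partitionPath;

        Key(String recordKey, String partitionPath) {
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
        }
    }

    // Wraps an engine-native row of type R and derives its key on first use.
    // Not thread-safe; a real implementation would have to consider that.
    static final class LazyRecord<R> {
        private final R row;
        private final Function<R, Key> keyExtractor;
        private Key key; // stays null until key() is first called

        LazyRecord(R row, Function<R, Key> keyExtractor) {
            this.row = row;
            this.keyExtractor = keyExtractor;
        }

        R row() {
            return row;
        }

        boolean isMaterialized() {
            return key != null;
        }

        Key key() {
            if (key == null) {
                key = keyExtractor.apply(row);
            }
            return key;
        }
    }

    // Builds a toy row; the Map is an assumed stand-in for Row / GenericRecord.
    static Map<String, String> row(String id, String date) {
        Map<String, String> r = new HashMap<>();
        r.put("id", id);
        r.put("date", date);
        return r;
    }

    public static void main(String[] args) {
        // Plays the role a key generator would play: row in, key out.
        Function<Map<String, String>, Key> extractor =
            r -> new Key(r.get("id"), r.get("date"));

        // Reading from the source only wraps rows; no keys are built yet.
        List<LazyRecord<Map<String, String>>> batch = new ArrayList<>();
        batch.add(new LazyRecord<>(row("k1", "2020-01-20"), extractor));
        batch.add(new LazyRecord<>(row("k2", "2020-01-21"), extractor));

        // Only the write/index step asks for the key, which materializes it.
        for (LazyRecord<Map<String, String>> rec : batch) {
            Key k = rec.key();
            System.out.println(k.partitionPath + "/" + k.recordKey);
        }
    }
}
```

With a wrapper like this, the engine-specific layers could keep passing their 
native row type around, and only the engine-agnostic write/index code would 
need to know how a key is derived. 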

> Restructuring hudi client module for multi engine support
> ---------------------------------------------------------
>
>                 Key: HUDI-538
>                 URL: https://issues.apache.org/jira/browse/HUDI-538
>             Project: Apache Hudi (incubating)
>          Issue Type: Wish
>          Components: Code Cleanup
>            Reporter: vinoyang
>            Priority: Major
>
> Hudi is currently tightly coupled with the Spark framework, which makes 
> integration with other computing engines more difficult. We plan to decouple 
> it from Spark. This umbrella issue is used to track this work.
> Some thoughts are written here: 
> https://docs.google.com/document/d/1Q9w_4K6xzGbUrtTS0gAlzNYOmRXjzNUdbbe0q59PX9w/edit?usp=sharing
> The feature branch is {{restructure-hudi-client}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
