[jira] [Commented] (HUDI-481) Support SQL-like method

Vinoth Chandar (Jira) Tue, 20 Oct 2020 15:07:31 -0700


    [ https://issues.apache.org/jira/browse/HUDI-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217972#comment-17217972 ]


Vinoth Chandar commented on HUDI-481:
-------------------------------------

>a. Expressed in SQL, it is {{update table set col1 = X where col2 = Y}}. 
>Hudi cannot handle this directly at present; we can only fetch all the 
>affected data as a dataset first and then merge it.

I don't think we can avoid reading the dataset first, i.e. reading the older 
parquet file to merge the record. In fact, I would argue that Hudi uniquely 
lets you handle a single-column update scenario today, by allowing custom 
payloads to specify how records are merged: the base file can contain the 
entire record, the log can contain just the updated column value, and we will 
be able to merge the two.
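To make the single-column update idea concrete, here is a minimal Scala sketch. It assumes that records are modeled as plain `Map[String, Any]` values and that a log entry carries only the record key and the changed columns. `PartialMergeSketch`, `PartialUpdate` and `mergeAll` are hypothetical names for illustration, not Hudi's payload API; a real custom payload class would apply a comparable "overwrite only the columns present in the log" rule inside its own merge method.

```scala
// Hypothetical model of base-file rows plus partial log entries; not a Hudi API.
object PartialMergeSketch {
  // A row is a map from column name to value.
  type Row = Map[String, Any]

  // A log entry that carries only the record key and the columns that changed.
  final case class PartialUpdate(key: String, changed: Row)

  // Overlay the changed columns onto the full base row; every other column
  // keeps the value it had in the base file.
  def mergePartial(base: Row, update: PartialUpdate): Row =
    base ++ update.changed

  // Apply log entries (oldest first) to base rows keyed by record key.
  // Entries whose key is absent from the base are ignored in this sketch.
  def mergeAll(baseRows: Map[String, Row], log: Seq[PartialUpdate]): Map[String, Row] =
    log.foldLeft(baseRows) { (acc, u) =>
      acc.get(u.key) match {
        case Some(row) => acc.updated(u.key, mergePartial(row, u))
        case None      => acc
      }
    }
}
```

For example, merging `PartialUpdate("k1", Map("col1" -> "X"))` into a base row `id = k1, col1 = a, col2 = Y` yields `id = k1, col1 = X, col2 = Y`: the log stores one column, but the merge still needs the full base row, which is why reading the older file cannot be avoided.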

 

What we are missing is SQL support for merges, which we should build out 
under the scope of HUDI-1297. wdyt?

> Support SQL-like method
> -----------------------
>
>                 Key: HUDI-481
>                 URL: https://issues.apache.org/jira/browse/HUDI-481
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: CLI
>            Reporter: cdmikechen
>            Priority: Minor
>
> As we know, Hudi uses the Spark datasource API to upsert data. For example, 
> if we want to update a record, we need to fetch the old row's data first and 
> then use the upsert method to update this row.
> But there is another situation where someone just wants to update one column 
> of data. Expressed in SQL, it is {{update table set col1 = X where col2 = Y}}. 
> Hudi cannot handle this directly at present; we can only fetch all the 
> affected data as a dataset first and then merge it.
> So I think maybe we can create a new subproject to process batch data in a 
> SQL-like way. For example:
> {code}
> val hudiTable = new HudiTable(path)
> hudiTable.update.set("col1 = X").where("col2 = Y")
> hudiTable.delete.where("col3 = Z")
> hudiTable.commit
> {code}
> It could also be extended to support JDBC-based schemes such as this RFC: 
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller]
> I hope everyone can offer suggestions on whether this plan is feasible.
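The fluent API proposed in the issue could be prototyped roughly as follows. This is a minimal in-memory Scala sketch under stated assumptions, not an existing Hudi class: `HudiTableSketch`, its `update`/`delete` builders, the simple `"col = value"` equality strings, and the `id` key column are all hypothetical. It illustrates the point raised in the discussion: `commit()` still has to read every full row matching the predicate and write back a complete, merged row by key, which is the read-then-upsert step that is needed today.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for the proposed HudiTable; not a Hudi API.
// Rows map column names to string values and are keyed by their "id" column.
final class HudiTableSketch(initial: Seq[Map[String, String]]) {
  private var rows: Map[String, Map[String, String]] =
    initial.map(r => r("id") -> r).toMap

  // Operations are buffered until commit(), mirroring the proposal's explicit commit.
  private sealed trait Op
  private final case class Update(assignment: (String, String), pred: (String, String)) extends Op
  private final case class Delete(pred: (String, String)) extends Op
  private val pending = mutable.ArrayBuffer.empty[Op]

  // Parse a simple equality such as "col1 = X" into ("col1", "X").
  private def parseEq(expr: String): (String, String) =
    expr.split("=", 2) match {
      case Array(col, value) => col.trim -> value.trim
      case _ => throw new IllegalArgumentException(s"expected 'col = value', got: $expr")
    }

  private def matches(row: Map[String, String], pred: (String, String)): Boolean =
    row.get(pred._1).contains(pred._2)

  final class UpdateBuilder {
    def set(assignment: String): UpdateWhere = new UpdateWhere(parseEq(assignment))
  }
  final class UpdateWhere(assignment: (String, String)) {
    def where(pred: String): Unit = pending += Update(assignment, parseEq(pred))
  }
  final class DeleteBuilder {
    def where(pred: String): Unit = pending += Delete(parseEq(pred))
  }

  def update: UpdateBuilder = new UpdateBuilder
  def delete: DeleteBuilder = new DeleteBuilder

  // Apply buffered operations in order. An update reads each full matching row,
  // overwrites one column, and upserts the whole row back under its key.
  def commit(): Unit = {
    pending.foreach {
      case Update((col, value), pred) =>
        val changed = rows.values.filter(matches(_, pred)).map(r => r("id") -> (r + (col -> value)))
        rows = rows ++ changed
      case Delete(pred) =>
        rows = rows.filterNot { case (_, r) => matches(r, pred) }
    }
    pending.clear()
  }

  // Current committed state, keyed by record key.
  def snapshot: Map[String, Map[String, String]] = rows
}
```

Usage follows the issue's snippet: `t.update.set("col1 = X").where("col2 = Y")`, then `t.delete.where("col3 = Z")`, then `t.commit()`. Buffering until `commit()` keeps the example close to the proposal. A production version would presumably push predicates down instead of scanning every row, and would write only the changed column to the log, as discussed in the comment above.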



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
