cdmikechen created HUDI-481:
-------------------------------

             Summary: Support SQL-like method
                 Key: HUDI-481
                 URL: https://issues.apache.org/jira/browse/HUDI-481
             Project: Apache Hudi (incubating)
          Issue Type: Improvement
          Components: CLI
            Reporter: cdmikechen


As we know, Hudi uses the Spark datasource API to upsert data. For example, if we 
want to update a record, we need to read the old row's data first and then use 
the upsert method to write the updated row.
But there is another situation, where someone just wants to update one column of 
the data. Expressed in SQL, it is {{update table set col1 = X where col2 = Y}}. 
Hudi cannot handle this directly at present: we can only load all the affected 
rows as a dataset first and then merge it back.
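
To illustrate why the full rows have to be read first, here is a minimal in-memory sketch. It is a toy model, not Hudi's API or storage layout: the object {{WholeRowUpsert}}, its methods and the key field name are all hypothetical. It assumes only that an upsert replaces the whole row for a given record key.

```scala
object WholeRowUpsert {
  type Row = Map[String, Any]

  // Toy stand-in for a table: record key -> full row (an assumption, not Hudi's model).
  // An upsert replaces the stored row for each incoming key with the incoming row.
  def upsert(table: Map[String, Row], incoming: Seq[Row], keyField: String): Map[String, Row] =
    incoming.foldLeft(table) { (acc, row) => acc.updated(row(keyField).toString, row) }

  // Read-modify-write: fetch the matching full rows, change one column,
  // then upsert the complete rows back.
  def updateColumn(table: Map[String, Row], keyField: String,
                   where: Row => Boolean, column: String, value: Any): Map[String, Row] = {
    val changed = table.values.filter(where).map(_.updated(column, value)).toSeq
    upsert(table, changed, keyField)
  }
}
```

In this model, upserting a row that carries only the key and {{col1}} would drop every other column of that record, which is why the caller has to materialize the old rows before writing.
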
So I think maybe we can create a new subproject to process batch data in a 
SQL-like way. For example:

 {code}
val hudiTable = new HudiTable(path)
hudiTable.update.set("col1 = X").where("col2 = Y")
hudiTable.delete.where("col3 = Z")
hudiTable.commit
{code}
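
As a rough idea of how such a builder could behave, here is a self-contained sketch over in-memory rows. Everything in it is an assumption made for illustration: the names {{HudiTableSketch}}, {{Table}}, {{Update}} and {{Delete}} are hypothetical, conditions are plain Scala predicates instead of parsed strings such as "col2 = Y", and nothing here touches Hudi or Spark.

```scala
object HudiTableSketch {
  type Row = Map[String, Any]

  // Queued operations; both carry a row predicate in place of a parsed WHERE clause.
  sealed trait Op
  final case class Update(column: String, value: Any, where: Row => Boolean) extends Op
  final case class Delete(where: Row => Boolean) extends Op

  // Hypothetical table handle: update/delete only record an operation,
  // and commit applies the recorded operations in order.
  final class Table(rows: Seq[Row], pending: Vector[Op] = Vector.empty) {
    def update(column: String, value: Any)(where: Row => Boolean): Table =
      new Table(rows, pending :+ Update(column, value, where))

    def delete(where: Row => Boolean): Table =
      new Table(rows, pending :+ Delete(where))

    def commit: Seq[Row] = pending.foldLeft(rows) {
      case (rs, Update(c, v, p)) => rs.map(r => if (p(r)) r.updated(c, v) else r)
      case (rs, Delete(p))       => rs.filterNot(p)
    }
  }
}
```

A call such as {{new HudiTableSketch.Table(rows).update("col1", "X")(r => r.get("col2").contains("Y")).delete(r => r.get("col3").contains("Z")).commit}} mirrors the example above. Deferring all work until {{commit}} is one possible design choice: it would let a real implementation read the affected rows once and write a single commit.
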

This could also be extended to support JDBC-like schemes, such as the one in this RFC: 
[https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller]

I hope everyone can offer suggestions on whether this plan is feasible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
