[ 
https://issues.apache.org/jira/browse/HUDI-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kazdy updated HUDI-5243:
------------------------
    Description: 
Currently, when running Spark SQL DML, users who want to check how many rows were 
affected need to dig into the commit stats via the Hudi CLI or a stored 
procedure.

We can improve the user experience by returning num_affected_rows after an INSERT INTO 
command, so that Spark SQL users can easily see how many rows were inserted 
without having to inspect the commits themselves.

num_affected_rows can be extracted in the writer itself from the commitMetadata.

Example:
{code:java}
spark.sql("""
create table test_mor (id int, name string) 
using hudi 
tblproperties (primaryKey = 'id', type='mor');
""")

spark.sql(
"""
INSERT INTO test_mor
VALUES 
(1, "a"),
(2, "b"),
(3, "c"),
(4, "d"),
(5, "e"),
(6, "f"),
(7, "g")
""").show()

returns:
+-----------------+
|num_affected_rows|
+-----------------+
|                7|
+-----------------+
{code}
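As a sketch of the extraction step described above: the writer can aggregate the per-file write statistics recorded in the commit metadata into a single affected-row count. The WriteStat class and its field names below are hypothetical stand-ins for illustration only, not the actual Hudi HoodieWriteStat API.
{code:java}
import java.util.List;

// Hypothetical stand-in for Hudi's per-file write statistics;
// field names are illustrative, not the real HoodieWriteStat API.
class WriteStat {
    long numInserts;
    long numUpdateWrites;
    long numDeletes;

    WriteStat(long inserts, long updates, long deletes) {
        numInserts = inserts;
        numUpdateWrites = updates;
        numDeletes = deletes;
    }
}

public class AffectedRows {
    // num_affected_rows = total rows touched across all files in the commit
    static long numAffectedRows(List<WriteStat> stats) {
        return stats.stream()
                .mapToLong(s -> s.numInserts + s.numUpdateWrites + s.numDeletes)
                .sum();
    }

    public static void main(String[] args) {
        // Two files written by the INSERT in the example above: 4 + 3 rows
        List<WriteStat> stats = List.of(
                new WriteStat(4, 0, 0),
                new WriteStat(3, 0, 0));
        System.out.println(numAffectedRows(stats)); // prints 7
    }
}
{code}
The resulting count would then be surfaced as the single-column num_affected_rows DataFrame shown in the example.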



> Return num_affected_rows from sql INSERT statement
> --------------------------------------------------
>
>                 Key: HUDI-5243
>                 URL: https://issues.apache.org/jira/browse/HUDI-5243
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: spark-sql
>            Reporter: kazdy
>            Assignee: kazdy
>            Priority: Minor
>             Fix For: 0.13.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)