Navin Kumar created SPARK-44159:
-----------------------------------
Summary: Commands for writing (InsertIntoHadoopFsRelationCommand
and InsertIntoHiveTable) should log what they are doing
Key: SPARK-44159
URL: https://issues.apache.org/jira/browse/SPARK-44159
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.0
Reporter: Navin Kumar
Improvements from SPARK-41763 decoupled the execution of create table and data
writing commands in a CTAS (see SPARK-41713).
This means that while the code is cleaner, with the v1 write implementation limited
to InsertIntoHadoopFsRelationCommand and InsertIntoHiveTable, the execution of
these operations is less clear than it was before. Previously, the command was
present in the physical plan (see explain output below):
{{== Physical Plan ==}}
{{CommandResult <empty>}}
{{+- Execute CreateHiveTableAsSelectCommand [Database: default, TableName:
test_hive_text_table, InsertIntoHiveTable]}}
{{+- *(1) Scan ExistingRDD[...]}}
But in Spark 3.4.0, this output is:
{{== Physical Plan ==}}
{{CommandResult <empty>}}
{{+- Execute CreateHiveTableAsSelectCommand}}
{{+- CreateHiveTableAsSelectCommand [Database: default, TableName:
test_hive_text_table]}}
{{+- Project [...]}}
{{+- SubqueryAlias hive_input_table}}
{{+- View (`hive_input_table`, [...])}}
{{+- LogicalRDD [...], false}}
The write command is now missing from the plan. This makes sense since execution
is decoupled, but since there is no log output from InsertIntoHiveTable, there
is no clear way to confirm that the command actually executed.
I propose that these commands log a message at the INFO level indicating how
many rows were written into which table, to make it easier for a user to know
what has happened from the Spark logs. Another option may be to update the
explain output in Spark 3.4 to handle this, but that might be more difficult
and make less sense since the operations are now decoupled.
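
To make the first option concrete, here is a minimal sketch of what such an INFO line could contain. It is only an illustration under stated assumptions: `WriteLogSketch`, `WriteSummary` and `infoMessage` are hypothetical names that do not exist in Spark, the message wording is a suggestion, and the row count in the usage note is made up. How the real commands would obtain the row count is left open.

```scala
// Hypothetical sketch: none of these names exist in Spark.
object WriteLogSketch {

  // Assumed summary of a completed v1 write. The real commands track
  // their statistics differently; this only models the two facts the
  // proposal asks for (row count and target table) plus the command name.
  final case class WriteSummary(
      command: String,      // e.g. "InsertIntoHiveTable"
      table: String,        // qualified table name
      numOutputRows: Long)  // rows written by the command

  // Builds the text of the proposed INFO-level log line.
  def infoMessage(s: WriteSummary): String =
    s"${s.command}: wrote ${s.numOutputRows} rows into table ${s.table}"
}
```

For the table in the plans above, `WriteLogSketch.infoMessage(WriteLogSketch.WriteSummary("InsertIntoHiveTable", "default.test_hive_text_table", 3L))` would yield a line naming the command, the row count, and `default.test_hive_text_table`, which is enough to confirm from the logs that the write ran.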