[
https://issues.apache.org/jira/browse/HUDI-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17915800#comment-17915800
]
Davis Zhang commented on HUDI-8821:
-----------------------------------
Not an issue: for the backward-compatible writer to work, the metadata table (MDT) must be disabled.
In Spark 3.5 + Hudi 0.15, create the table:
{code:java}
➜ ~ ${SPARK_HOME}/bin/spark-sql --packages org.apache.hudi:hudi-spark3.5-bundle_2.12:0.15.0 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
  --conf 'spark.sql.catalogImplementation=in-memory'
25/01/21 10:28:55 WARN Utils: Your hostname, Daviss-MacBook-Pro.local resolves
to a loopback address: 127.0.0.1; using 192.168.1.109 instead (on interface en0)
25/01/21 10:28:55 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another
address
:: loading settings :: url =
jar:file:/Users/zhanyeha/spark-3.5.4-bin-hadoop3/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/zhanyeha/.ivy2/cache
The jars for the packages stored in: /Users/zhanyeha/.ivy2/jars
org.apache.hudi#hudi-spark3.5-bundle_2.12 added as a dependency
:: resolving dependencies ::
org.apache.spark#spark-submit-parent-8f88f567-eabd-4146-8d1b-67b71f332066;1.0
confs: [default]
found org.apache.hudi#hudi-spark3.5-bundle_2.12;0.15.0 in local-m2-cache
found org.apache.hive#hive-storage-api;2.8.1 in local-m2-cache
found org.slf4j#slf4j-api;1.7.36 in local-m2-cache
:: resolution report :: resolve 76ms :: artifacts dl 3ms
:: modules in use:
org.apache.hive#hive-storage-api;2.8.1 from local-m2-cache in [default]
org.apache.hudi#hudi-spark3.5-bundle_2.12;0.15.0 from local-m2-cache in
[default]
org.slf4j#slf4j-api;1.7.36 from local-m2-cache in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 0 | 0 | 0 || 3 | 0 |
---------------------------------------------------------------------
:: retrieving ::
org.apache.spark#spark-submit-parent-8f88f567-eabd-4146-8d1b-67b71f332066
confs: [default]
0 artifacts copied, 3 already retrieved (0kB/1ms)
25/01/21 10:28:56 WARN NativeCodeLoader: Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
setLogLevel(newLevel).
25/01/21 10:28:56 WARN Utils: Service 'SparkUI' could not bind on port 4040.
Attempting port 4041.
25/01/21 10:28:56 WARN Utils: Service 'SparkUI' could not bind on port 4041.
Attempting port 4042.
Spark Web UI available at http://192.168.1.109:4042
Spark master: local[*], Application Id: local-1737484136833
spark-sql (default)>
> set hoodie.metadata.enable=false;
hoodie.metadata.enable false
Time taken: 0.428 seconds, Fetched 1 row(s)
spark-sql (default)> set hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
hoodie.datasource.write.payload.class org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.009 seconds, Fetched 1 row(s)
spark-sql (default)> set hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;set hoodie.write.table.version=6;
hoodie.compaction.payload.class org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.008 seconds, Fetched 1 row(s)
hoodie.write.table.version 6
Time taken: 0.007 seconds, Fetched 1 row(s)
spark-sql (default)> set hoodie.compact.inline='true';
hoodie.compact.inline 'true'
Time taken: 0.006 seconds, Fetched 1 row(s)
spark-sql (default)> set hoodie.compact.inline.max.delta.commits=2;
hoodie.compact.inline.max.delta.commits 2
Time taken: 0.006 seconds, Fetched 1 row(s)
spark-sql (default)>
> CREATE TABLE lliangyu_table_mor (event_id INT,
> event_date STRING,
> event_name STRING,
> event_ts STRING,
> event_type STRING
> ) USING hudi
> OPTIONS(
> type = 'mor',
> primaryKey = 'event_id,event_date',
> preCombileField = 'event_ts',
> hoodie.write.table.version = 6,
> hoodie.compact.inline = 'true',
> hoodie.compact.inline.max.delta.commits = 2,
> hoodie.compaction.payload.class = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload',
> hoodie.datasource.write.payload.class = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload',
> hoodie.record.merge.strategy.id = 'eeb8d96f-b1e4-49fd-bbf8-28ac514178e5'
> )
> PARTITIONED BY (event_type)
> LOCATION 'file:///tmp/lakes/observed-default/dd/lliangyu_table_mor';
25/01/21 10:29:19 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR,
please set it as the dir of hudi-defaults.conf
25/01/21 10:29:19 WARN DFSPropertiesConfiguration: Properties file
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
25/01/21 10:29:19 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:/tmp/lakes/observed-default/dd/lliangyu_table_mor
Time taken: 0.283 seconds
spark-sql (default)> {code}
In Spark 3.5 + Hudi 1.0 (backward-compatible writer), do the following:
{code:java}
➜ ~ ${SPARK_HOME}/bin/spark-sql \
  --jars /Users/zhanyeha/hudiBuilds/hudi-spark3.5-bundle_2.12/baf141abbd6da022c66fa518588e34452a6902b4/hudi-spark3.5-bundle_2.12-1.1.0-SNAPSHOT.jar \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.sql.catalogImplementation=in-memory' \
  --conf 'spark.executor.heartbeat.maxFailures=999999999' \
  --conf spark.sql.defaultCatalog=spark_catalog
25/01/21 10:27:27 WARN Utils: Your hostname, Daviss-MacBook-Pro.local resolves
to a loopback address: 127.0.0.1; using 192.168.1.109 instead (on interface en0)
25/01/21 10:27:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another
address
25/01/21 10:27:27 WARN NativeCodeLoader: Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
setLogLevel(newLevel).
25/01/21 10:27:28 WARN Utils: Service 'SparkUI' could not bind on port 4040.
Attempting port 4041.
Spark Web UI available at http://192.168.1.109:4041
Spark master: local[*], Application Id: local-1737484048566
spark-sql (default)>
> set hoodie.metadata.enable=false;
25/01/21 10:29:34 WARN DFSPropertiesConfiguration: Properties file
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
25/01/21 10:29:34 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR,
please set it as the dir of hudi-defaults.conf
hoodie.metadata.enable false
Time taken: 0.424 seconds, Fetched 1 row(s)
spark-sql (default)> set hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
hoodie.datasource.write.payload.class org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.009 seconds, Fetched 1 row(s)
spark-sql (default)> set hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;set hoodie.write.table.version=6;
hoodie.compaction.payload.class org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.007 seconds, Fetched 1 row(s)
hoodie.write.table.version 6
Time taken: 0.007 seconds, Fetched 1 row(s)
spark-sql (default)> set hoodie.compact.inline='true';
hoodie.compact.inline 'true'
Time taken: 0.007 seconds, Fetched 1 row(s)
spark-sql (default)> set hoodie.compact.inline.max.delta.commits=2;
hoodie.compact.inline.max.delta.commits 2
Time taken: 0.006 seconds, Fetched 1 row(s)
spark-sql (default)> CREATE TABLE lliangyu_table_mor (
> event_id INT,
> event_date STRING,
> event_name STRING,
> event_ts STRING,
> event_type STRING
> ) USING hudi
> OPTIONS(
> type = 'mor',
> primaryKey = 'event_id,event_date',
> preCombileField = 'event_ts',
> hoodie.write.table.version = 6,
> hoodie.compact.inline = 'true',
> hoodie.compact.inline.max.delta.commits = 2,
> hoodie.compaction.payload.class = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload',
> hoodie.datasource.write.payload.class = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload',
> hoodie.record.merge.strategy.id = 'eeb8d96f-b1e4-49fd-bbf8-28ac514178e5'
> )
> PARTITIONED BY (event_type)
> LOCATION 'file:///tmp/lakes/observed-default/dd/lliangyu_table_mor';
25/01/21 10:29:35 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:/tmp/lakes/observed-default/dd/lliangyu_table_mor
25/01/21 10:29:35 WARN HoodieTableConfig: Table version SIX is lower than or
equal to config's first version EIGHT. Config hoodie.table.initial.version will
be ignored.
25/01/21 10:29:35 WARN HoodieTableConfig: Table version SIX is lower than or
equal to config's first version EIGHT. Config hoodie.table.keygenerator.type
will be ignored.
25/01/21 10:29:35 WARN HoodieTableConfig: Table version SIX is lower than or
equal to config's first version EIGHT. Config hoodie.record.merge.mode will be
ignored.
Time taken: 0.882 seconds
spark-sql (default)> INSERT INTO lliangyu_table_mor
> SELECT
> 101 as event_id,
> '2015-01-01' as event_date,
> 'event_name_546' as event_name,
> '2015-01-01T12:14:58.597216Z' as event_ts,
> 'type2' as event_type;
25/01/21 10:30:02 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:/tmp/lakes/observed-default/dd/lliangyu_table_mor
25/01/21 10:30:02 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:/tmp/lakes/observed-default/dd/lliangyu_table_mor
25/01/21 10:30:02 WARN HoodieWriteConfig: HoodieTableVersion.SIX is not yet
fully supported by the writer. Please expect some unexpected behavior, until
its fully implemented.
Time taken: 1.646 seconds
spark-sql (default)> select * from lliangyu_table_mor;
20250121103002713 20250121103002713_0_0 event_id:101,event_date:2015-01-01 event_type=type2 996ac395-88bf-45a3-885c-4480eefbbde9-0_0-5-5_20250121103002713.parquet 101 2015-01-01 event_name_546 2015-01-01T12:14:58.597216Z type2
Time taken: 0.41 seconds, Fetched 1 row(s)
spark-sql (default)> INSERT INTO lliangyu_table_mor VALUES (100, '2015-01-01',
'event_name_900', '2015-01-01T13:51:39.340396Z', 'type1');
25/01/21 10:30:54 WARN HoodieWriteConfig: HoodieTableVersion.SIX is not yet
fully supported by the writer. Please expect some unexpected behavior, until
its fully implemented.
25/01/21 10:30:54 WARN HoodieTableFileSystemView: Partition: event_type=type1
is not available in store
Time taken: 0.546 seconds
spark-sql (default)> select * from lliangyu_table_mor;
20250121103054462 20250121103054462_0_0 event_id:100,event_date:2015-01-01 event_type=type1 87902562-6f99-4f09-bab8-97f4fa6becc7-0_0-17-18_20250121103054462.parquet 100 2015-01-01 event_name_900 2015-01-01T13:51:39.340396Z type1
20250121103002713 20250121103002713_0_0 event_id:101,event_date:2015-01-01 event_type=type2 996ac395-88bf-45a3-885c-4480eefbbde9-0_0-5-5_20250121103002713.parquet 101 2015-01-01 event_name_546 2015-01-01T12:14:58.597216Z type2
Time taken: 0.319 seconds, Fetched 2 row(s)
spark-sql (default)> DELETE FROM default.lliangyu_table_mor WHERE event_type = 'type2';
25/01/21 10:31:08 WARN SparkStringUtils: Truncated the string representation of
a plan since it was too large. This behavior can be adjusted by setting
'spark.sql.debug.maxToStringFields'.
25/01/21 10:31:08 WARN HoodieWriteConfig: HoodieTableVersion.SIX is not yet
fully supported by the writer. Please expect some unexpected behavior, until
its fully implemented.
# WARNING: Unable to get Instrumentation. Dynamic Attach failed. You may add
this JAR as -javaagent manually, or supply -Djdk.attach.allowAttachSelf
# WARNING: Unable to attach Serviceability Agent. Unable to attach even with
module exceptions: [org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException:
Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense
failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense
failed.]
Time taken: 0.814 seconds
spark-sql (default)> select * from lliangyu_table_mor;
20250121103054462 20250121103054462_0_0 event_id:100,event_date:2015-01-01 event_type=type1 87902562-6f99-4f09-bab8-97f4fa6becc7-0_0-17-18_20250121103054462.parquet 100 2015-01-01 event_name_900 2015-01-01T13:51:39.340396Z type1
Time taken: 0.236 seconds, Fetched 1 row(s)
spark-sql (default)> {code}
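To summarize the setup used in both sessions above: the 1.x backward-compatible writer only behaves correctly against a table-version-6 table with the following session settings (all taken verbatim from the transcripts; disabling the metadata table is the key requirement):
{code:sql}
-- Key requirement: the backward-compatible writer needs the metadata table disabled
set hoodie.metadata.enable=false;
-- Use the same payload class in both writers so records merge identically
set hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
set hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
-- Keep writing in table version 6 and compact every 2 delta commits
set hoodie.write.table.version=6;
set hoodie.compact.inline=true;
set hoodie.compact.inline.max.delta.commits=2;
{code}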
> Hudi 1.0 Spark SQL unexpected delete behaviors with backward writer
> -------------------------------------------------------------------
>
> Key: HUDI-8821
> URL: https://issues.apache.org/jira/browse/HUDI-8821
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Leon Lin
> Assignee: Davis Zhang
> Priority: Blocker
> Fix For: 1.0.1
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> Reproduction Steps:
>
> {code:java}
> // 1. Create an empty table with Hudi 0.14 + Spark 3.5.0
> spark.sql(
> """
> |CREATE TABLE lliangyu_table_mor (
> | event_id INT,
> | event_date STRING,
> | event_name STRING,
> | event_ts STRING,
> | event_type STRING
> |) USING hudi
> | OPTIONS(
> | type = 'mor',
> | primaryKey = 'event_id,event_date',
> | preCombileField = 'event_ts',
> | hoodie.write.table.version = 6,
> | hoodie.compact.inline = 'true',
> | hoodie.compact.inline.max.delta.commits = 2
> |)
> |PARTITIONED BY (event_type)
> |LOCATION 's3://lliangyu-580974493829-us-west-2/warehouse/hudi/lliangyu_table_mor';
> """.stripMargin){code}
> {code:java}
> // 2. Insert some rows using Spark 3.5.3 / Hudi 1.0 Backward writer
> spark.sql("set hoodie.write.table.version=6")
> spark.sql("set hoodie.compact.inline='true'")
> spark.sql("set hoodie.compact.inline.max.delta.commits=2")
> val insertStatements = Seq(
> "INSERT INTO lliangyu_table_mor VALUES (100, '2015-01-01', 'event_name_900', '2015-01-01T13:51:39.340396Z', 'type1');",
> "INSERT INTO lliangyu_table_mor VALUES (101, '2015-01-01', 'event_name_546', '2015-01-01T12:14:58.597216Z', 'type2');",
> "INSERT INTO lliangyu_table_mor VALUES (102, '2015-01-01', 'event_name_345', '2015-01-01T13:51:40.417052Z', 'type3');",
> "INSERT INTO lliangyu_table_mor VALUES (103, '2015-01-01', 'event_name_234', '2015-01-01T13:51:40.519832Z', 'type4');",
> "INSERT INTO lliangyu_table_mor VALUES (104, '2015-01-01', 'event_name_123', '2015-01-01T12:15:00.512679Z', 'type1');",
> "INSERT INTO lliangyu_table_mor VALUES (105, '2015-01-01', 'event_name_678', '2015-01-01T13:51:42.248818Z', 'type2');",
> "INSERT INTO lliangyu_table_mor VALUES (106, '2015-01-01', 'event_name_890', '2015-01-01T13:51:44.735360Z', 'type3');",
> "INSERT INTO lliangyu_table_mor VALUES (107, '2015-01-01', 'event_name_944', '2015-01-01T13:51:45.019544Z', 'type4');",
> "INSERT INTO lliangyu_table_mor VALUES (108, '2015-01-01', 'event_name_456', '2015-01-01T13:51:45.208007Z', 'type1');",
> "INSERT INTO lliangyu_table_mor VALUES (109, '2015-01-01', 'event_name_567', '2015-01-01T13:51:45.369689Z', 'type2');",
> "INSERT INTO lliangyu_table_mor VALUES (110, '2015-01-01', 'event_name_789', '2015-01-01T12:15:05.664947Z', 'type3');"
> )
> insertStatements.foreach { query => spark.sql(query) }
> // Due to the issue tracked in https://issues.apache.org/jira/browse/HUDI-8820,
> // some rows inserted using the backward writer do not appear in the selection.{code}
>
> {code:java}
> // 3. Delete some rows using Spark 3.5.3 / Hudi 1.0 Backward writer
> // Run deletes on rows that could be retrieved from selection
> spark.sql("DELETE FROM default.lliangyu_table_mor WHERE event_type = 'type2'").show(false);
> // Run select again returns incorrect results.
> +-------------------+---------------------+----------------------------------+----------------------+---------------------------------------------------------------------------+--------+----------+--------------+---------------------------+----------+
> |_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key                |_hoodie_partition_path|_hoodie_file_name                                                          |event_id|event_date|event_name    |event_ts                   |event_type|
> +-------------------+---------------------+----------------------------------+----------------------+---------------------------------------------------------------------------+--------+----------+--------------+---------------------------+----------+
> |20250103202501108  |20250103202501108_0_0|event_id:108,event_date:2015-01-01|event_type=type1      |b935d179-56b3-4f81-81e4-8bb0cf97c873-0_0-131-4218_20250103202501108.parquet|108     |2015-01-01|event_name_456|2015-01-01T13:51:45.208007Z|type1     |
> +-------------------+---------------------+----------------------------------+----------------------+---------------------------------------------------------------------------+--------+----------+--------------+---------------------------+----------+
> {code}
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)