[https://issues.apache.org/jira/browse/HUDI-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17915800#comment-17915800]

Davis Zhang commented on HUDI-8821:
-----------------------------------

Not an issue: for the backward-compatible writer to work, the metadata table (MDT) must be disabled.
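For reference, these are the session-level settings the reproduction below relies on to run the 1.x writer in backward-compatible (table version 6) mode. This is a sketch of the configs taken from the transcripts, not an exhaustive compatibility checklist:

{code:sql}
-- Disable the metadata table; the backward-compatible writer requires MDT off
set hoodie.metadata.enable=false;
-- Write in table version 6 format so 0.x readers/writers stay compatible
set hoodie.write.table.version=6;
-- Pin the payload classes so 0.x and 1.x writers merge records identically
set hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
set hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
{code}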

 

In Spark 3.5 + Hudi 0.15, create the table:
{code:java}
➜  ~ ${SPARK_HOME}/bin/spark-sql --packages org.apache.hudi:hudi-spark3.5-bundle_2.12:0.15.0 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
--conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
--conf 'spark.sql.catalogImplementation=in-memory'
25/01/21 10:28:55 WARN Utils: Your hostname, Daviss-MacBook-Pro.local resolves 
to a loopback address: 127.0.0.1; using 192.168.1.109 instead (on interface en0)
25/01/21 10:28:55 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another 
address
:: loading settings :: url = 
jar:file:/Users/zhanyeha/spark-3.5.4-bin-hadoop3/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/zhanyeha/.ivy2/cache
The jars for the packages stored in: /Users/zhanyeha/.ivy2/jars
org.apache.hudi#hudi-spark3.5-bundle_2.12 added as a dependency
:: resolving dependencies :: 
org.apache.spark#spark-submit-parent-8f88f567-eabd-4146-8d1b-67b71f332066;1.0
        confs: [default]
        found org.apache.hudi#hudi-spark3.5-bundle_2.12;0.15.0 in local-m2-cache
        found org.apache.hive#hive-storage-api;2.8.1 in local-m2-cache
        found org.slf4j#slf4j-api;1.7.36 in local-m2-cache
:: resolution report :: resolve 76ms :: artifacts dl 3ms
        :: modules in use:
        org.apache.hive#hive-storage-api;2.8.1 from local-m2-cache in [default]
        org.apache.hudi#hudi-spark3.5-bundle_2.12;0.15.0 from local-m2-cache in 
[default]
        org.slf4j#slf4j-api;1.7.36 from local-m2-cache in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
        ---------------------------------------------------------------------
:: retrieving :: 
org.apache.spark#spark-submit-parent-8f88f567-eabd-4146-8d1b-67b71f332066
        confs: [default]
        0 artifacts copied, 3 already retrieved (0kB/1ms)
25/01/21 10:28:56 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
25/01/21 10:28:56 WARN Utils: Service 'SparkUI' could not bind on port 4040. 
Attempting port 4041.
25/01/21 10:28:56 WARN Utils: Service 'SparkUI' could not bind on port 4041. 
Attempting port 4042.
Spark Web UI available at http://192.168.1.109:4042
Spark master: local[*], Application Id: local-1737484136833
spark-sql (default)> 
                   > set hoodie.metadata.enable=false;
hoodie.metadata.enable  false
Time taken: 0.428 seconds, Fetched 1 row(s)
spark-sql (default)> set 
hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
hoodie.datasource.write.payload.class   
org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.009 seconds, Fetched 1 row(s)
spark-sql (default)> set 
hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;set
 hoodie.write.table.version=6;
hoodie.compaction.payload.class 
org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.008 seconds, Fetched 1 row(s)
hoodie.write.table.version      6
Time taken: 0.007 seconds, Fetched 1 row(s)
spark-sql (default)> set hoodie.compact.inline='true';
hoodie.compact.inline   'true'
Time taken: 0.006 seconds, Fetched 1 row(s)
spark-sql (default)> set hoodie.compact.inline.max.delta.commits=2;
hoodie.compact.inline.max.delta.commits 2
Time taken: 0.006 seconds, Fetched 1 row(s)
spark-sql (default)> 
                   > CREATE TABLE lliangyu_table_mor (event_id INT,
                   >  event_date STRING,
                   >  event_name STRING,
                   >  event_ts STRING,
                   >  event_type STRING
                   > ) USING hudi
                   >  OPTIONS(
                   >  type = 'mor',
                   >  primaryKey = 'event_id,event_date',
                   >  preCombileField = 'event_ts',
                   >  hoodie.write.table.version = 6,
                   >  hoodie.compact.inline = 'true',
                   >  hoodie.compact.inline.max.delta.commits = 2,
                   >  hoodie.compaction.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload',
                   >  hoodie.datasource.write.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload',
                   >  hoodie.record.merge.strategy.id='eeb8d96f-b1e4-49fd-bbf8-28ac514178e5'
                   > )
                   > PARTITIONED BY (event_type)
                   > LOCATION 'file:///tmp/lakes/observed-default/dd/lliangyu_table_mor';
25/01/21 10:29:19 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, 
please set it as the dir of hudi-defaults.conf
25/01/21 10:29:19 WARN DFSPropertiesConfiguration: Properties file 
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
25/01/21 10:29:19 WARN TableSchemaResolver: Could not find any data file 
written for commit, so could not get schema for table 
file:/tmp/lakes/observed-default/dd/lliangyu_table_mor
Time taken: 0.283 seconds
spark-sql (default)>  {code}
 

In Spark 3.5 + Hudi 1.0, do the following:
{code:java}
➜  ~ ${SPARK_HOME}/bin/spark-sql \
    --jars /Users/zhanyeha/hudiBuilds/hudi-spark3.5-bundle_2.12/baf141abbd6da022c66fa518588e34452a6902b4/hudi-spark3.5-bundle_2.12-1.1.0-SNAPSHOT.jar \
    --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
    --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
    --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
    --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
    --conf 'spark.sql.catalogImplementation=in-memory' \
    --conf 'spark.executor.heartbeat.maxFailures=999999999' \
    --conf spark.sql.defaultCatalog=spark_catalog
25/01/21 10:27:27 WARN Utils: Your hostname, Daviss-MacBook-Pro.local resolves 
to a loopback address: 127.0.0.1; using 192.168.1.109 instead (on interface en0)
25/01/21 10:27:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another 
address
25/01/21 10:27:27 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
25/01/21 10:27:28 WARN Utils: Service 'SparkUI' could not bind on port 4040. 
Attempting port 4041.
Spark Web UI available at http://192.168.1.109:4041
Spark master: local[*], Application Id: local-1737484048566
spark-sql (default)> 
                   > set hoodie.metadata.enable=false;
25/01/21 10:29:34 WARN DFSPropertiesConfiguration: Properties file 
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
25/01/21 10:29:34 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, 
please set it as the dir of hudi-defaults.conf
hoodie.metadata.enable  false
Time taken: 0.424 seconds, Fetched 1 row(s)
spark-sql (default)> set 
hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
hoodie.datasource.write.payload.class   
org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.009 seconds, Fetched 1 row(s)
spark-sql (default)> set 
hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;set
 hoodie.write.table.version=6;
hoodie.compaction.payload.class 
org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.007 seconds, Fetched 1 row(s)
hoodie.write.table.version      6
Time taken: 0.007 seconds, Fetched 1 row(s)
spark-sql (default)> set hoodie.compact.inline='true';
hoodie.compact.inline   'true'
Time taken: 0.007 seconds, Fetched 1 row(s)
spark-sql (default)> set hoodie.compact.inline.max.delta.commits=2;
hoodie.compact.inline.max.delta.commits 2
Time taken: 0.006 seconds, Fetched 1 row(s)
spark-sql (default)> CREATE TABLE lliangyu_table_mor (
                   >  event_id INT,
                   >  event_date STRING,
                   >  event_name STRING,
                   >  event_ts STRING,
                   >  event_type STRING
                   > ) USING hudi
                   >  OPTIONS(
                   >  type = 'mor',
                   >  primaryKey = 'event_id,event_date',
                   >  preCombileField = 'event_ts',
                   >  hoodie.write.table.version = 6,
                   >  hoodie.compact.inline = 'true',
                   >  hoodie.compact.inline.max.delta.commits = 2,
                   >  hoodie.compaction.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload',
                   >  hoodie.datasource.write.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload',
                   >  hoodie.record.merge.strategy.id='eeb8d96f-b1e4-49fd-bbf8-28ac514178e5'
                   > )
                   > PARTITIONED BY (event_type)
                   > LOCATION 'file:///tmp/lakes/observed-default/dd/lliangyu_table_mor';
25/01/21 10:29:35 WARN TableSchemaResolver: Could not find any data file 
written for commit, so could not get schema for table 
file:/tmp/lakes/observed-default/dd/lliangyu_table_mor
25/01/21 10:29:35 WARN HoodieTableConfig: Table version SIX is lower than or 
equal to config's first version EIGHT. Config hoodie.table.initial.version will 
be ignored.
25/01/21 10:29:35 WARN HoodieTableConfig: Table version SIX is lower than or 
equal to config's first version EIGHT. Config hoodie.table.keygenerator.type 
will be ignored.
25/01/21 10:29:35 WARN HoodieTableConfig: Table version SIX is lower than or 
equal to config's first version EIGHT. Config hoodie.record.merge.mode will be 
ignored.
Time taken: 0.882 seconds
spark-sql (default)> INSERT INTO lliangyu_table_mor
                   > SELECT 
                   >     101 as event_id,
                   >     '2015-01-01' as event_date,
                   >     'event_name_546' as event_name,
                   >     '2015-01-01T12:14:58.597216Z' as event_ts,
                   >     'type2' as event_type;
25/01/21 10:30:02 WARN TableSchemaResolver: Could not find any data file 
written for commit, so could not get schema for table 
file:/tmp/lakes/observed-default/dd/lliangyu_table_mor
25/01/21 10:30:02 WARN TableSchemaResolver: Could not find any data file 
written for commit, so could not get schema for table 
file:/tmp/lakes/observed-default/dd/lliangyu_table_mor
25/01/21 10:30:02 WARN HoodieWriteConfig: HoodieTableVersion.SIX is not yet 
fully supported by the writer. Please expect some unexpected behavior, until 
its fully implemented.
Time taken: 1.646 seconds
spark-sql (default)> select * from lliangyu_table_mor;
20250121103002713       20250121103002713_0_0   
event_id:101,event_date:2015-01-01      event_type=type2        
996ac395-88bf-45a3-885c-4480eefbbde9-0_0-5-5_20250121103002713.parquet  
101     2015-01-01      event_name_546  2015-01-01T12:14:58.597216Z     type2
Time taken: 0.41 seconds, Fetched 1 row(s)
spark-sql (default)> INSERT INTO lliangyu_table_mor VALUES (100, '2015-01-01', 
'event_name_900', '2015-01-01T13:51:39.340396Z', 'type1');
25/01/21 10:30:54 WARN HoodieWriteConfig: HoodieTableVersion.SIX is not yet 
fully supported by the writer. Please expect some unexpected behavior, until 
its fully implemented.
25/01/21 10:30:54 WARN HoodieTableFileSystemView: Partition: event_type=type1 
is not available in store
Time taken: 0.546 seconds
spark-sql (default)> select * from lliangyu_table_mor;
20250121103054462       20250121103054462_0_0   
event_id:100,event_date:2015-01-01      event_type=type1        
87902562-6f99-4f09-bab8-97f4fa6becc7-0_0-17-18_20250121103054462.parquet        
100     2015-01-01      event_name_900  2015-01-01T13:51:39.340396Z     type1
20250121103002713       20250121103002713_0_0   
event_id:101,event_date:2015-01-01      event_type=type2        
996ac395-88bf-45a3-885c-4480eefbbde9-0_0-5-5_20250121103002713.parquet  
101     2015-01-01      event_name_546  2015-01-01T12:14:58.597216Z     type2
Time taken: 0.319 seconds, Fetched 2 row(s)
spark-sql (default)> DELETE FROM default.lliangyu_table_mor WHERE event_type = 
'type2';
25/01/21 10:31:08 WARN SparkStringUtils: Truncated the string representation of 
a plan since it was too large. This behavior can be adjusted by setting 
'spark.sql.debug.maxToStringFields'.
25/01/21 10:31:08 WARN HoodieWriteConfig: HoodieTableVersion.SIX is not yet 
fully supported by the writer. Please expect some unexpected behavior, until 
its fully implemented.
# WARNING: Unable to get Instrumentation. Dynamic Attach failed. You may add 
this JAR as -javaagent manually, or supply -Djdk.attach.allowAttachSelf
# WARNING: Unable to attach Serviceability Agent. Unable to attach even with 
module exceptions: [org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: 
Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense 
failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense 
failed.]
Time taken: 0.814 seconds
spark-sql (default)> select * from lliangyu_table_mor;
20250121103054462       20250121103054462_0_0   
event_id:100,event_date:2015-01-01      event_type=type1        
87902562-6f99-4f09-bab8-97f4fa6becc7-0_0-17-18_20250121103054462.parquet        
100     2015-01-01      event_name_900  2015-01-01T13:51:39.340396Z     type1
Time taken: 0.236 seconds, Fetched 1 row(s)
spark-sql (default)>  {code}

> Hudi 1.0 Spark SQL unexpected delete behaviors with backward writer
> -------------------------------------------------------------------
>
>                 Key: HUDI-8821
>                 URL: https://issues.apache.org/jira/browse/HUDI-8821
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Leon Lin
>            Assignee: Davis Zhang
>            Priority: Blocker
>             Fix For: 1.0.1
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Reproduction Steps:
>  
> {code:java}
> // 1. Create an empty table with Hudi 0.14 + Spark 3.5.0 
> spark.sql(
> """
> |CREATE TABLE lliangyu_table_mor (
> | event_id INT,
> | event_date STRING,
> | event_name STRING,
> | event_ts STRING,
> | event_type STRING
> |) USING hudi
> | OPTIONS(
> | type = 'mor',
> | primaryKey = 'event_id,event_date',
> | preCombileField = 'event_ts',
> | hoodie.write.table.version = 6,
> | hoodie.compact.inline = 'true',
> | hoodie.compact.inline.max.delta.commits = 2
> |)
> |PARTITIONED BY (event_type)
> |LOCATION 
> 's3://lliangyu-580974493829-us-west-2/warehouse/hudi/lliangyu_table_mor';
> """.stripMargin){code}
> {code:java}
> // 2. Insert some rows using Spark 3.5.3 / Hudi 1.0 Backward writer
> spark.sql("set hoodie.write.table.version=6")
> spark.sql("set hoodie.compact.inline='true'")
> spark.sql("set hoodie.compact.inline.max.delta.commits=2")
> val insertStatements = Seq(
> "INSERT INTO lliangyu_table_mor VALUES (100, '2015-01-01', 'event_name_900', 
> '2015-01-01T13:51:39.340396Z', 'type1');",
> "INSERT INTO lliangyu_table_mor VALUES (101, '2015-01-01', 'event_name_546', 
> '2015-01-01T12:14:58.597216Z', 'type2');",
> "INSERT INTO lliangyu_table_mor VALUES (102, '2015-01-01', 'event_name_345', 
> '2015-01-01T13:51:40.417052Z', 'type3');",
> "INSERT INTO lliangyu_table_mor VALUES (103, '2015-01-01', 'event_name_234', 
> '2015-01-01T13:51:40.519832Z', 'type4');",
> "INSERT INTO lliangyu_table_mor VALUES (104, '2015-01-01', 'event_name_123', 
> '2015-01-01T12:15:00.512679Z', 'type1');",
> "INSERT INTO lliangyu_table_mor VALUES (105, '2015-01-01', 'event_name_678', 
> '2015-01-01T13:51:42.248818Z', 'type2');",
> "INSERT INTO lliangyu_table_mor VALUES (106, '2015-01-01', 'event_name_890', 
> '2015-01-01T13:51:44.735360Z', 'type3');",
> "INSERT INTO lliangyu_table_mor VALUES (107, '2015-01-01', 'event_name_944', 
> '2015-01-01T13:51:45.019544Z', 'type4');",
> "INSERT INTO lliangyu_table_mor VALUES (108, '2015-01-01', 'event_name_456', 
> '2015-01-01T13:51:45.208007Z', 'type1');",
> "INSERT INTO lliangyu_table_mor VALUES (109, '2015-01-01', 'event_name_567', 
> '2015-01-01T13:51:45.369689Z', 'type2');",
> "INSERT INTO lliangyu_table_mor VALUES (110, '2015-01-01', 'event_name_789', 
> '2015-01-01T12:15:05.664947Z', 'type3');"
> )
> insertStatements.foreach { query => spark.sql(query) }
> // DUE TO issue with https://issues.apache.org/jira/browse/HUDI-8820
> // You will find some rows inserted using backward writer does not appear in 
> selection.{code}
>  
> {code:java}
> // 3. Delete some rows using Spark 3.5.3 / Hudi 1.0 Backward writer
> // Run deletes on rows that could be retrieved from selection
> spark.sql("DELETE FROM default.lliangyu_table_mor WHERE event_type = 
> 'type2'").show(false);
> // Run select again returns incorrect results.
> +-------------------+---------------------+----------------------------------+----------------------+---------------------------------------------------------------------------+--------+----------+--------------+---------------------------+----------+
> |_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key                |_hoodie_partition_path|_hoodie_file_name                                                          |event_id|event_date|event_name    |event_ts                   |event_type|
> +-------------------+---------------------+----------------------------------+----------------------+---------------------------------------------------------------------------+--------+----------+--------------+---------------------------+----------+
> |20250103202501108  |20250103202501108_0_0|event_id:108,event_date:2015-01-01|event_type=type1      |b935d179-56b3-4f81-81e4-8bb0cf97c873-0_0-131-4218_20250103202501108.parquet|108     |2015-01-01|event_name_456|2015-01-01T13:51:45.208007Z|type1     |
> +-------------------+---------------------+----------------------------------+----------------------+---------------------------------------------------------------------------+--------+----------+--------------+---------------------------+----------+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
