shengchiqu opened a new issue, #7229:
URL: https://github.com/apache/hudi/issues/7229
I am using Flink to build a Hudi demo on version 0.12.1, and I want to test
GLOBAL_BLOOM, because I looked at the source code and found that it is
already supported.
Flink SQL:
CREATE TABLE IF NOT EXISTS customer_hudi(
`C_CUSTKEY` int,
`C_NAME` varchar(25),
`C_ADDRESS` varchar(40),
`C_NATIONKEY` int,
`C_PHONE` char(15),
`C_ACCTBAL` decimal(15, 2),
`C_MKTSEGMENT` char(10),
`C_COMMENT` varchar(117),
`ts` timestamp(3),
PRIMARY KEY (C_CUSTKEY) NOT ENFORCED
) PARTITIONED BY (C_NATIONKEY)
WITH (
'connector' = 'hudi',
'path' = 'hdfs://ip:port/hudi/customer',
'table.type' = 'MERGE_ON_READ',
'changelog.enabled' = 'true',
'read.streaming.enabled' = 'true',
'read.start-commit' = '20221022134557',
'read.streaming.check-interval' = '20',
'compaction.schedule.enabled' = 'true',
'compaction.async.enabled' = 'false',
'compaction.trigger.strategy' = 'num_or_time',
'compaction.delta_commits' = '5',
'compaction.delta_seconds' = '120',
'hoodie.index.type' = 'GLOBAL_BLOOM',
'hoodie.bloom.index.update.partition.path' = 'true'
)
But when I update the partition value for an existing primary key, I find two
records in the Hudi table (the partition column is C_NATIONKEY; I updated the
row's value from 15 to 13).
I read the Hudi table with Flink in batch mode:
EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
CREATE TABLE IF NOT EXISTS customer_hudi(
`C_CUSTKEY` int,
`C_NAME` varchar(25),
`C_ADDRESS` varchar(40),
`C_NATIONKEY` int,
`C_PHONE` char(15),
`C_ACCTBAL` decimal(15, 2),
`C_MKTSEGMENT` char(10),
`C_COMMENT` varchar(117),
`ts` timestamp(3),
PRIMARY KEY (C_CUSTKEY) NOT ENFORCED
) PARTITIONED BY (C_NATIONKEY)
WITH (
'connector' = 'hudi',
'path' = 'hdfs://ip:port/hudi/customer',
'table.type' = 'MERGE_ON_READ',
'changelog.enabled' = 'false',
'read.streaming.enabled' = 'false',
'read.start-commit' = '20221022134557',
'read.streaming.check-interval' = '20',
'hoodie.index.type' = 'GLOBAL_BLOOM',
'hoodie.bloom.index.update.partition.path' = 'true'
)
+-----------+--------------------+-----------+-------------+-----------------+-----------+--------------+--------------------------------+-------------------------+
| C_CUSTKEY | C_NAME             | C_ADDRESS | C_NATIONKEY | C_PHONE         | C_ACCTBAL | C_MKTSEGMENT | C_COMMENT                      | ts                      |
+-----------+--------------------+-----------+-------------+-----------------+-----------+--------------+--------------------------------+-------------------------+
| 1         | Customer#000000001 | 1111      | 15          | 25-989-741-2988 | 711.56    | BUILDING     | to the even, regular platel... | 2022-11-17 17:08:07.501 |
| 1         | Customer#000000001 | 1111      | 13          | 25-989-741-2988 | 711.56    | BUILDING     | to the even, regular platel... | 2022-11-17 17:08:07.502 |
+-----------+--------------------+-----------+-------------+-----------------+-----------+--------------+--------------------------------+-------------------------+
Did I use the correct table creation parameters?
Why does configuring `GLOBAL_BLOOM` not take effect? The old record still
exists in the old partition.
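To make the symptom easier to reproduce, a query along these lines could be run against the batch-mode table definition shown earlier. It is only a sketch: it assumes the `customer_hudi` table is registered exactly as in that DDL, and the aliases `row_cnt` and `partition_cnt` are names chosen here for illustration.

```sql
-- List every primary key that appears more than once across partitions.
-- With a working global index, this query should return no rows.
SELECT
  C_CUSTKEY,
  COUNT(*) AS row_cnt,                        -- rows found for this key
  COUNT(DISTINCT C_NATIONKEY) AS partition_cnt -- partitions holding this key
FROM customer_hudi
GROUP BY C_CUSTKEY
HAVING COUNT(*) > 1;
```

Any key returned with `partition_cnt` greater than 1 has copies in more than one partition, which is the case reported here for `C_CUSTKEY` = 1.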