[ https://issues.apache.org/jira/browse/HUDI-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wangwenli updated HUDI-5155:
----------------------------
Description:
When Hive reads a MOR rt table, it returns duplicate records in the case below:
# Use the bucket index type.
# Take primary keys 1 - 100 and set the bucket number to 1.
# Insert records 1 - 100 and compact; one parquet file is generated.
# Insert records 1 - 100 once again, but don't compact, so the data will
consist of 1 parquet file + 1 log file.
# Run select * from table where key=1; it returns 2 records.
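The setup in the steps above could be written down roughly as the writer options below. This is only a sketch: the table name and the column name `key` are hypothetical, and the option names are the Hudi write configs as I understand them, so please check them against the 0.11.0 configuration reference before relying on them. Compacting after the first batch and not after the second is a manual step that these options do not express.

```python
# Sketch of writer options for the reproduction (names to be verified
# against the Hudi 0.11.0 configuration reference).
hudi_options = {
    "hoodie.table.name": "hudi_5155_repro",                 # hypothetical table name
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",  # MOR table, exposes the rt view
    "hoodie.datasource.write.recordkey.field": "key",       # assumed primary key column
    "hoodie.index.type": "BUCKET",                          # step 1: bucket index
    "hoodie.bucket.index.num.buckets": "1",                 # step 2: a single bucket
    "hoodie.compact.inline": "false",                       # compaction is triggered by hand
}

# Primary keys 1 - 100; the same batch is written twice (steps 3 and 4).
rows = [{"key": k, "value": f"v{k}"} for k in range(1, 101)]

print(len(rows), "rows per batch, index type:", hudi_options["hoodie.index.type"])
```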
The cause:
In HoodieMergeOnReadtableInputFormat, isSplitable returns true, so two map
tasks are generated. Each task includes the log file, so each task returns
one record for the same key.
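To make the cause concrete, here is a toy model in plain Python. It is not Hudi code: `read_split` and `query` are hypothetical helpers, and the merge rule (log records override base records, and log-only keys are emitted too) is an assumption about how an rt reader behaves. It only shows why handing the whole log file to every split of the same parquet file yields one copy of the key per split.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str]  # (record key, payload)


def read_split(base_split: List[Record], log_records: List[Record]) -> List[Record]:
    """Toy merge-on-read: overlay the whole log file on one base-file split."""
    merged: Dict[int, str] = dict(base_split)
    for key, payload in log_records:
        # A log record overrides the base record, or is emitted as a new
        # record when this split holds no base row for the key.
        merged[key] = payload
    return sorted(merged.items())


def query(base_file: List[Record], log_records: List[Record],
          num_splits: int, key: int) -> List[Record]:
    """Run 'where key = ?' with one simulated map task per split."""
    chunk = -(-len(base_file) // num_splits)  # ceiling division
    splits = [base_file[i:i + chunk] for i in range(0, len(base_file), chunk)]
    hits: List[Record] = []
    for split in splits:
        hits.extend(r for r in read_split(split, log_records) if r[0] == key)
    return hits


base_file = [(k, "v1") for k in range(1, 101)]    # compacted parquet file
log_records = [(k, "v2") for k in range(1, 101)]  # second insert, not compacted

unsplit = query(base_file, log_records, num_splits=1, key=1)
split_in_two = query(base_file, log_records, num_splits=2, key=1)
print(len(unsplit), len(split_in_two))  # prints: 1 2
```

With a single split, key 1 comes back once. With two splits, the second task has no base row for key 1 but still sees it in the log file, so it returns it again.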
For reference, see:
https://github.com/apache/hudi/issues/4618
was:
When Hive reads a MOR rt table, it returns duplicate records in the case below:
# Use the bucket index type.
# Take primary keys 1 - 100 and set the bucket number to 1.
# Insert records 1 - 100 and compact; one parquet file is generated.
# Insert records 1 - 100 once again, but don't compact, so the data will
consist of 1 parquet file + 1 log file.
# Run select * from table where key=1; it returns 2 records.
The cause:
In HoodieMergeOnReadtableInputFormat, isSplitable returns true, so two map
tasks are generated. Each task includes the log file, so each task returns
one record for the same key.
> Hive reading rt table returns duplicate records
> -----------------------------------------------
>
> Key: HUDI-5155
> URL: https://issues.apache.org/jira/browse/HUDI-5155
> Project: Apache Hudi
> Issue Type: Bug
> Components: hive
> Affects Versions: 0.11.0
> Reporter: wangwenli
> Priority: Major