geserdugarov opened a new issue, #12133:
URL: https://github.com/apache/hudi/issues/12133
I've already created issue HUDI-8394, but I want to highlight this problem here as well.
I believe this is a critical issue with the current master when all of the following hold:
- bulk insert operation,
- `hoodie.datasource.write.row.writer.enable = false`,
- simple bucket index.
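For reference, the three conditions above map onto a handful of Hudi write options. A minimal sketch of that combination as plain config key/value pairs (the option keys are standard Hudi config names; wiring them into an actual `DataFrameWriter` via `df.write.format("hudi").options(...)` is assumed, not shown):

```scala
// Hedged sketch: the write-option combination described above.
// Plain key/value pairs only; no Spark session is required to inspect them.
object ReproOptions {
  val opts: Map[String, String] = Map(
    "hoodie.datasource.write.operation"         -> "bulk_insert",     // bulk insert operation
    "hoodie.datasource.write.row.writer.enable" -> "false",           // row writer disabled
    "hoodie.index.type"                         -> "BUCKET",          // bucket index...
    "hoodie.index.bucket.engine"                -> "SIMPLE",          // ...with the simple engine
    "hoodie.table.type"                         -> "COPY_ON_WRITE"    // on a COW table
  )
}
```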
**Describe the problem you faced**
When I bulk insert into a COW table, I see both parquet and log files in the file system, which is MOR table behavior.
``` Bash
cat ./.hoodie/hoodie.properties
# ...
# hoodie.table.type=COPY_ON_WRITE <-- COW table
# ...
```
``` Bash
ll ./dt\=2021-01-05/
# total 456
# drwxr-xr-x 2 d00838679 d00838679   4096 Oct 19 15:33 ./
# drwxrwxr-x 4 d00838679 d00838679   4096 Oct 19 15:32 ../
# -rw-r--r-- 1 d00838679 d00838679 435346 Oct 19 15:32 00000001-4a79-47b3-918c-05f8b90e8b14-0_1-14-12_20241019083242289.parquet  <-- base file
# -rw-r--r-- 1 d00838679 d00838679   3412 Oct 19 15:32 .00000001-4a79-47b3-918c-05f8b90e8b14-0_1-14-12_20241019083242289.parquet.crc
# -rw-r--r-- 1 d00838679 d00838679    978 Oct 19 15:33 .00000001-4a79-47b3-918c-05f8b90e8b14-0_20241019083307134.log.1_0-30-31  <-- log file, as for a MOR table
# -rw-r--r-- 1 d00838679 d00838679     16 Oct 19 15:33 ..00000001-4a79-47b3-918c-05f8b90e8b14-0_20241019083307134.log.1_0-30-31.crc
# -rw-r--r-- 1 d00838679 d00838679     96 Oct 19 15:32 .hoodie_partition_metadata
# -rw-r--r-- 1 d00838679 d00838679     12 Oct 19 15:32 ..hoodie_partition_metadata.crc
```
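The listing above can be checked mechanically: Hudi base files end in `.parquet`, while log files carry a `.log.` segment in their name (and `.crc` sidecars belong to neither). A small sketch of that classification (the object and method names here are my own, not Hudi API):

```scala
// Hedged sketch: distinguish Hudi base files from log files by file name alone,
// using the naming conventions visible in the listing above.
object HudiFileKind {
  def kindOf(name: String): String =
    if (name.endsWith(".crc"))          "other" // checksum sidecar file
    else if (name.endsWith(".parquet")) "base"  // columnar base file
    else if (name.contains(".log."))    "log"   // MOR-style delta log
    else                                "other" // partition metadata, etc.
}
```

For a COW table, every data file in the partition should classify as `base`; the presence of a `log` file is the reported anomaly.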
**To Reproduce**
To reproduce, the existing test `Test Bulk Insert Into Bucket Index Table` can be used:
``` Scala
test("Test Bulk Insert Into Bucket Index Table") {
  withSQLConf("hoodie.datasource.write.operation" -> "bulk_insert",
    "hoodie.bulkinsert.shuffle.parallelism" -> "1") {
    withTempDir { tmp =>
      val tableName = generateTableName
      // Create a partitioned table
      spark.sql(
        s"""
           |create table $tableName (
           |  id int,
           |  dt string,
           |  name string,
           |  price double,
           |  ts long
           |) using hudi
           | tblproperties (
           |  primaryKey = 'id,name',
           |  type = 'cow',
           |  preCombineField = 'ts',
           |  hoodie.index.type = 'BUCKET',
           |  hoodie.index.bucket.engine = 'SIMPLE',
           |  hoodie.bucket.index.num.buckets = '2',
           |  hoodie.bucket.index.hash.field = 'id,name',
           |  hoodie.datasource.write.row.writer.enable = 'false')
           | partitioned by (dt)
           | location '${tmp.getCanonicalPath}'
         """.stripMargin)
      spark.sql(
        s"""
           | insert into $tableName values
           | (5, 'a1,1', 10, 1000, "2021-01-05")
         """.stripMargin)
      spark.sql(
        s"""
           | insert into $tableName values
           | (9, 'a3,3', 30, 3000, "2021-01-05")
         """.stripMargin)
    }
  }
}
```
**Expected behavior**
For a COW table, only parquet base files should be created; no log files should appear.
**Environment Description**
* Hudi version : current master
* Spark version : 3.5