lwz9103 opened a new issue, #6750:
URL: https://github.com/apache/incubator-gluten/issues/6750
### Backend
CH (ClickHouse)
### Bug description
### SQL to reproduce:
```
create database if not exists local;
use local;
CREATE EXTERNAL TABLE customer (
c_custkey bigint not null,
c_name string not null,
c_address string not null,
c_nationkey bigint not null,
c_phone string not null,
c_acctbal double not null,
c_mktsegment string not null,
c_comment string not null)
USING PARQUET
LOCATION 'file:///data/tpch100/customer';
create database if not exists s3;
use s3;
CREATE EXTERNAL TABLE customer (
c_custkey bigint not null ,
c_name string not null ,
c_address string not null ,
c_nationkey bigint not null ,
c_phone string not null ,
c_acctbal double not null ,
c_mktsegment string not null ,
c_comment string not null )
USING clickhouse
CLUSTERED by (c_custkey) SORTED by (c_custkey) INTO 45 BUCKETS
TBLPROPERTIES (delta.checkpointInterval=5, storage_policy='__s3_main')
LOCATION
's3a://gluten-cicd/dataset/tpch100-mergetree-bucket-compact/customer';
insert into s3.customer select * from local.customer order by c_custkey;
optimize s3.customer;
```
### Error msg

### Debug info:


### Root Cause
After data is inserted into s3.customer, each Spark executor node keeps only
part of the file mapping metadata, not all of it. If a merge task contains a
part whose file mapping does not exist on the current node, an error occurs.
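
The condition described above can be illustrated with a toy model. This is only a sketch of the reasoning, not Gluten or ClickHouse code: the names `missing_parts`, `executor_mappings`, and `merge_task`, and the part and executor identifiers, are all hypothetical.

```python
from typing import Dict, List, Set


def missing_parts(local_mappings: Set[str], merge_parts: List[str]) -> List[str]:
    """Return the parts of a merge task that have no file mapping on this node."""
    return [part for part in merge_parts if part not in local_mappings]


# Hypothetical state after the insert: each executor only holds the
# file mappings for some of the parts, not for all of them.
executor_mappings: Dict[str, Set[str]] = {
    "executor-1": {"part_0", "part_1"},
    "executor-2": {"part_2"},
}

# Hypothetical merge task produced by OPTIMIZE that spans parts
# whose mappings live on different executors.
merge_task: List[str] = ["part_0", "part_1", "part_2"]

for node, mappings in executor_mappings.items():
    absent = missing_parts(mappings, merge_task)
    if absent:
        # In the scenario from this issue, this is where the merge would fail.
        print(f"{node}: no file mapping for {absent}")
    else:
        print(f"{node}: all parts resolvable")
```

In this model neither executor can resolve every part in the merge task, which matches the failure described in the root cause.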
### Spark version
None
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
_No response_