gfphoenix78 opened a new issue, #1425:
URL: https://github.com/apache/cloudberry/issues/1425
### Apache Cloudberry version
main branch
### What happened
```sql
create external web table t3(a int, b text)
LOCATION ('http://<ip>:<port>/bad_gb.txt')
FORMAT 'TEXT' (DELIMITER ',' NULL '' ) ENCODING 'GB18030'
LOG ERRORS SEGMENT REJECT LIMIT 2;
select * from t3;
```
output:
```
gpadmin=# select * from t3;
ERROR: segment reject limit reached, aborting operation (seg0 slice1 127.0.1.1:7002 pid=2316762)
DETAIL: Last error was: invalid byte sequence for encoding "GB18030": 0xa3 0x0a
CONTEXT: External table t3, line 3 of file http://.../bad_gb.txt
```
Contents of bad_gb.txt (encoded as GB18030):
```
gpadmin@hashdata:/tmp/www$ hexdump -C bad_gb.txt
00000000 31 2c ca c0 bd e7 0a 32 2c c4 e3 ba c3 c2 f0 a3 |1,.....2,.......|
00000010 0a 33 2c 6e 69 68 61 6f 0a |.3,nihao.|
00000019
```
### What you think should happen instead
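Only the second line contains an invalid byte sequence (a lone `0xa3` followed by the newline `0x0a`), so only that row should be rejected and logged. The first and third lines should be returned according to the table definition.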
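As a quick sanity check of this claim, the sketch below decodes the bytes from the hexdump in this issue one line at a time with Python's built-in `gb18030` codec. It is only an illustration outside the database: Python's decoder is not the one Cloudberry uses, and the helper name `check_lines` is made up for this example.

```python
# Bytes copied from the hexdump of bad_gb.txt in this issue (25 bytes).
BAD_GB = bytes.fromhex(
    "31 2c ca c0 bd e7 0a"           # line 1: "1," plus two 2-byte characters
    "32 2c c4 e3 ba c3 c2 f0 a3 0a"  # line 2: ends with a lone 0xa3 lead byte
    "33 2c 6e 69 68 61 6f 0a"        # line 3: plain ASCII "3,nihao"
)


def check_lines(data: bytes, encoding: str = "gb18030"):
    """Decode each line on its own; return a list of (lineno, text or None).

    Hypothetical helper for illustration only. Splitting on 0x0a is safe
    here because 0x0a is never a trail byte in GB18030.
    """
    results = []
    for lineno, raw in enumerate(data.rstrip(b"\n").split(b"\n"), start=1):
        try:
            results.append((lineno, raw.decode(encoding)))
        except UnicodeDecodeError:
            # The line contains an invalid or incomplete byte sequence.
            results.append((lineno, None))
    return results


if __name__ == "__main__":
    for lineno, text in check_lines(BAD_GB):
        status = "ok" if text is not None else "invalid byte sequence"
        print(f"line {lineno}: {status}")
```

With per-line decoding, only line 2 fails, which matches the expectation that a single row should count against the reject limit.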
### How to reproduce
To reproduce, replace the `<ip>` and `<port>` placeholders with the address of the HTTP server that hosts bad_gb.txt:
```sql
create external web table t3(a int, b text)
LOCATION ('http://<ip>:<port>/bad_gb.txt')
FORMAT 'TEXT' (DELIMITER ',' NULL '' ) ENCODING 'GB18030'
LOG ERRORS SEGMENT REJECT LIMIT 2;
select * from t3;
-- or
create temp table t0(a int, b text);
-- copy the file bad_gb.txt to /tmp/www
copy t0 from '/tmp/www/bad_gb.txt' with(encoding 'gb18030') log errors segment reject limit 2;
```
output:
```
gpadmin=# select * from t3;
ERROR: segment reject limit reached, aborting operation (seg0 slice1 127.0.1.1:7002 pid=2316762)
DETAIL: Last error was: invalid byte sequence for encoding "GB18030": 0xa3 0x0a
CONTEXT: External table t3, line 3 of file http://.../bad_gb.txt
-- or
gpadmin=# copy t0 from '/tmp/www/bad_gb.txt' with(encoding 'gb18030') log errors segment reject limit 2;
ERROR: segment reject limit reached, aborting operation
DETAIL: Last error was: invalid byte sequence for encoding "GB18030": 0xa3 0x0a, column a
CONTEXT: COPY t0, line 2, column a: "1,世界"
```
Contents of bad_gb.txt (encoded as GB18030):
```
gpadmin@hashdata:/tmp/www$ hexdump -C bad_gb.txt
00000000 31 2c ca c0 bd e7 0a 32 2c c4 e3 ba c3 c2 f0 a3 |1,.....2,.......|
00000010 0a 33 2c 6e 69 68 61 6f 0a |.3,nihao.|
00000019
```
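To make the reproduction easier to set up, the following sketch recreates `bad_gb.txt` byte for byte from the hexdump above. The helper name `write_bad_gb` and the temporary-directory target are assumptions for illustration; any directory served by the HTTP server or readable by `COPY` would do.

```python
import os
import tempfile

# Hex bytes copied from the hexdump of bad_gb.txt in this issue.
BAD_GB_HEX = (
    "31 2c ca c0 bd e7 0a "           # 1,<two GB18030 characters>\n
    "32 2c c4 e3 ba c3 c2 f0 a3 0a "  # 2,<three characters> + stray 0xa3\n
    "33 2c 6e 69 68 61 6f 0a"         # 3,nihao\n
)


def write_bad_gb(path: str) -> int:
    """Write the test file to `path` and return the number of bytes written.

    Hypothetical helper; the file must be opened in binary mode so the
    invalid sequence is written unchanged.
    """
    data = bytes.fromhex(BAD_GB_HEX)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


if __name__ == "__main__":
    # Assumed location; adjust to the directory your server or COPY reads.
    target = os.path.join(tempfile.gettempdir(), "bad_gb.txt")
    size = write_bad_gb(target)
    print(f"wrote {size} bytes to {target}")
```

One way to expose the file to the external web table is to run `python3 -m http.server <port>` in the directory that contains it, though any HTTP server should work.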
### Operating System
ubuntu 22.04
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes, I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/cloudberry/blob/main/CODE_OF_CONDUCT.md).