gfphoenix78 opened a new issue, #1425:
URL: https://github.com/apache/cloudberry/issues/1425

   ### Apache Cloudberry version
   
   main branch
   
   ### What happened
   
   ```sql
   create  external web  table t3(a int, b text)
   LOCATION ('http://<ip>:<port>/bad_gb.txt')
   FORMAT 'TEXT' (DELIMITER ','  NULL '' )  ENCODING 'GB18030'
   LOG ERRORS SEGMENT REJECT LIMIT 2;
   select * from t3;
   ```
   
   output:
   ```
   gpadmin=# select * from t3;
   ERROR:  segment reject limit reached, aborting operation  (seg0 slice1 127.0.1.1:7002 pid=2316762)
   DETAIL:  Last error was: invalid byte sequence for encoding "GB18030": 0xa3 0x0a
   CONTEXT:  External table t3, line 3 of file http://.../bad_gb.txt
   ```
   
   bad_gb.txt: encoding GB18030
   ```
   gpadmin@hashdata:/tmp/www$ hexdump -C bad_gb.txt
   00000000  31 2c ca c0 bd e7 0a 32  2c c4 e3 ba c3 c2 f0 a3  |1,.....2,.......|
   00000010  0a 33 2c 6e 69 68 61 6f  0a                       |.3,nihao.|
   00000019
   ```
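
As a cross-check of the dump above, the following sketch rebuilds the 25 bytes and validates each newline-terminated line on its own with Python's built-in `gb18030` codec. The helper name `invalid_lines` is made up for illustration and is not part of Cloudberry. Splitting on `0x0a` before decoding relies on the fact that every non-leading byte of a GB18030 character is `0x30` or above, so a newline byte can never be part of a valid character.

```python
# Bytes copied from the hexdump of bad_gb.txt (25 bytes, 0x19).
BAD_GB = bytes.fromhex(
    "31 2c ca c0 bd e7 0a 32 2c c4 e3 ba c3 c2 f0 a3"
    "0a 33 2c 6e 69 68 61 6f 0a"
)


def invalid_lines(data, encoding="gb18030"):
    """Return the 1-based numbers of lines that fail to decode on their own."""
    lines = data.split(b"\n")
    # Drop the empty chunk that follows the final newline.
    if lines and lines[-1] == b"":
        lines.pop()
    bad = []
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            bad.append(number)
    return bad


print(invalid_lines(BAD_GB))  # [2]
```

Only line 2 is reported: it ends with a lone lead byte `0xa3`, while lines 1 and 3 decode cleanly.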
   
   ### What you think should happen instead
   
   Only the second line is invalid. The first and third lines should be returned as rows according to the table definition.
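
The expectation can be stated as a small model of single-row error isolation. This is a sketch in Python, not Cloudberry's implementation: `load_with_reject_limit` and `RejectLimitReached` are hypothetical names, and the rule that a load aborts once the number of rejected rows reaches the limit is an assumption about how `SEGMENT REJECT LIMIT` counts rows. Under that assumption, one bad row with a limit of 2 should leave rows 1 and 3 readable.

```python
class RejectLimitReached(Exception):
    """Raised when the number of rejected rows reaches the limit."""


def load_with_reject_limit(data, limit, encoding="gb18030"):
    """Decode newline-terminated rows of the form "<int>,<text>" one by one.

    A row that fails to decode or parse is recorded and skipped, so it
    cannot affect its neighbours.
    Assumption: the load aborts once len(rejected) reaches `limit`.
    """
    rows, rejected = [], []
    for number, raw in enumerate(data.rstrip(b"\n").split(b"\n"), start=1):
        try:
            a, b = raw.decode(encoding).split(",", 1)
            rows.append((int(a), b))
        except ValueError as exc:  # UnicodeDecodeError is a ValueError
            rejected.append((number, str(exc)))
            if len(rejected) >= limit:
                raise RejectLimitReached(
                    f"{len(rejected)} rows rejected"
                ) from exc
    return rows, rejected


# The 25 bytes of bad_gb.txt from the hexdump.
data = bytes.fromhex("312ccac0bde70a322cc4e3bac3c2f0a30a332c6e6968616f0a")
rows, rejected = load_with_reject_limit(data, limit=2)
print(rows)                      # [(1, '世界'), (3, 'nihao')]
print([n for n, _ in rejected])  # [2]
```

In this model the invalid `0xa3` at the end of line 2 is charged to line 2 only, which is the behaviour the report asks for.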
   
   ### How to reproduce
   
   To reproduce, replace `<ip>:<port>` with the address of an HTTP server that serves `bad_gb.txt`:
   ```sql
   create  external web  table t3(a int, b text)
   LOCATION ('http://<ip>:<port>/bad_gb.txt')
   FORMAT 'TEXT' (DELIMITER ','  NULL '' )  ENCODING 'GB18030'
   LOG ERRORS SEGMENT REJECT LIMIT 2;
   select * from t3;
   
   
   -- or
   create temp table t0(a int, b text);
   -- copy the file bad_gb.txt to /tmp/www
   copy t0 from '/tmp/www/bad_gb.txt' with(encoding 'gb18030') log errors segment reject limit 2;
   ```
   
   output:
   ```
   gpadmin=# select * from t3;
   ERROR:  segment reject limit reached, aborting operation  (seg0 slice1 127.0.1.1:7002 pid=2316762)
   DETAIL:  Last error was: invalid byte sequence for encoding "GB18030": 0xa3 0x0a
   CONTEXT:  External table t3, line 3 of file http://.../bad_gb.txt
   
   -- or
   
   gpadmin=# copy t0 from '/tmp/www/bad_gb.txt' with(encoding 'gb18030') log errors segment reject limit 2;
   ERROR:  segment reject limit reached, aborting operation
   DETAIL:  Last error was: invalid byte sequence for encoding "GB18030": 0xa3 0x0a, column a
   CONTEXT:  COPY t0, line 2, column a: "1,世界"
   
   ```
   
   bad_gb.txt: encoding GB18030
   ```
   gpadmin@hashdata:/tmp/www$ hexdump -C bad_gb.txt
   00000000  31 2c ca c0 bd e7 0a 32  2c c4 e3 ba c3 c2 f0 a3  |1,.....2,.......|
   00000010  0a 33 2c 6e 69 68 61 6f  0a                       |.3,nihao.|
   00000019
   ```
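
For anyone without the original file, the 25 bytes in the dump can be written back to disk with a short script. This is only a convenience sketch: the function name `write_bad_gb` is made up, and the path in the usage note simply mirrors the `/tmp/www` directory seen in the shell prompt.

```python
from pathlib import Path

# Hex bytes copied from the hexdump of bad_gb.txt.
BAD_GB_HEX = (
    "31 2c ca c0 bd e7 0a 32 2c c4 e3 ba c3 c2 f0 a3"
    "0a 33 2c 6e 69 68 61 6f 0a"
)


def write_bad_gb(path):
    """Write the exact bytes from the hexdump and return the byte count."""
    target = Path(path)
    # Create the parent directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target.write_bytes(bytes.fromhex(BAD_GB_HEX))
```

Calling `write_bad_gb("/tmp/www/bad_gb.txt")` should return 25, matching the `00000019` offset at the end of the dump. Running `python3 -m http.server <port>` from `/tmp/www` is one way to serve the file for the external web table variant.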
   
   ### Operating System
   
   ubuntu 22.04
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes, I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/cloudberry/blob/main/CODE_OF_CONDUCT.md).
   

