[ https://issues.apache.org/jira/browse/HAWQ-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071381#comment-15071381 ]
Ruilong Huo commented on HAWQ-280:
----------------------------------
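Root cause analysis shows that the FIRST_1000_BAD rule takes precedence over
the REJECT_LIMIT_REACHED rule when rejected rows are checked during external
table access, so the operation aborts even though the configured reject limit
has not been reached. The resolution is to remove the FIRST_1000_BAD rule.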
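
To illustrate the interaction, here is a minimal Python sketch. It is not the
HAWQ source: the rule names come from the analysis above and the 1000-row
window comes from the error message in the report, while check_rejects,
simulate, the ">=" comparison against the limit, and the first_bad_rule switch
are hypothetical names and assumptions made only for this example.

```python
from enum import Enum


class RejectRule(Enum):
    NONE = "none"
    FIRST_1000_BAD = "first_1000_bad"
    REJECT_LIMIT_REACHED = "reject_limit_reached"


# Window size taken from the error text "All 1000 first rows ... were rejected".
FIRST_BAD_WINDOW = 1000


def check_rejects(processed, rejected, reject_limit, first_bad_rule=True):
    """Return the rule that aborts the load, or RejectRule.NONE (hypothetical helper)."""
    # Evaluated first, so it fires regardless of the configured reject limit.
    if first_bad_rule and processed == FIRST_BAD_WINDOW and rejected == FIRST_BAD_WINDOW:
        return RejectRule.FIRST_1000_BAD
    # Assumption: the limit counts as reached once rejected >= reject_limit.
    if rejected >= reject_limit:
        return RejectRule.REJECT_LIMIT_REACHED
    return RejectRule.NONE


def simulate(total_rows, reject_limit, first_bad_rule=True):
    """Feed total_rows rows that are all bad; return (rule, rows processed)."""
    rejected = 0
    for processed in range(1, total_rows + 1):
        rejected += 1  # every row in this scenario is rejected
        rule = check_rejects(processed, rejected, reject_limit, first_bad_rule)
        if rule is not RejectRule.NONE:
            return rule, processed
    return RejectRule.NONE, total_rows


if __name__ == "__main__":
    # Scenario from the report: 2000 bad rows, SEGMENT REJECT LIMIT 3000 ROWS.
    print(simulate(2000, 3000))                        # with the rule in place
    print(simulate(2000, 3000, first_bad_rule=False))  # with the rule removed
```

With the rule in place, the sketch aborts at row 1000 even though the limit of
3000 is never reached. With the rule removed, all 2000 rows are processed and
rejected without aborting, which is the behavior the resolution aims for.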
> Error accessing external table or copying from file with bad rows
> -----------------------------------------------------------------
>
> Key: HAWQ-280
> URL: https://issues.apache.org/jira/browse/HAWQ-280
> Project: Apache HAWQ
> Issue Type: Bug
> Components: External Tables
> Affects Versions: 2.0.0-beta-incubating
> Reporter: Ruilong Huo
> Assignee: Ruilong Huo
> Attachments: test.csv
>
>
> Accessing an external table, or copying from a file, that contains bad rows
> fails with an error instead of returning a result.
> 1. Error accessing external table with bad rows
> {noformat}
> Step 1: download the attached test.csv, which has 2000 rows that are all badly formatted
> Step 2: start gpfdist service
> gpfdist -d /home/gpadmin/data/ -p 8081 -l /home/gpadmin/log/load.log &
> ------------------------------------------------------------------------------------------------
> [1] 34635
> Serving HTTP on port 8081, directory /home/gpadmin/data
> Step 3: create external table
> CREATE EXTERNAL TABLE test_ext (id INT, a TEXT, b TEXT, c TEXT, z TEXT)
> LOCATION ('gpfdist://localhost:8081/test.csv')
> FORMAT 'CSV'
> LOG ERRORS INTO test_ext_err SEGMENT REJECT LIMIT 3000 ROWS;
> -----------------------------------------------------------------------------------------------------
> NOTICE: Error table "test_ext_err" does not exist. Auto generating an error
> table with the same name
> CREATE EXTERNAL TABLE
> Step 4: access external table
> SELECT COUNT(*) FROM test_ext;
> -------------------------------------------------
> ERROR: All 1000 first rows in this segment were rejected. Aborting operation
> regardless of REJECT LIMIT value. Last error was: missing data for column "z"
> (seg0 localhost:40000 pid=35647)
> DETAIL: External table test_ext, line 1000 of
> gpfdist://localhost:8081/test.csv: "29,aaa,bbb,zzz"
> {noformat}
> 2. Error copying from file with bad rows
> {noformat}
> Step 1: download the attached test.csv, which has 2000 rows that are all badly formatted
> Step 2: create table
> CREATE TABLE test_copy (id INT, a TEXT, b TEXT, c TEXT, z TEXT);
> ------------------------------------------------------------------------------------------------
> CREATE TABLE
> Step 3: copy data in file to table in database
> COPY test_copy FROM '/home/gpadmin/data/test.csv' LOG ERRORS INTO
> test_copy_err SEGMENT REJECT LIMIT 3000 ROWS;
> --------------------------------------------------------------------------------------------------------
> NOTICE: Error table "test_copy_err" does not exist. Auto generating an error
> table with the same name
> WARNING: The error table was created in the same transaction as this
> operation. It will get dropped if transaction rolls back even if bad rows are
> present
> HINT: To avoid this create the error table ahead of time using: CREATE TABLE
> <name> (cmdtime timestamp with time zone, relname text, filename text,
> linenum integer, bytenum integer, errmsg text, rawdata text, rawbytes bytea)
> ERROR: All 1000 first rows in this segment were rejected. Aborting operation
> regardless of REJECT LIMIT value. Last error was: missing data for column "a"
> CONTEXT: COPY test_copy, line 1000: "29,aaa,bbb,zzz"
> {noformat}