Re: Small fix on COPY ON_ERROR document

Yugo NAGATA Thu, 01 Feb 2024 22:59:38 -0800

On Fri, 02 Feb 2024 11:29:41 +0900
torikoshia <[email protected]> wrote:


> On 2024-02-01 15:16, Yugo NAGATA wrote:
> > On Mon, 29 Jan 2024 15:47:25 +0900
> > Yugo NAGATA <[email protected]> wrote:
> > 
> >> On Sun, 28 Jan 2024 19:14:58 -0700
> >> "David G. Johnston" <[email protected]> wrote:
> >> 
> >> > > Also, I think "invalid input syntax" is a bit ambiguous. For example,
> >> > > COPY FROM raises an error when the number of input column does not 
> >> > > match
> >> > > to the table schema, but this error is not ignored by ON_ERROR while
> >> > > this seems to fall into the category of "invalid input syntax".
> >> >
> >> >
> >> >
> >> > It is literally the error text that appears if one were not to ignore it.
> >> > It isn’t a category of errors.  But I’m open to ideas here.  But being
> >> > explicit with what on actually sees in the system seemed preferable to
> >> > inventing new classification terms not otherwise used.
> >> 
> >> Thank you for explanation! I understood the words was from the error 
> >> messages
> >> that users actually see. However, as Torikoshi-san said in [1], errors 
> >> other
> >> than valid input syntax (e.g. range error) can be also ignored, 
> >> therefore it
> >> would be better to describe to be ignored errors more specifically.
> >> 
> >> [1] 
> >> https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com
> >> 
> >> >
> >> > >
> >> > > So, keeping consistency with the existing description, we can say:
> >> > >
> >> > > "Specifies which how to behave when encountering an error due to
> >> > >  column values unacceptable to the input function of each attribute's
> >> > >  data type."
> >> >
> >> >
> >> > Yeah, I was considering something along those lines as an option as well.
> >> > But I’d rather add that wording to the glossary.
> >> 
> >> Although I am still be not convinced if we have to introduce the words
> >> "soft error" to the documentation, I don't care it if there are no 
> >> other
> >> opposite opinions.
> > 
> > Attached is a updated patch v3, which is a version that uses the above
> > wording instead of "soft error".
> > 
> >> >
> >> > > Currently, ON_ERROR doesn't support other soft errors, so it can 
> >> > > explain
> >> > > it more simply without introducing the new concept, "soft error" to 
> >> > > users.
> >> > >
> >> > >
> >> > Good point.  Seems we should define what user-facing errors are ignored
> >> > anywhere in the system and if we aren’t consistently leveraging these in
> >> > all areas/commands make the necessary qualifications in those specific
> >> > places.
> >> >
> >> 
> >> > > I think "left in a deleted state" is also unclear for users because 
> >> > > this
> >> > > explains the internal state but not how looks from user's view.How 
> >> > > about
> >> > > leaving the explanation "These rows will not be visible or accessible" 
> >> > > in
> >> > > the existing statement?
> >> > >
> >> >
> >> > Just visible then, I don’t like an “or” there and as tuples at least they
> >> > are accessible to the system, in vacuum especially.  But I expected the
> >> > user to understand “as if you deleted it” as their operational concept 
> >> > more
> >> > readily than visible.  I think this will be read by people who haven’t 
> >> > read
> >> > MVCC to fully understand what visible means but know enough to run vacuum
> >> > to clean up updated and deleted data as a rule.
> >> 
> >> Ok, I agree we can omit "or accessible". How do you like the 
> >> followings?
> >> Still redundant?
> >> 
> >>  "If the command fails, these rows are left in a deleted state;
> >>   these rows will not be visible, but they still occupy disk space. "
> > 
> > Also, the above statement is used in the patch.
> 
> Thanks for updating the patch!
> 
> I like your description which doesn't use the word soft error.

Thank you for your comments!

> 
> Here are minor comments:
> 
> > +      <literal>ignore</literal> means discard the input row and 
> > continue with the next one.
> > +      The default is <literal>stop</literal>
> 
> Is "." required at the end of the line?
>
> >      An <literal>NOTICE</literal> level context message containing the 
> > ignored row count is
> 
> Should 'An' be 'A'?
> 
> Also, I wasn't sure the necessity of 'context'.
> It might be possible to just say "A NOTICE message containing the 
> ignored row count.."
> considering below existing descriptions:
> 
>    doc/src/sgml/pltcl.sgml:     a <literal>NOTICE</literal> message each 
> time a supported command is
>    doc/src/sgml/pltcl.sgml-     executed:
> 
>    doc/src/sgml/plpgsql.sgml:     This example trigger simply raises a 
> <literal>NOTICE</literal> message
>    doc/src/sgml/plpgsql.sgml-     each time a supported command is 
> executed.

I attached a updated patch including fixes you pointed out above.

Regards,
Yugo Nagata

> -- 
> Regards,
> 
> --
> Atsushi Torikoshi
> NTT DATA Group Corporation


-- 
Yugo NAGATA <[email protected]>

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..3c2feaa11a 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -90,6 +90,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     in the <structname>pg_stat_progress_copy</structname> view. See
     <xref linkend="copy-progress-reporting"/> for details.
   </para>
+
+  <para>
+    By default, <command>COPY</command> will fail if it encounters an error
+    during processing. For use cases where a best-effort attempt at loading
+    the entire file is desired, the <literal>ON_ERROR</literal> clause can
+    be used to specify some other behavior.
+  </para>
  </refsect1>
 
  <refsect1>
@@ -378,17 +385,20 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     <term><literal>ON_ERROR</literal></term>
     <listitem>
      <para>
-      Specifies which <replaceable class="parameter">
-      error_action</replaceable> to perform when there is malformed data in the input.
-      Currently, only <literal>stop</literal> (default) and <literal>ignore</literal>
-      values are supported.
-      If the <literal>stop</literal> value is specified,
-      <command>COPY</command> stops operation at the first error.
-      If the <literal>ignore</literal> value is specified,
-      <command>COPY</command> skips malformed data and continues copying data.
-      The option is allowed only in <command>COPY FROM</command>.
-      Only <literal>stop</literal> value is allowed when
-      using <literal>binary</literal> format.
+      Specifies which how to behave when encountering an error due to column values
+      unacceptable to the input function of each attribute's data type.
+      An <replaceable class="parameter">error_action</replaceable> value of
+      <literal>stop</literal> means fail the command, while
+      <literal>ignore</literal> means discard the input row and continue with the next one.
+      The default is <literal>stop</literal>.
+     </para>
+     <para>
+      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
+     </para>
+     <para>
+      A <literal>NOTICE</literal> message containing the ignored row count is emitted at the end
+      of the <command>COPY FROM</command> if at least one row was discarded.
      </para>
     </listitem>
    </varlistentry>
@@ -576,15 +586,13 @@ COPY <replaceable class="parameter">count</replaceable>
    </para>
 
    <para>
-    <command>COPY</command> stops operation at the first error when
-    <literal>ON_ERROR</literal> is not specified. This
-    should not lead to problems in the event of a <command>COPY
-    TO</command>, but the target table will already have received
-    earlier rows in a <command>COPY FROM</command>. These rows will not
-    be visible or accessible, but they still occupy disk space. This might
-    amount to a considerable amount of wasted disk space if the failure
-    happened well into a large copy operation. You might wish to invoke
-    <command>VACUUM</command> to recover the wasted space.
+    The <command>COPY FROM</command> command physically inserts input rows
+    into the table as it progresses.  If the command fails, these rows are
+    left in a deleted state; these rows will not be visible, but still
+    occupying disk space. This might amount to a considerable amount of
+    wasted disk space if the failure happened well into a large copy
+    operation. <command>VACUUM</command> should be used to recover the
+    wasted space.
    </para>
 
    <para>

Re: Small fix on COPY ON_ERROR document

Reply via email to