Attached is a diff containing the original (working) patch's changes relative
to my (incomplete) patch, plus regression test changes and documentation changes.
While it's easy to regression-test the persistence of the fetch_size
options, I am stumped as to how we could show that the fetch_size
setting was actually respected. I've seen it with my own eyes in the query log
on Redshift, but I see no way to automate that. Suggestions welcome.
On Fri, Feb 6, 2015 at 3:11 AM, Kyotaro HORIGUCHI <[email protected]> wrote:
> Hello,
>
> > Redshift has a table, stv_inflight, which serves about the same purpose as
> > pg_stat_activity. Redshift seems to perform better with very high fetch
> > sizes (10,000 is a good start), so the default foreign data wrapper didn't
> > perform so well.
>
> I agree with you.
>
> > I was able to confirm that the first query showed "FETCH 150 FROM c1" as
> > the query, which is normally highly unhelpful, but in this case it confirms
> > that tables created in redshift_server do by default use the fetch_size
> > option given during server creation.
> >
> > I was also able to confirm that the second query showed "FETCH 151 FROM c1"
> > as the query, which shows that per-table overrides also work.
> >
> > I'd be happy to stop here, but Kyotaro might feel differently.
>
> This is enough in its own way, of course.
>
> > With this
> > limited patch, one could estimate the number of rows that would fit into
> > the desired memory block based on the row width and set fetch_size
> > accordingly.
>
> Users, including me, will be happy with this as long as they know how
> to determine the fetch size, especially when the remote tables are few
> or the configuration is stable enough.
>
> On widely distributed systems it would be far more difficult to tune the
> fetch size manually for every foreign table, so in the end it would be
> left at the safe default, which is 100 or so.
>
> This is similar to the discussion about max_wal_size on another
> thread. Calculating a fetch size is far tougher for users than
> setting an expected memory usage, I think.
>
> > But we could go further, and have a fetch_max_memory option only at the
> > table level, and the fdw could do that same memory / estimated_row_size
> > calculation and derive fetch_size that way *at table creation time*, not
> > query time.
>
> We cannot know the real length of text-type data in advance. Beyond
> that, even for a column defined as char(n), n is only the theoretical
> (by-design) maximum size of the field; in most cases the mean length
> of the real data will be far smaller than that. For that reason,
> calculating the ratio at table creation time seems difficult.
>
> However, I agree with Tom's suggestion that the changes to the FETCH
> statement are definitely ugly, especially since the "overhead"
> argument is prohibitive even for me :(
>
> > Thanks to Kyotaro and Bruce Momjian for their help.
>
> Not at all.
>
> regards,
>
>
> At Wed, 4 Feb 2015 18:06:02 -0500, Corey Huinker <[email protected]>
> wrote in <CADkLM=[email protected]>
> > I applied this patch to REL9_4_STABLE, and I was able to connect to a
> > foreign database (redshift, actually).
> >
> > the basic outline of the test is below, names changed to protect my
> > employment.
> >
> > create extension if not exists postgres_fdw;
> >
> > create server redshift_server foreign data wrapper postgres_fdw
> > options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
> > fetch_size '150' );
> >
> > create user mapping for public server redshift_server options ( user
> > 'redacted_user', password 'comeonyouarekiddingright' );
> >
> > create foreign table redshift_tab150 ( <colspecs> )
> > server redshift_server options (table_name 'redacted_table', schema_name
> > 'redacted_schema' );
> >
> > create foreign table redshift_tab151 ( <colspecs> )
> > server redshift_server options (table_name 'redacted_table', schema_name
> > 'redacted_schema', fetch_size '151' );
> >
> > -- i don't expect the fdw to push the aggregate, this is just a test to see
> > what query shows up in stv_inflight.
> > select count(*) from redshift_ccp150;
> >
> > -- i don't expect the fdw to push the aggregate, this is just a test to see
> > what query shows up in stv_inflight.
> > select count(*) from redshift_ccp151;
> >
> >
> > For those of you that aren't familiar with Redshift, it's a columnar
> > database that seems to be a fork of postgres 8.cough. You can connect to it
> > with modern libpq programs (psql, psycopg2, etc).
> > Redshift has a table, stv_inflight, which serves about the same purpose as
> > pg_stat_activity. Redshift seems to perform better with very high fetch
> > sizes (10,000 is a good start), so the default foreign data wrapper didn't
> > perform so well.
> >
> > I was able to confirm that the first query showed "FETCH 150 FROM c1" as
> > the query, which is normally highly unhelpful, but in this case it confirms
> > that tables created in redshift_server do by default use the fetch_size
> > option given during server creation.
> >
> > I was also able to confirm that the second query showed "FETCH 151 FROM c1"
> > as the query, which shows that per-table overrides also work.
> >
> > I'd be happy to stop here, but Kyotaro might feel differently. With this
> > limited patch, one could estimate the number of rows that would fit into
> > the desired memory block based on the row width and set fetch_size
> > accordingly.
> >
> > But we could go further, and have a fetch_max_memory option only at the
> > table level, and the fdw could do that same memory / estimated_row_size
> > calculation and derive fetch_size that way *at table creation time*, not
> > query time.
> >
> > Thanks to Kyotaro and Bruce Momjian for their help.
> >
> >
> >
> >
> >
> >
> > On Mon, Feb 2, 2015 at 2:27 AM, Kyotaro HORIGUCHI <[email protected]> wrote:
> >
> > > Hmm, somehow I removed some recipients, especially the
> > > list. Sorry for the duplicate.
> > >
> > > -----
> > > Sorry, I've been back. Thank you for the comment.
> > >
> > > > Do you have any insight into where I would pass the custom row fetches
> > > > from the table struct to the scan struct?
> > >
> > > Yeah, it's one simple way to tune it, if the user knows the
> > > appropriate value.
> > >
> > > > Last year I was working on a patch to postgres_fdw where the fetch_size
> > > > could be set at the table level and the server level.
> > > >
> > > > I was able to get the settings parsed and they would show up in
> > > > pg_foreign_table and pg_foreign_servers. Unfortunately, I'm not very
> > > > familiar with how foreign data wrappers work, so I wasn't able to
> > > > figure out how to get these custom values passed from the
> > > > PgFdwRelationInfo struct into the query's PgFdwScanState struct.
> > >
> > > Directly answering: the state that needs to be shared among several
> > > stages is held within fdw_private. Your new variable
> > > fpinfo->fetch_size can be read in postgresGetForeignPlan, which
> > > newly creates another fdw_private. You can pass your values to the
> > > ForeignPlan by making it hold the value there. Finally, you will get
> > > it in postgresBeginForeignScan and can set it into
> > > PgFdwScanState.
> > >
> > > > I bring this up only because it might be a simpler solution, in that the
> > > > table designer could set the fetch size very high for narrow tables, and
> > > > lower or default for wider tables. It's also a very clean syntax, just
> > > > another option on the table and/or server creation.
> > > >
> > > > My incomplete patch is attached.
> > >
> > > However, the fetch_size is not needed by the planner (so far), so we
> > > can simply read the options in postgresBeginForeignScan() and set them
> > > into PgFdwScanState. This runs once per execution.
> > >
> > > Finally, the attached patch will work as you intended.
> > >
> > > What do you think about this?
> > >
> > > regards,
> > >
> > > --
> > > Kyotaro Horiguchi
> > > NTT Open Source Software Center
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 583cce7..84334e6 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -106,7 +106,8 @@ ALTER SERVER testserver1 OPTIONS (
sslcert 'value',
sslkey 'value',
sslrootcert 'value',
- sslcrl 'value'
+ sslcrl 'value',
+ fetch_size '101'
--requirepeer 'value',
-- krbsrvname 'value',
-- gsslib 'value',
@@ -114,18 +115,34 @@ ALTER SERVER testserver1 OPTIONS (
);
ALTER USER MAPPING FOR public SERVER testserver1
OPTIONS (DROP user, DROP password);
-ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1');
+ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1', fetch_size '102');
ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
\det+
-                        List of foreign tables
- Schema | Table |  Server  |              FDW Options              | Description 
---------+-------+----------+---------------------------------------+-------------
- public | ft1   | loopback | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback | (schema_name 'S 1', table_name 'T 1') | 
+                                 List of foreign tables
+ Schema | Table |  Server  |                       FDW Options                        | Description 
+--------+-------+----------+----------------------------------------------------------+-------------
+ public | ft1   | loopback | (schema_name 'S 1', table_name 'T 1', fetch_size '102')  | 
+ public | ft2   | loopback | (schema_name 'S 1', table_name 'T 1')                    | 
 (2 rows)
+-- Test what options made it into pg_foreign_server.
+-- Filter for just the server we created.
+SELECT srvoptions FROM pg_foreign_server WHERE srvname = 'testserver1';
+                                                                             srvoptions
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ {use_remote_estimate=false,updatable=true,fdw_startup_cost=123.456,fdw_tuple_cost=0.123,service=value,connect_timeout=value,dbname=value,host=value,hostaddr=value,port=value,application_name=value,keepalives=value,keepalives_idle=value,keepalives_interval=value,sslcompression=value,sslmode=value,sslcert=value,sslkey=value,sslrootcert=value,sslcrl=value,fetch_size=101}
+(1 row)
+
+-- Test what options made it into pg_foreign_table.
+-- Filter this heavily because we cannot specify which foreign server.
+SELECT ftoptions FROM pg_foreign_table WHERE ftoptions @> array['table_name=T 1','fetch_size=102'];
+ ftoptions
+-----------------------------------------------------
+ {"schema_name=S 1","table_name=T 1",fetch_size=102}
+(1 row)
+
-- Now we should be able to run ANALYZE.
-- To exercise multiple code paths, we use local stats on ft1
-- and remote-estimate mode on ft2.
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 7547ec2..2a3ab7d 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -153,6 +153,12 @@ InitPgFdwOptions(void)
/* updatable is available on both server and table */
{"updatable", ForeignServerRelationId, false},
{"updatable", ForeignTableRelationId, false},
+ /*
+  * fetch_size is available on both server and table; the table setting
+  * overrides the server setting.
+  */
+ {"fetch_size", ForeignServerRelationId, false},
+ {"fetch_size", ForeignTableRelationId, false},
{NULL, InvalidOid, false}
};
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 63f0577..733ffb6 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -134,6 +134,7 @@ typedef struct PgFdwScanState
/* extracted fdw_private data */
char *query; /* text of SELECT command */
 List *retrieved_attrs; /* list of retrieved attribute numbers */
+ int fetch_size; /* number of tuples per fetch */
/* for remote query execution */
PGconn *conn; /* connection for the scan */
@@ -871,6 +872,22 @@ postgresGetForeignPlan(PlannerInfo *root,
fdw_private);
}
+static DefElem*
+get_option(List *options, char *optname)
+{
+ ListCell *lc;
+
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, optname) == 0)
+ return def;
+ }
+ return NULL;
+}
+
+
/*
* postgresBeginForeignScan
* Initiate an executor scan of a foreign PostgreSQL table.
@@ -889,6 +906,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
int numParams;
int i;
ListCell *lc;
+ DefElem *def;
/*
* Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state stays NULL.
@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
server = GetForeignServer(table->serverid);
user = GetUserMapping(userid, server->serverid);
+ /* Reading table options */
+ fsstate->fetch_size = -1;
+
+ def = get_option(table->options, "fetch_size");
+ if (!def)
+ def = get_option(server->options, "fetch_size");
+
+ if (def)
+ {
+ fsstate->fetch_size = strtod(defGetString(def), NULL);
+ if (fsstate->fetch_size < 0)
+ elog(ERROR, "invalid fetch size for foreign table \"%s\"",
+ get_rel_name(table->relid));
+ }
+ else
+ fsstate->fetch_size = 100;
+
/*
* Get connection to the foreign server. Connection manager will
* establish new connection if necessary.
@@ -2031,7 +2066,7 @@ fetch_more_data(ForeignScanState *node)
int i;
/* The fetch size is arbitrary, but shouldn't be enormous. */
- fetch_size = 100;
+ fetch_size = fsstate->fetch_size;
snprintf(sql, sizeof(sql), "FETCH %d FROM c%u",
fetch_size, fsstate->cursor_number);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83e8fa7..d12925a 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -115,7 +115,8 @@ ALTER SERVER testserver1 OPTIONS (
sslcert 'value',
sslkey 'value',
sslrootcert 'value',
- sslcrl 'value'
+ sslcrl 'value',
+ fetch_size '101'
--requirepeer 'value',
-- krbsrvname 'value',
-- gsslib 'value',
@@ -123,12 +124,20 @@ ALTER SERVER testserver1 OPTIONS (
);
ALTER USER MAPPING FOR public SERVER testserver1
OPTIONS (DROP user, DROP password);
-ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1');
+ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1', fetch_size '102');
ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
\det+
+-- Test what options made it into pg_foreign_server.
+-- Filter for just the server we created.
+SELECT srvoptions FROM pg_foreign_server WHERE srvname = 'testserver1';
+
+-- Test what options made it into pg_foreign_table.
+-- Filter this heavily because we cannot specify which foreign server.
+SELECT ftoptions FROM pg_foreign_table WHERE ftoptions @> array['table_name=T 1','fetch_size=102'];
+
-- Now we should be able to run ANALYZE.
-- To exercise multiple code paths, we use local stats on ft1
-- and remote-estimate mode on ft2.
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 43adb61..02b004d 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -294,6 +294,34 @@
</sect3>
<sect3>
+ <title>Fetch Size Options</title>
+
+ <para>
+ By default, rows are fetched from the remote server 100 at a time.
+ This may be overridden using the following option:
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>fetch_size</literal></term>
+ <listitem>
+ <para>
+ This option specifies the number of rows <filename>postgres_fdw</>
+ should get in each fetch operation. It can be specified for a foreign
+ table or a foreign server. The option specified on a table overrides
+ an option specified for the server.
+ The default is <literal>100</>.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </sect3>
+
+
+ <sect3>
<title>Importing Options</title>
<para>
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers