Hi
2016-01-22 7:03 GMT+01:00 Vitaly Burovoy <[email protected]>:
> On 1/20/16, Pavel Stehule <[email protected]> wrote:
> > ...
> > New version is attached
> >
> > Regards
> > Pavel
>
> Hello!
>
> 1. Why the function is marked as VOLATILE? It depends only on input
> value, it does change nothing in the DB.
> I guess it should be IMMUTABLE for efficient caching its result.
>
sure, it should be immutable,
fixed
copy/paste bug
?? why pg_size_pretty is volatile??
>
> 2.
> > text *arg = PG_GETARG_TEXT_PP(0);
> ...
> > str = text_to_cstring(arg);
> >
> Hmm... My question was _why_ you get TEXT and convert it instead of
> getting an argument of type cstring (not about usage of
> VARSIZE_ANY_EXHDR).
> It was enough to give me a link[1] and leave the usage of
> VARSIZE_ANY_EXHDR as is.
>
>
I understand to this question now. This is PostgreSQL pattern - cstring is
internal type only - used for in/out functions only
you can try
select * from pg_proc where 'cstring'::regtype::oid =
any(proargtypes::oid[]);
With cstring you are bypass PostgreSQL type system (cast from cstring and
to cstring is allowed for anytype) - so it should not be used in usual
functions.
>
>
> I think all the other claims below are mostly cosmetic. Main behavior
> is OK (considering its usage will not be in heavy queries).
>
> ===
> Documentation:
> Besides currently added row in a table I think it is worth to add
> detailed comment after a block "<function>pg_size_pretty</> can be
> used" similar to (but the best way is to get advice for a correct
> phrase):
> "pg_size_bytes can be used to get a size in bytes as bigint from a
> human-readable format for a comparison purposes (it is the opposite to
> pg_size_pretty function). The format is numeric with an optional size
> unit: kB, MB, GB or TB."
> (and delete unit sizes from the table row).
>
fixed partially
>
> ===
> Tests:
> Since there was a letter with an explanation why units bigger than
> "TB" can't be used[2], it is a good reason to add that size units
> ("PB", "EB", "ZB") in tests with a note to update the documentation
> part when that unit are supported.
>
fixed
>
> ===
> Code style:
> 1. When you declare pointers, you must align _names_ by the left
> border, i.e. asterisks must be at one position to the left from the
> aligned names, i.e. one to three spaces instead of TAB before them.
>
fixed
>
> 2.
> > int multiplier;
> One more TAB to align it with the name at the next line.
>
fixed
>
> ===
> Code:
>
> > /* allow whitespace between integer and unit */
> May be "numeric" instead of "integer"?
>
used "number"
>
> > "\"%s\" is not in expected format"
> It is not clear what format is expected.
>
I changed to string "invalid size: \"%s\"", but the final form of these
messages should be written by native speaker.
>
> > /* ignore plus symbol */
> It seems it is not ignored, but copied... ;-)
>
removed
>
> > to ger hintstr
> s/ger/get/
> > Elsewhere is used space trimmed buffer.
>
> I'd write it as "Otherwise trimmed value is used."
> I'm not good at English, but that full block looks little odd for me.
> I tried to reword it in the attachment single_buffer.c, but I don't
> think it is enough.
>
> > We allow ending spaces.
> "trailing"?
>
>
fixed
> > but this message can be little bit no intuitive - better text is "is not
> a valid number"
> It was a block with a comment "don't allow empty string", i.e. absence
> of a number...
>
> > Automatic memory deallocation doesn't cover all possible situations where
> > the function can be used - for example DirectFunctionCall - so explicit
> > deallocation can descrease a memory requirements when you call these
> > functions from C.
> DirectFunctionCall uses a memory context of the caller. For example,
> you don't free "num" allocated by numeric_in and numeric_mul, also
> mul_num allocated by int8_numeric...
> I'd ask experienced hackers for importance of freeing (or leaving)
> "str" and "buffer".
>
> > postgres=# select pg_size_bytes('+912+ kB');
> > ERROR: invalid unit: "+ kB"
> > This is difficult - you have to divide string to two parts and first
> world is number, second world is unit.
> Your code looks like a duplicated (not by lines, but by behavior).
> numeric_in has similar checks, let it do them for you. The goal of
> your function is just split mantissa and size unit, and if the last
> one is found, turn it into an exponent.
> See my example in the attachment check_mantissa_by_numeric_in.c. There
> is just searching non-space char that can't be part of numeric. Then
> all before it passes to numeric_in and let it check anything it
> should. I even left first spaces to show full numeric part to a user
> if an error occurs (for any case).
> The second attachment single_buffer.c is an attempt to avoid
> allocating the second buffer (the first is "str" allocated by
> text_to_cstring) and copying into it. I don't think it gives a big
> improvement, but nevertheless.
>
I don't think, so it is improvement - now the code is relative easy, and I
need a original string, because it is part of error message.
sending updated patch
Regards
Pavel
>
> ===
> [1] http://www.postgresql.org/message-id/[email protected]
> [2]
> http://www.postgresql.org/message-id/CAB7nPqS6Wob4WnZb=dhb3o0pc-nx1v3xjszkn3z9kbexgcq...@mail.gmail.com
>
> --
> Best regards,
> Vitaly Burovoy
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 9c143b2..6ccf249
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*************** postgres=# SELECT * FROM pg_xlogfile_nam
*** 17724,17729 ****
--- 17724,17732 ----
<primary>pg_relation_size</primary>
</indexterm>
<indexterm>
+ <primary>pg_size_bytes</primary>
+ </indexterm>
+ <indexterm>
<primary>pg_size_pretty</primary>
</indexterm>
<indexterm>
*************** postgres=# SELECT * FROM pg_xlogfile_nam
*** 17795,17800 ****
--- 17798,17814 ----
</row>
<row>
<entry>
+ <literal><function>pg_size_bytes(<type>text</type>)</function></literal>
+ </entry>
+ <entry><type>bigint</type></entry>
+ <entry>
+ Converts a size in human-readable format with size units into bytes.
+ The parameter is case-insensitive, and the supported size units
+ are: kB, MB, GB and TB.
+ </entry>
+ </row>
+ <row>
+ <entry>
<literal><function>pg_size_pretty(<type>bigint</type>)</function></literal>
</entry>
<entry><type>text</type></entry>
*************** postgres=# SELECT * FROM pg_xlogfile_nam
*** 17930,17935 ****
--- 17944,17956 ----
</para>
<para>
+ <function>pg_size_bytes</> can be used to get a size in bytes as
+ bigint from a human-readable format for a comparison purposes (it is
+ the opposite to <function>pg_size_pretty</> function). The format is
+ numeric with an optional size unit: kB, MB, GB or TB.
+ </para>
+
+ <para>
The functions above that operate on tables or indexes accept a
<type>regclass</> argument, which is simply the OID of the table or index
in the <structname>pg_class</> system catalog. You do not have to look up
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
new file mode 100644
index 2084692..e375a8a
*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
***************
*** 25,30 ****
--- 25,31 ----
#include "storage/fd.h"
#include "utils/acl.h"
#include "utils/builtins.h"
+ #include "utils/guc.h"
#include "utils/numeric.h"
#include "utils/rel.h"
#include "utils/relfilenodemap.h"
*************** pg_size_pretty_numeric(PG_FUNCTION_ARGS)
*** 700,705 ****
--- 701,846 ----
}
/*
+ * Convert human readable size to long int.
+ *
+ * Due to support for decimal values and case insensitivity of units
+ * a function parse_int cannot be used.
+ */
+ Datum
+ pg_size_bytes(PG_FUNCTION_ARGS)
+ {
+ text *arg = PG_GETARG_TEXT_PP(0);
+ char *str,
+ *strptr;
+ char *buffer,
+ *bufptr;
+ Numeric num;
+ int64 result;
+ bool found_digit = false;
+ bool found_point = false;
+
+ str = text_to_cstring(arg);
+ strptr = str;
+
+ /* prepare buffer for parts of parsed input string */
+ buffer = (char *) palloc(strlen(str) + 1);
+ bufptr = buffer;
+
+ /* Skip leading spaces */
+ while (isspace((unsigned char) *strptr))
+ strptr++;
+
+ switch (*strptr)
+ {
+ case '+':
+ case '-':
+ *bufptr++ = *strptr++;
+ break;
+ }
+
+ while(*strptr != '\0')
+ {
+ if (isdigit((unsigned char) *strptr))
+ {
+ *bufptr++ = *strptr++;
+ found_digit = true;
+ }
+ else if (*strptr == '.')
+ {
+ if (found_point)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("inavalid size: \"%s\"", str)));
+
+ *bufptr++ = *strptr++;
+ found_point = true;
+ }
+ else
+ /* any non digit char, the unit is starting */
+ break;
+ }
+
+ if (!found_digit)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid size: \"%s\"", str)));
+
+ *bufptr = '\0';
+
+ /* don't allow empty string */
+ if (*buffer == '\0')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid size: \"%s\"", str)));
+
+ num = DatumGetNumeric(DirectFunctionCall3(numeric_in,
+ CStringGetDatum(buffer), 0, -1));
+
+ /* allow whitespace between number and unit */
+ while (isspace((unsigned char) *strptr))
+ strptr++;
+
+ /* Handle possible unit */
+ if (*strptr != '\0')
+ {
+ int multiplier;
+ Numeric mul_num;
+ const char *hintmsg;
+ const char *unitstr;
+
+ bufptr = buffer;
+
+ /*
+ * unitstr holds complete string related to unit part. Used
+ * as part of error message. The parse_memory_unit is called
+ * with this value, when we detected invalid format, and we
+ * would to emit error result to get hintstr. Otherwise
+ * trimmed value is used.
+ */
+ unitstr = strptr;
+
+ /* copy chars to buffer and stop on first space or end string */
+ while (*strptr != '\0' && !isspace((unsigned char) *strptr))
+ *bufptr++ = *strptr++;
+
+ *bufptr = '\0';
+
+ /* We allow trailing spaces. Skip all spaces. */
+ while (isspace((unsigned char) *strptr))
+ strptr++;
+
+ /* Use buffer as unitstr, when we didn't find more words. */
+ if (*strptr == '\0')
+ unitstr = buffer;
+
+ /* Still, the unit can be invalid: too long, unknown */
+ if (!parse_memory_unit(unitstr, &multiplier, &hintmsg))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid unit: \"%s\"", unitstr),
+ errhint("%s", _(hintmsg))));
+
+ /*
+ * Now, the multiplier is in KB units. It should be multiplied by 1024
+ * before usage.
+ */
+ mul_num = DatumGetNumeric(DirectFunctionCall1(int8_numeric,
+ Int64GetDatum(multiplier * 1024L)));
+
+ num = DatumGetNumeric(DirectFunctionCall2(numeric_mul,
+ NumericGetDatum(mul_num),
+ NumericGetDatum(num)));
+ }
+
+ result = DatumGetInt64(DirectFunctionCall1(numeric_int8, NumericGetDatum(num)));
+
+ pfree(buffer);
+ pfree(str);
+
+ PG_RETURN_INT64(result);
+ }
+
+ /*
* Get the filenode of a relation
*
* This is expected to be used in queries like
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
new file mode 100644
index 38ba82f..4020df7
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*************** convert_from_base_unit(int64 base_value,
*** 5238,5243 ****
--- 5238,5272 ----
/*
+ * Parse value as some known memory unit to their size in bytes.
+ * Used in pg_size_bytes function. Against convert_to_base_unit, a string
+ * comparation is case insensitive.
+ */
+ bool
+ parse_memory_unit(const char *unit, int *multiplier,
+ const char **hintmsg)
+ {
+ int i;
+
+ for (i = 0; *memory_unit_conversion_table[i].unit; i++)
+ {
+ const unit_conversion *conv = &memory_unit_conversion_table[i];
+
+ if (conv->base_unit == GUC_UNIT_KB &&
+ strcasecmp(unit, conv->unit) == 0)
+ {
+ *multiplier = conv->multiplier;
+ return true;
+ }
+ }
+
+ *hintmsg = memory_units_hint;
+
+ return false;
+ }
+
+
+ /*
* Try to parse value as an integer. The accepted formats are the
* usual decimal, octal, or hexadecimal formats, optionally followed by
* a unit name if "flags" indicates a unit is allowed.
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index 79e92ff..5921bac
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 2288 ( pg_size_pretty
*** 3587,3592 ****
--- 3587,3594 ----
DESCR("convert a long int to a human readable text using size units");
DATA(insert OID = 3166 ( pg_size_pretty PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 25 "1700" _null_ _null_ _null_ _null_ _null_ pg_size_pretty_numeric _null_ _null_ _null_ ));
DESCR("convert a numeric to a human readable text using size units");
+ DATA(insert OID = 3331 ( pg_size_bytes PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 20 "25" _null_ _null_ _null_ _null_ _null_ pg_size_bytes _null_ _null_ _null_ ));
+ DESCR("convert a size in human-readable format with size units into bytes");
DATA(insert OID = 2997 ( pg_table_size PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 20 "2205" _null_ _null_ _null_ _null_ _null_ pg_table_size _null_ _null_ _null_ ));
DESCR("disk space usage for the specified table, including TOAST, free space and visibility map");
DATA(insert OID = 2998 ( pg_indexes_size PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 20 "2205" _null_ _null_ _null_ _null_ _null_ pg_indexes_size _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index c2e529f..bb21615
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_relation_size(PG_FUNCTIO
*** 470,475 ****
--- 470,476 ----
extern Datum pg_total_relation_size(PG_FUNCTION_ARGS);
extern Datum pg_size_pretty(PG_FUNCTION_ARGS);
extern Datum pg_size_pretty_numeric(PG_FUNCTION_ARGS);
+ extern Datum pg_size_bytes(PG_FUNCTION_ARGS);
extern Datum pg_table_size(PG_FUNCTION_ARGS);
extern Datum pg_indexes_size(PG_FUNCTION_ARGS);
extern Datum pg_relation_filenode(PG_FUNCTION_ARGS);
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
new file mode 100644
index e1de1a5..3bfe0f4
*** a/src/include/utils/guc.h
--- b/src/include/utils/guc.h
*************** extern int NewGUCNestLevel(void);
*** 357,362 ****
--- 357,364 ----
extern void AtEOXact_GUC(bool isCommit, int nestLevel);
extern void BeginReportingGUCOptions(void);
extern void ParseLongOption(const char *string, char **name, char **value);
+ extern bool parse_memory_unit(const char *unit, int *multiplier,
+ const char **hintmsg);
extern bool parse_int(const char *value, int *result, int flags,
const char **hintmsg);
extern bool parse_real(const char *value, double *result);
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
new file mode 100644
index aa513e7..fe8ad30
*** a/src/test/regress/expected/dbsize.out
--- b/src/test/regress/expected/dbsize.out
***************
*** 1,3 ****
--- 1,6 ----
+ -- These functions shares memory_unit_conversion_table.
+ -- Currently only max TB unit is supported. When you increase
+ -- supported unit, update related documentation too.
SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
(VALUES (10::bigint), (1000::bigint), (1000000::bigint),
(1000000000::bigint), (1000000000000::bigint),
*************** SELECT size, pg_size_pretty(size), pg_si
*** 35,37 ****
--- 38,101 ----
1000000000000000.5 | 909 TB | -909 TB
(12 rows)
+ SELECT pg_size_bytes(size) FROM
+ (VALUES('1'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
+ ('1TB'), ('3000 TB')) x(size);
+ pg_size_bytes
+ ------------------
+ 1
+ 1024
+ 1048576
+ 1073741824
+ 1610612736
+ 1099511627776
+ 3298534883328000
+ (7 rows)
+
+ -- case insensitive units are supported
+ SELECT pg_size_bytes(size) FROM
+ (VALUES('1'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
+ ('1tb')) x(size);
+ pg_size_bytes
+ ---------------
+ 1
+ 1024
+ 1048576
+ 1073741824
+ 1610612736
+ 1099511627776
+ (6 rows)
+
+ -- negative numbers are supported
+ SELECT pg_size_bytes(size) FROM
+ (VALUES('-1'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
+ ('-1tb')) x(size);
+ pg_size_bytes
+ ----------------
+ -1
+ -1024
+ -1048576
+ -1073741824
+ -1610612736
+ -1099511627776
+ (6 rows)
+
+ --should fail
+ SELECT pg_size_bytes('1 AB');
+ ERROR: invalid unit: "AB"
+ HINT: Valid units for this parameter are "kB", "MB", "GB", and "TB".
+ SELECT pg_size_bytes('1 AB A');
+ ERROR: invalid unit: "AB A"
+ HINT: Valid units for this parameter are "kB", "MB", "GB", and "TB".
+ select pg_size_bytes('9223372036854775807.9');
+ ERROR: bigint out of range
+ select pg_size_bytes('1024 bytes');
+ ERROR: invalid unit: "bytes"
+ HINT: Valid units for this parameter are "kB", "MB", "GB", and "TB".
+ select pg_size_bytes('.+912');
+ ERROR: invalid size: ".+912"
+ select pg_size_bytes('+912+ kB');
+ ERROR: invalid unit: "+ kB"
+ HINT: Valid units for this parameter are "kB", "MB", "GB", and "TB".
+ select pg_size_bytes('++123 kB');
+ ERROR: invalid size: "++123 kB"
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
new file mode 100644
index c118090..91bd2e2
*** a/src/test/regress/sql/dbsize.sql
--- b/src/test/regress/sql/dbsize.sql
***************
*** 1,3 ****
--- 1,7 ----
+ -- These functions shares memory_unit_conversion_table.
+ -- Currently only max TB unit is supported. When you increase
+ -- supported unit, update related documentation too.
+
SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
(VALUES (10::bigint), (1000::bigint), (1000000::bigint),
(1000000000::bigint), (1000000000000::bigint),
*************** SELECT size, pg_size_pretty(size), pg_si
*** 10,12 ****
--- 14,40 ----
(10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
(1000000000.5::numeric), (1000000000000.5::numeric),
(1000000000000000.5::numeric)) x(size);
+
+ SELECT pg_size_bytes(size) FROM
+ (VALUES('1'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
+ ('1TB'), ('3000 TB')) x(size);
+
+ -- case insensitive units are supported
+ SELECT pg_size_bytes(size) FROM
+ (VALUES('1'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
+ ('1tb')) x(size);
+
+ -- negative numbers are supported
+ SELECT pg_size_bytes(size) FROM
+ (VALUES('-1'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
+ ('-1tb')) x(size);
+
+ --should fail
+ SELECT pg_size_bytes('1 AB');
+ SELECT pg_size_bytes('1 AB A');
+ select pg_size_bytes('9223372036854775807.9');
+ select pg_size_bytes('1024 bytes');
+
+ select pg_size_bytes('.+912');
+ select pg_size_bytes('+912+ kB');
+ select pg_size_bytes('++123 kB');
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers