Re: [HACKERS] custom function for converting human readable sizes to bytes

Pavel Stehule Wed, 20 Jan 2016 10:45:50 -0800

2016-01-19 4:45 GMT+01:00 Vitaly Burovoy <[email protected]>:

> On 1/4/16, Pavel Stehule <[email protected]> wrote:
> > 2016-01-04 18:13 GMT+01:00 Shulgin, Oleksandr <
> [email protected]> :
> >> On Mon, Jan 4, 2016 at 6:03 PM, Pavel Stehule <[email protected]>
> wrote:
> >> > 2016-01-04 17:48 GMT+01:00 Shulgin, Oleksandr <
> [email protected]>:
> >> >> On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <[email protected]>
> wrote:
> >> >>
> >> >> I'm also inclined on dropping that explicit check for empty string
> >> >> below and let numeric_in() error out on that.  Does this look OK, or
> can
> >> >> it confuse someone:
> >> >> postgres=# select pg_size_bytes('');
> >> >> ERROR:  invalid input syntax for type numeric: ""
> >> >
> >> > both fixed
> >>
> >> Hm...
> >>
> >> > + switch (*strptr)
> >> > + {
> >> > + /* ignore plus symbol */
> >> > + case '+':
> >> > + case '-':
> >> > + *bufptr++ = *strptr++;
> >> > + break;
> >> > + }
> >>
> >> Well, to that amount you don't need any special checks, I'm just not
> sure
> >> if reported error message is not misleading if we let numeric_in()
> handle
> >> all the errors.  At least it can cope with the leading spaces, +/- and
> >> empty input quite well.
> >>
> >
> > I don't would to catch a exception from numeric_in - so I try to solve
> some
> > simple situations, where I can raise possible better error message.
>
> There are several cases where your behavior gives strange errors (see
> below).
>
> Next batch of notes:
>
> src/include/catalog/pg_proc.h:
> ---
> + DATA(insert OID = 3317 ( pg_size_bytes...
> now oid 3317 is used (by pg_stat_get_wal_receiver), 3318 is free
>


fixed


>
> ---
> + DESCR("convert a human readable text with size units to big int bytes");
> May be the best way is to copy the first sentence from the doc?
> ("convert a size in human-readable format with size units into bytes")
>

fixed


> ====
> src/backend/utils/adt/dbsize.c:
> + text             *arg = PG_GETARG_TEXT_PP(0);
> + char             *str = text_to_cstring(arg);
> ...
> +       /* working buffer cannot be longer than original string */
> +       buffer = (char *) palloc(VARSIZE_ANY_EXHDR(arg) + 1);
> Is there any reason to get TEXT for only converting it to cstring and
> call VARSIZE_ANY_EXHDR instead of strlen?
>

performance - but these strings should be short, so I can use strlen

fixed


>
> ---
> +       text               *arg = PG_GETARG_TEXT_PP(0);
> +       char               *str = text_to_cstring(arg);
> +       char    *strptr = str;
> +       char               *buffer;
> There are wrong offsets of variable names after their types (among all
> body of the "pg_size_bytes" function).
> See variable declarations in nearby functions (
>   >> "make the new code look like the existing code around it"
>   http://www.postgresql.org/docs/devel/static/source-format.html
> )
>
>
fixed


> ---
> +                                        errmsg("\"%s\" is not number",
> str)));
> s/is not number/is not a number/
> (the second version can be found in a couple places besides translations)
>

fixed

but this message can be little bit no intuitive - better text is "is not a
valid number"


>
> ---
> +       if (*strptr != '\0')
> ...
> +               while (*strptr && !isspace(*strptr))
> Sometimes it explicitly compares to '\0', sometimes implicitly.
> Common use is explicit comparison and it is preferred due to different
> compilers (their conversions to boolean).
>

fixed

>
> ---
> +       /* Skip leading spaces */
> ...
> +               /* ignore plus symbol */
> ...
> +       /* copy digits to working buffer */
> ...
> +       /* allow whitespace between integer and unit */
> I'm also inclined on dropping that explicit skipping spaces, checking
> for +/- symbols, but copying all digits, spaces, dots and '+-' symbols
> and let numeric_in() error out on that.
>

This is difficult - you have to divide string to two parts and first world
is number, second world is unit.

For example "+912+ kB" is correct number +912 and broken unit "+ kB".


> It allows to get correct error messages for something like:
> postgres=# select pg_size_bytes('.+912');
> ERROR:  invalid input syntax for type numeric: ".+912"
> postgres=# select pg_size_bytes('+912+ kB');
> ERROR:  invalid input syntax for type numeric: "+912+ "
> postgres=# select pg_size_bytes('++123 kB');
> ERROR:  invalid input syntax for type numeric: "++123 "
>
> instead of current:
> postgres=# select pg_size_bytes('.+912');
> ERROR:  invalid input syntax for type numeric: "."
> postgres=# select pg_size_bytes('+912+ kB');
> ERROR:  invalid unit: "+ kB"
> postgres=# select pg_size_bytes('++123 kB');
> ERROR:  invalid input syntax for type numeric: "+"
>
>
I redesigned this check. Now it is

popostgres=# select pg_size_bytes('.+912');
ERROR:  22023: ".+912" is not a valid number
stgres=# select pg_size_bytes('++123 kB');
ERROR:  22023: "++123 kB" is not a valid number



> ---
> +       while (isspace((unsigned char) *strptr))
> ...
> +       while (isspace(*strptr))
> ...
> +               while (*strptr && !isspace(*strptr))
> ...
> +               while (isspace(*strptr))
> The first occurece of isspace's parameter is casting to "unsigned
> char" whereas the others are not.
> Note:
> "The behavior is undefined if the value of ch is not representable as
> unsigned char and is not equal to EOF"
>




> Proof:
> http://en.cppreference.com/w/c/string/byte/isspace


fixed



>
>
> ---
> +       pfree(buffer);
> +       pfree(str);
> pfree-s here are not necessary. See:
> http://www.neilconway.org/talks/hacking/hack_slides.pdf (page 17)
>

Automatic memory deallocation doesn't cover all possible situations where
the function can be used - for example DirectFunctionCall - so explicit
deallocation can descrease a memory requirements when you call these
functions from C.

New version is attached

Regards

Pavel


>
> --
> Best regards,
> Vitaly Burovoy
>

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 0af01d9..c349714
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*************** postgres=# SELECT * FROM pg_xlogfile_nam
*** 17655,17660 ****
--- 17655,17663 ----
      <primary>pg_relation_size</primary>
     </indexterm>
     <indexterm>
+     <primary>pg_size_bytes</primary>
+    </indexterm>
+    <indexterm>
      <primary>pg_size_pretty</primary>
     </indexterm>
     <indexterm>
*************** postgres=# SELECT * FROM pg_xlogfile_nam
*** 17725,17730 ****
--- 17728,17744 ----
         </entry>
        </row>
        <row>
+        <entry>
+         <literal><function>pg_size_bytes(<type>text</type>)</function></literal>
+         </entry>
+        <entry><type>bigint</type></entry>
+        <entry>
+          Converts a size in human-readable format with size units into bytes.
+          The parameter is case-insensitive, and the supported size units
+          are: kB, MB, GB and TB.
+        </entry>
+       </row>
+       <row>
         <entry>
          <literal><function>pg_size_pretty(<type>bigint</type>)</function></literal>
          </entry>
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
new file mode 100644
index 2084692..643004a
*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
***************
*** 25,30 ****
--- 25,31 ----
  #include "storage/fd.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
+ #include "utils/guc.h"
  #include "utils/numeric.h"
  #include "utils/rel.h"
  #include "utils/relfilenodemap.h"
*************** pg_size_pretty_numeric(PG_FUNCTION_ARGS)
*** 700,705 ****
--- 701,847 ----
  }
  
  /*
+  * Convert human readable size to long int.
+  *
+  * Due to support for decimal values and case insensitivity of units
+  * a function parse_int cannot be used.
+  */
+ Datum
+ pg_size_bytes(PG_FUNCTION_ARGS)
+ {
+ 	text		*arg = PG_GETARG_TEXT_PP(0);
+ 	char		*str,
+ 				*strptr;
+ 	char		*buffer,
+ 				*bufptr;
+ 	Numeric		num;
+ 	int64		result;
+ 	bool		found_digit = false;
+ 	bool		found_point = false;
+ 
+ 	str = text_to_cstring(arg);
+ 	strptr = str;
+ 
+ 	/* prepare buffer for parts of parsed input string */
+ 	buffer = (char *) palloc(strlen(str) + 1);
+ 	bufptr = buffer;
+ 
+ 	/* Skip leading spaces */
+ 	while (isspace((unsigned char) *strptr))
+ 		strptr++;
+ 
+ 	switch (*strptr)
+ 	{
+ 		/* ignore plus symbol */
+ 		case '+':
+ 		case '-':
+ 			*bufptr++ = *strptr++;
+ 			break;
+ 	}
+ 
+ 	while(*strptr != '\0')
+ 	{
+ 		if (isdigit((unsigned char) *strptr))
+ 		{
+ 			*bufptr++ = *strptr++;
+ 			found_digit = true;
+ 		}
+ 		else if (*strptr == '.')
+ 		{
+ 			if (found_point)
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("\"%s\" is not in expected format", str)));
+ 
+ 			*bufptr++ = *strptr++;
+ 			found_point = true;
+ 		}
+ 		else
+ 			/* any non digit char, the unit is starting */
+ 			break;
+ 	}
+ 
+ 	if (!found_digit)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("\"%s\" is not in expected format", str)));
+ 
+ 	*bufptr = '\0';
+ 
+ 	/* don't allow empty string */
+ 	if (*buffer == '\0')
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("\"%s\" is not in expected format", str)));
+ 
+ 	num = DatumGetNumeric(DirectFunctionCall3(numeric_in,
+ 							CStringGetDatum(buffer), 0, -1));
+ 
+ 	/* allow whitespace between integer and unit */
+ 	while (isspace((unsigned char) *strptr))
+ 		strptr++;
+ 
+ 	/* Handle possible unit */
+ 	if (*strptr != '\0')
+ 	{
+ 		int		multiplier;
+ 		Numeric		mul_num;
+ 		const char	*hintmsg;
+ 		const char	*unitstr;
+ 
+ 		bufptr = buffer;
+ 
+ 		/*
+ 		 * unitstr holds complete string related to unit part. Used
+ 		 * as part of error message. The parse_memory_unit is called
+ 		 * with this value, when we detected invalid format, and we
+ 		 * would to emit error result to ger hintstr. Elsewhere is
+ 		 * used space trimmed buffer.
+ 		 */
+ 		unitstr = strptr;
+ 
+ 		/* copy chars to buffer and stop on first space or end string */
+ 		while (*strptr != '\0' && !isspace((unsigned char) *strptr))
+ 			*bufptr++ = *strptr++;
+ 
+ 		*bufptr = '\0';
+ 
+ 		/* We allow ending spaces. Skip all spaces. */
+ 		while (isspace((unsigned char) *strptr))
+ 			strptr++;
+ 
+ 		/* Use buffer as unitstr, when we didn't find more words. */
+ 		if (*strptr == '\0')
+ 			unitstr = buffer;
+ 
+ 		/* Still, the unit can be invalid: too long, unknown */
+ 		if (!parse_memory_unit(unitstr, &multiplier, &hintmsg))
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("invalid unit: \"%s\"", unitstr),
+ 					 errhint("%s", _(hintmsg))));
+ 
+ 		/*
+ 		 * Now, the multiplier is in KB units. It should be multiplied by 1024
+ 		 * before usage.
+ 		 */
+ 		mul_num = DatumGetNumeric(DirectFunctionCall1(int8_numeric,
+ 						    Int64GetDatum(multiplier * 1024L)));
+ 
+ 		num = DatumGetNumeric(DirectFunctionCall2(numeric_mul,
+ 							    NumericGetDatum(mul_num),
+ 							    NumericGetDatum(num)));
+ 	}
+ 
+ 	result = DatumGetInt64(DirectFunctionCall1(numeric_int8, NumericGetDatum(num)));
+ 
+ 	pfree(buffer);
+ 	pfree(str);
+ 
+ 	PG_RETURN_INT64(result);
+ }
+ 
+ /*
   * Get the filenode of a relation
   *
   * This is expected to be used in queries like
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
new file mode 100644
index 38ba82f..4020df7
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*************** convert_from_base_unit(int64 base_value,
*** 5238,5243 ****
--- 5238,5272 ----
  
  
  /*
+  * Parse value as some known memory unit to their size in bytes.
+  * Used in pg_size_bytes function. Against convert_to_base_unit, a string
+  * comparation is case insensitive.
+  */
+ bool
+ parse_memory_unit(const char *unit, int *multiplier,
+ 					    const char **hintmsg)
+ {
+ 	int			i;
+ 
+ 	for (i = 0; *memory_unit_conversion_table[i].unit; i++)
+ 	{
+ 		const unit_conversion *conv = &memory_unit_conversion_table[i];
+ 
+ 		if (conv->base_unit == GUC_UNIT_KB &&
+ 			strcasecmp(unit, conv->unit) == 0)
+ 		{
+ 			*multiplier = conv->multiplier;
+ 			return true;
+ 		}
+ 	}
+ 
+ 	*hintmsg = memory_units_hint;
+ 
+ 	return false;
+ }
+ 
+ 
+ /*
   * Try to parse value as an integer.  The accepted formats are the
   * usual decimal, octal, or hexadecimal formats, optionally followed by
   * a unit name if "flags" indicates a unit is allowed.
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index 244aa4d..839934a
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 2286 ( pg_total_relati
*** 3567,3572 ****
--- 3567,3574 ----
  DESCR("total disk space usage for the specified table and associated indexes");
  DATA(insert OID = 2288 ( pg_size_pretty			PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 25 "20" _null_ _null_ _null_ _null_ _null_ pg_size_pretty _null_ _null_ _null_ ));
  DESCR("convert a long int to a human readable text using size units");
+ DATA(insert OID = 3331 ( pg_size_bytes			PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 20 "25" _null_ _null_ _null_ _null_ _null_ pg_size_bytes _null_ _null_ _null_ ));
+ DESCR("convert a size in human-readable format with size units into bytes");
  DATA(insert OID = 3166 ( pg_size_pretty			PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 25 "1700" _null_ _null_ _null_ _null_ _null_ pg_size_pretty_numeric _null_ _null_ _null_ ));
  DESCR("convert a numeric to a human readable text using size units");
  DATA(insert OID = 2997 ( pg_table_size			PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 20 "2205" _null_ _null_ _null_ _null_ _null_ pg_table_size _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
new file mode 100644
index 477fde1..023d6a2
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum pg_relation_size(PG_FUNCTIO
*** 462,467 ****
--- 462,468 ----
  extern Datum pg_total_relation_size(PG_FUNCTION_ARGS);
  extern Datum pg_size_pretty(PG_FUNCTION_ARGS);
  extern Datum pg_size_pretty_numeric(PG_FUNCTION_ARGS);
+ extern Datum pg_size_bytes(PG_FUNCTION_ARGS);
  extern Datum pg_table_size(PG_FUNCTION_ARGS);
  extern Datum pg_indexes_size(PG_FUNCTION_ARGS);
  extern Datum pg_relation_filenode(PG_FUNCTION_ARGS);
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
new file mode 100644
index e1de1a5..3bfe0f4
*** a/src/include/utils/guc.h
--- b/src/include/utils/guc.h
*************** extern int	NewGUCNestLevel(void);
*** 357,362 ****
--- 357,364 ----
  extern void AtEOXact_GUC(bool isCommit, int nestLevel);
  extern void BeginReportingGUCOptions(void);
  extern void ParseLongOption(const char *string, char **name, char **value);
+ extern bool parse_memory_unit(const char *unit, int *multiplier,
+ 					    const char **hintmsg);
  extern bool parse_int(const char *value, int *result, int flags,
  		  const char **hintmsg);
  extern bool parse_real(const char *value, double *result);
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
new file mode 100644
index aa513e7..aa652fb
*** a/src/test/regress/expected/dbsize.out
--- b/src/test/regress/expected/dbsize.out
*************** SELECT size, pg_size_pretty(size), pg_si
*** 35,37 ****
--- 35,98 ----
   1000000000000000.5 | 909 TB         | -909 TB
  (12 rows)
  
+ SELECT pg_size_bytes(size) FROM
+    (VALUES('1'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
+           ('1TB'), ('3000 TB')) x(size);
+   pg_size_bytes   
+ ------------------
+                 1
+              1024
+           1048576
+        1073741824
+        1610612736
+     1099511627776
+  3298534883328000
+ (7 rows)
+ 
+ -- case insensitive units are supported
+ SELECT pg_size_bytes(size) FROM
+    (VALUES('1'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
+           ('1tb')) x(size);
+  pg_size_bytes 
+ ---------------
+              1
+           1024
+        1048576
+     1073741824
+     1610612736
+  1099511627776
+ (6 rows)
+ 
+ -- negative numbers are supported
+ SELECT pg_size_bytes(size) FROM
+    (VALUES('-1'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
+           ('-1tb')) x(size);
+  pg_size_bytes  
+ ----------------
+              -1
+           -1024
+        -1048576
+     -1073741824
+     -1610612736
+  -1099511627776
+ (6 rows)
+ 
+ --should fail
+ SELECT pg_size_bytes('1 AB');
+ ERROR:  invalid unit: "AB"
+ HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".
+ SELECT pg_size_bytes('1 AB A');
+ ERROR:  invalid unit: "AB A"
+ HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".
+ select pg_size_bytes('9223372036854775807.9');
+ ERROR:  bigint out of range
+ select pg_size_bytes('1024 bytes');
+ ERROR:  invalid unit: "bytes"
+ HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".
+ select pg_size_bytes('.+912');
+ ERROR:  ".+912" is not in expected format
+ select pg_size_bytes('+912+ kB');
+ ERROR:  invalid unit: "+ kB"
+ HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".
+ select pg_size_bytes('++123 kB');
+ ERROR:  "++123 kB" is not in expected format
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
new file mode 100644
index c118090..8e8c3af
*** a/src/test/regress/sql/dbsize.sql
--- b/src/test/regress/sql/dbsize.sql
*************** SELECT size, pg_size_pretty(size), pg_si
*** 10,12 ****
--- 10,36 ----
              (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
              (1000000000.5::numeric), (1000000000000.5::numeric),
              (1000000000000000.5::numeric)) x(size);
+ 
+ SELECT pg_size_bytes(size) FROM
+    (VALUES('1'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
+           ('1TB'), ('3000 TB')) x(size);
+ 
+ -- case insensitive units are supported
+ SELECT pg_size_bytes(size) FROM
+    (VALUES('1'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
+           ('1tb')) x(size);
+ 
+ -- negative numbers are supported
+ SELECT pg_size_bytes(size) FROM
+    (VALUES('-1'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
+           ('-1tb')) x(size);
+ 
+ --should fail
+ SELECT pg_size_bytes('1 AB');
+ SELECT pg_size_bytes('1 AB A');
+ select pg_size_bytes('9223372036854775807.9');
+ select pg_size_bytes('1024 bytes');
+ 
+ select pg_size_bytes('.+912');
+ select pg_size_bytes('+912+ kB');
+ select pg_size_bytes('++123 kB');

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] custom function for converting human readable sizes to bytes

Reply via email to