Hi, Alexander!
See below.
My main concern is the stability of the charset API.
I'd like it to be moved into a service, as plugins surely need it
(not "will need", various plugins need it even now). But you change it
quite often, so it would make a rather volatile service :(
> diff --git a/sql/sql_string.cc b/sql/sql_string.cc
> index 9fb462e..a0b6395 100644
> --- a/sql/sql_string.cc
> +++ b/sql/sql_string.cc
> @@ -922,8 +922,8 @@ String *copy_if_not_alloced(String *to,String
> *from,uint32 from_length)
> my_charset_same(from_cs, to_cs))
> {
> m_cannot_convert_error_pos= NULL;
> - return to_cs->cset->copy_abort(to_cs, to, to_length, from, from_length,
> - nchars, this);
> + return to_cs->cset->copy_fix(to_cs, to, to_length, from, from_length,
> + nchars, this);
Now you don't use copy_abort() anywhere.
and you've added copy_abort with the comment "A preparatory patch for MDEV-6566"
I believe you need to remove it now, may be revert the whole
commit b1b6101a
> }
> else
> {
> diff --git a/include/m_ctype.h b/include/m_ctype.h
> index f08efb4..4fa8779 100644
> --- a/include/m_ctype.h
> +++ b/include/m_ctype.h
> @@ -444,7 +444,55 @@ struct my_charset_handler_st
> size_t (*scan)(CHARSET_INFO *, const char *b, const char *e,
> int sq);
>
> - /* Copying routines */
> + /* String copying routines and helpers for them */
> + /*
> + charlen() - calculate length of the left-most character in bytes.
> + @param cs Character set
> + @param str The beginning of the string
> + @param end The end of the string
> +
> + @return MY_CS_ILSEQ if a bad byte sequence was found.
> + @return MY_CS_TOOSMALLN(x) if the string ended unexpectedly.
> + @return a positive number in the range 1..mbmaxlen,
> + if a valid character was found.
> + */
> + int (*charlen)(CHARSET_INFO *cs, const uchar *str, const uchar *end);
Why couldn't you use mbcharlen for that?
I mean, why not to add validity checks to mbcharlen?
or mb_wc
> + /*
> + well_formed_char_length() - returns character length of a string.
> +
> + @param cs Character set
> + @param str The beginning of the string
> + @param end The end of the string
> + @param nchars Not more than "nchars" left-most characters are
> checked.
> + @param status[OUT] Additional statistics is returned here.
> + "status" can be uninitialized before the call,
> + and it is fully initialized after the call.
> +
> + status->m_source_end_pos is set to the position where reading stopped.
> +
> + If a bad byte sequence is found, the function returns immediately and
> + status->m_well_formed_error_pos is set to the position where a bad byte
> + sequence was found.
> +
> + status->m_well_formed_error_pos is set to NULL if no bad bytes were
> found.
> + If status->m_well_formed_error_pos is NULL after the call, that means:
> + - either the function reached the end of the string,
> + - or all "nchars" characters were read.
> + The caller can check status->m_source_end_pos to detect which of these
> two
> + happened.
> + */
> + size_t (*well_formed_char_length)(CHARSET_INFO *cs,
> + const char *str, const char *end,
> + size_t nchars,
> + MY_STRCOPY_STATUS *status);
What's the difference with numchars?
> +
> + /*
> + copy_fix() - copy a string like copy_abort(), but fix bad bytes to '?'.
> + */
> + size_t (*copy_fix)(CHARSET_INFO *,
> + char *dst, size_t dst_length,
> + const char *src, size_t src_length,
> + size_t nchars, MY_STRCOPY_STATUS *status);
> /*
> copy_abort() - copy a string, abort if a bad byte sequence was found.
> Not more than "nchars" characters are copied.
> diff --git a/strings/ctype-simple.c b/strings/ctype-simple.c
> index b010c52..35ae191 100644
> --- a/strings/ctype-simple.c
> +++ b/strings/ctype-simple.c
> @@ -1108,6 +1115,19 @@ size_t my_well_formed_len_8bit(CHARSET_INFO *cs
> __attribute__((unused)),
> }
>
>
> +size_t
> +my_well_formed_char_length_8bit(CHARSET_INFO *cs __attribute__((unused)),
Try to put the type and the function name on one line.
As I've noticed yesterday, git diff doesn't recognizes it as a function
name otherwise
> + const char *start, const char *end,
> + size_t nchars, MY_STRCOPY_STATUS *status)
> +{
> + size_t nbytes= (size_t) (end - start);
> + size_t res= MY_MIN(nbytes, nchars);
> + status->m_well_formed_error_pos= NULL;
> + status->m_source_end_pos= start + res;
> + return res;
> +}
Regards,
Sergei
_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to : [email protected]
Unsubscribe : https://launchpad.net/~maria-developers
More help : https://help.launchpad.net/ListHelp