Hi Alexey, On Fri, May 15, 2020 at 04:26:02PM +0400, Alexey Botchkov wrote: > See the next iteration here > https://github.com/MariaDB/server/commit/692cb566096d61b240ec26e84fc7d3c7d13f024c
> So the ha_json_table;:position() was implemented. > The size of the reference depends on the nested path depth - each level > adds some 9 bytes to the initial 5. > That can be decreased but i decided to keep it simple initially and i doubt > we're going to have really deep > nesting in realistic scenarios. This is ok. I'm not sure if the server has a limit on the rowid size. Perhaps there is, and in that case we just need to limit the depth we allow. > And i'd like to ask you for some testing > queries here. Or just the model how > to produce queries that will be using the ::position() extensively. I've provided a testcase for the position() call in my previous email. https://gist.github.com/spetrunia/a905d51731c58f5439bd9f70c64cdc43 As for rnd_pos(), one case that I'm aware of is when the query does a filesort, and the row being sorted either includes a blob, or has the total length exceeding certain limit. I've tried constructing an example with blobs, and it fails with an assertion before the execution reaches rnd_pos() calls: select * from json_table('[{"color": "blue", "price": 50}, {"color": "red", "price": 100}, {"color": "rojo", "price": 10.0}, {"color": "blanco", "price": 11.0}]', '$[*]' columns( color varchar(100) path '$.color', price text path '$.price', seq for ordinality ) ) as T order by color desc; fails an assertion: mysqld: /home/psergey/dev-git/10.5-json/sql/field.cc:8309: virtual int Field_blob::store(const char*, size_t, CHARSET_INFO*): Assertion `marked_for_write_or_computed()' failed. Once that is fixed, this should use position() and rnd_pos() calls. This will likely expose more issues with rnd_pos(), see my comments to the Json_table_nested_path::set_position below. == A problem with VIEWs == mysql> select * -> from -> json_table( -> '[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]', -> '$[*]' columns( color varchar(100) path '$.color')) as `T_A` where T_A.color<>'azul' ; +-------+ | color | +-------+ | red | | black | +-------+ 2 rows in set (6.34 sec) Good so far. Now, let's try creating a VIEW from this: create view v1 as select * from json_table( '[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]', '$[*]' columns( color varchar(100) path '$.color')) as `T_A` where T_A.color<>'azul' ; select * from v1; ERROR 4041 (HY000): Unexpected end of JSON path in argument 2 to function 'JSON_TABLE' Examining the .frm file, I see something that looks like garbage data: query=select `T_A`.`color` AS `color` from JSON_TABLE(\'[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]\', \'\'\0<D8>7^A<9C><FF>^?\0\0\0\0\0\0\0\0\0\0\0\0^A<A5><A5><A5><A5><A5>^H<D0>kWUU\0\0[ {"color": "red", "price": 1}, {"color": "black", "price": 2}]\0$[*]\'\' COLUMNS (`color` PATH \'^A<9C><FF>^?\0\0\0\0\0\0^A\0\0\0\0\0\0\0<A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5>\0\0\0\0<A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5><A5> <A5><A5><A5><A5><A5><A5> D^A<9C><FF>^?\0\0H:^A<9C><FF>^?\0\0<A5><A5><A5><A5><A5><A5><A5><A5>100\0<8F><8F><8F><8F>$.color\')) T_A where `T_A`.`color` <> \'azul\' EXPLAIN EXTENDED ...; SHOW WARNINGS; - also print something odd. == EXPLAIN [FORMAT=JSON] == I think EXPLAIN output should provide an indication that a table function is used. MySQL does it like so: <TODO> == Unneeded recursive rules in grammar == I've already complained about this: the grammar has recursive ON EMPTY, ON ERROR rules, which cause the following to be accepted (note the two ON EMPTY clauses) : select * from json_table('[{"color": "blue", "price": 50},{"color": "red"}]', '$[*]' columns( price varchar(255) path '$.price' default 'xyz' on empty default 'abc' on empty ) ) as T; > commit 692cb566096d61b240ec26e84fc7d3c7d13f024c > Author: Alexey Botchkov <holyf...@askmonty.org> > Date: Fri May 15 15:25:28 2020 +0400 > > MDEV-17399 Add support for JSON_TABLE. > > Syntax for JSON_TABLE added. > The ha_json_table handler added. Executing the JSON_TABLE we > create the temporary table of the ha_json_table, add dependencies > of other tables and sometimes conditions to WHERE. > and sometimes conditions to WHERE. I think this is not true anymore? ... > diff --git a/include/my_base.h b/include/my_base.h > index 7efa5eb9673..89ef3e8e7c1 100644 > --- a/include/my_base.h > +++ b/include/my_base.h > @@ -523,6 +523,13 @@ enum ha_base_keytype { > #define HA_ERR_TABLESPACE_MISSING 194 /* Missing Tablespace */ > #define HA_ERR_SEQUENCE_INVALID_DATA 195 > #define HA_ERR_SEQUENCE_RUN_OUT 196 > + > +/* > + Share the error code to not increment the HA_ERR_LAST for now, > + as it disturbs some storage engine's tests. > + Probably should be fixed later. > +*/ > +#define HA_ERR_INVALID_JSON HA_ERR_TABLE_IN_FK_CHECK We definitely can't have this in the final patch. We need to either: A. Use an engine-specific error code. Check out MyRocks and ha_rocksdb::get_error_message() for an example of how this is done B. Indeed introduce a generic "Invalid JSON" error code. I'm hesitant about B, let's discuss this with other developers. ... > diff --git a/sql/table_function.cc b/sql/table_function.cc > new file mode 100644 > index 00000000000..71f2378ce7d > --- /dev/null > +++ b/sql/table_function.cc ... > + > +class ha_json_table: public handler > +{ Please add comments about these > +protected: > + Table_function_json_table *m_jt; > + String m_tmps; > + String *m_js; > + uchar *m_cur_pos; > +public: > + ha_json_table(TABLE_SHARE *share_arg, Table_function_json_table *jt): > + handler(&table_function_hton.m_hton, share_arg), m_jt(jt) > + { > + /* > + set the mark_trx_read_write_done to avoid the > + handler::mark_trx_read_write_internal() call. > + It relies on &ha_thd()->ha_data[ht->slot].ha_info[0] to be set. > + But we don't set the ha_data for the ha_json_table, and > + that call makes no sence for ha_json_table. > + */ > + mark_trx_read_write_done= 1; > + ref_length= (jt->m_depth+1) * (4 + 1) + > + jt->m_depth * (sizeof(Json_table_nested_path *)); > + } > + ~ha_json_table() {} > + handler *clone(const char *name, MEM_ROOT *mem_root) { return NULL; } > + const char *index_type(uint inx) { return "NONE"; } > + /* Rows also use a fixed-size format */ > + enum row_type get_row_type() const { return ROW_TYPE_FIXED; } > + ulonglong table_flags() const > + { > + return (HA_FAST_KEY_READ | /*HA_NO_BLOBS |*/ HA_NULL_IN_KEY | > + HA_CAN_SQL_HANDLER | > + HA_REC_NOT_IN_SEQ | HA_NO_TRANSACTIONS | > + HA_HAS_RECORDS | HA_CAN_HASH_KEYS); > + } > + ulong index_flags(uint inx, uint part, bool all_parts) const > + { > + return HA_ONLY_WHOLE_INDEX | HA_KEY_SCAN_NOT_ROR; > + } > + uint max_supported_keys() const { return 1; } > + uint max_supported_key_part_length() const { return MAX_KEY_LENGTH; } > + double scan_time() { return 1000000.0; } > + double read_time(uint index, uint ranges, ha_rows rows) { return 0.0; } The above two functions are never called. Please * add a comment saying that. * add a DBUG_ASSERT() into them to back the point made by the comment :-) > + int open(const char *name, int mode, uint test_if_locked); > + int close(void) { return 0; } > + int rnd_init(bool scan); > + int rnd_next(uchar *buf); > + int rnd_pos(uchar * buf, uchar *pos); > + void position(const uchar *record); > + int can_continue_handler_scan() { return 1; } > + int info(uint); > + int extra(enum ha_extra_function operation); > + THR_LOCK_DATA **store_lock(THD *thd, THR_LOCK_DATA **to, > + enum thr_lock_type lock_type) > + { return NULL; } > + int create(const char *name, TABLE *form, HA_CREATE_INFO *create_info) > + { return 1; } > +private: > + void update_key_stats(); > +}; > + ... > +void Json_table_nested_path::set_position(const char *j_start, const uchar > *pos) > +{ * This needs a comment. * I don't see where the value of m_ordinality_counter is restored? * The same about m_null - its value is not restored either? > + if (m_nested) > + { > + memcpy(&m_cur_nested, pos, sizeof(m_cur_nested)); > + pos+= sizeof(m_cur_nested); > + } > + > + m_engine.s.c_str= (uchar *) j_start + sint4korr(pos); > + m_engine.state= (int) pos[4]; > + if (m_cur_nested) > + m_cur_nested->set_position(j_start, pos); > +} > + ... > +int ha_json_table::info(uint) > +{ > + /* We don't want 0 or 1 in stats.records. */ > + stats.records= 4; > + return 0; Does this value matter? As far as I understand it doesn't, as the optimizer will use the estimates obtained from Table_function_json_table::get_estimates. Please add a comment about this. > +} > + ... Please document this function. What's last_column? Why does print() method get it and return it? > +int Json_table_nested_path::print(THD *thd, TABLE_LIST *sql_table, String > *str, > + List_iterator_fast<Json_table_column> &it, > + Json_table_column **last_column) > +{ > + Json_table_nested_path *c_path= this; ... > diff --git a/sql/table_function.h b/sql/table_function.h > new file mode 100644 > index 00000000000..49d650cb7b2 > --- /dev/null > +++ b/sql/table_function.h > @@ -0,0 +1,168 @@ ... > +#include <json_lib.h> > + > +/* > + The Json_table_nested_path represents the 'current nesting' level > + for a set of JSON_TABLE columns. > + Each column (Json_table_column instance) is linked with corresponding > + 'nested path' object and gets it's piece of JSON to parse during the > computation > + phase. > + The root 'nested_path' is always present as a part of > Table_function_json_table, > + then other 'nested_paths' can be created and linked into a tree structure > when new > + 'NESTED PATH' is met. The nested 'nested_paths' are linked with > 'm_nested', the same-level > + 'nested_paths' are linked with 'm_next_nested'. > + So for instance > + JSON_TABLE( '...', '$[*]' > + COLUMNS( a INT PATH '$.a' , > + NESTED PATH '$.b[*]' COLUMNS (b INT PATH '$', > + NESTED PATH '$.c[*]' COLUMNS(x INT > PATH '$')), > + NESTED PATH '$.n[*]' COLUMNS (z INT PAHT '$')) > + results in 4 'nested_path' created: > + root nested_b nested_c nested_n > + m_path '$[*]' '$.b[*]' '$.c[*]' '$.n[*] > + m_nested &nested_b &nested_c NULL NULL > + n_next_nested NULL &nested_n NULL NULL > + > +and 4 columns created: > + a b x z > + m_nest &root &nested_b &nested_c &nested_n > +*/ > + > + > +class Json_table_column; > + > +class Json_table_nested_path : public Sql_alloc > +{ > +public: > + bool m_null; > + json_path_t m_path; > + json_engine_t m_engine; > + json_path_t m_cur_path; > + > + /* Counts the rows produced. Value is set to the FOR ORDINALITY coluns */ > + longlong m_ordinality_counter; > + > + Json_table_nested_path *m_parent; > + Json_table_nested_path *m_nested, *m_next_nested; > + Json_table_nested_path **m_nested_hook; > + Json_table_nested_path *m_cur_nested; Please add documentation about the above members. I see the diagram above, but I think text descriptions are also needed for each member. What's m_nested_hook? > + Json_table_nested_path(Json_table_nested_path *parent_nest): > + m_parent(parent_nest), m_nested(0), m_next_nested(0), > + m_nested_hook(&m_nested) {} > + int set_path(THD *thd, const LEX_CSTRING &path); > + void scan_start(CHARSET_INFO *i_cs, const uchar *str, const uchar *end); > + int scan_next(); > + int print(THD *thd, TABLE_LIST *sql_table, String *str, > + List_iterator_fast<Json_table_column> &it, > + Json_table_column **last_column); > + void get_current_position(const char *j_start, uchar *pos) const; > + void set_position(const char *j_start, const uchar *pos); > +}; ... > +class Table_function_json_table : public Sql_alloc Please document the class and the data members. > +{ > +public: > + Item *m_json; > + Json_table_nested_path m_nested_path; > + List<Json_table_column> m_columns; > + table_map m_dep_tables; > + uint m_depth, m_cur_depth; > + > + Table_function_json_table(Item *json): m_json(json), m_nested_path(0), > + m_depth(0), m_cur_depth(0) {} > + > + /* > + Used in sql_yacc.yy. > + Represents the current NESTED PATH level being parsed. > + */ > + Json_table_nested_path *m_sql_nest; > + void add_nested(Json_table_nested_path *np); > + void leave_nested(); BR Sergei -- Sergei Petrunia, Software Developer MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp