Hi Sergei! Users could benefit from this fix of MDEV-18734 since the last changes. Do you really think the open questions left are so important for this bug to be still not pushed?
On Tue, Jan 26, 2021 at 1:51 AM Aleksey Midenkov <[email protected]> wrote: > > Hi Sergei, > > On Tue, Dec 22, 2020 at 7:58 PM Sergei Golubchik <[email protected]> wrote: > > > > Hi, Aleksey! > > > > On Nov 03, Aleksey Midenkov wrote: > > > > > > the updated patches are here: > > > > > > https://github.com/MariaDB/server/commits/5f6bcd6cd0145513974b0dfc5b2cefba28b72f1a > > > > I started writing my comments there, but realized that it'd be clearer > > if I reply here, more context. That is, see below. > > > > One comment about the new test case, you have 16K lines in the result > > file, are all 16K necessary? Can you do something like > > > > select x, left(b,10), left(v,10) > > > > ? > > Yes, these shorter result cases reproduce as well. Updated. > > > > > And please, pretty please, don't use non-constant references as function > > arguments. > > Why? With pointers we can pass NULL and we don't need NULL there, do we? > > > > > > On Mon, Nov 2, 2020 at 11:33 PM Sergei Golubchik <[email protected]> wrote: > > > > > > > > > diff --git a/sql/field.cc b/sql/field.cc > > > > > index bdaaecc20269..0fd40c979d2c 100644 > > > > > --- a/sql/field.cc > > > > > +++ b/sql/field.cc > > > > > @@ -8310,6 +8310,7 @@ int Field_blob::store(const char *from,uint > > > > > length,CHARSET_INFO *cs) > > > > > copy_length= copier.well_formed_copy(field_charset, > > > > > (char*) value.ptr(), > > > > > new_length, > > > > > cs, from, length); > > > > > + value.length(copy_length); > > > > > > > > good! Could this be a distinct bug with its own test case? > > > > > > Do we need to spend time on that bureaucracy? There are real problems to > > > solve... Tickets drain time from the management people and generally from > > > anyone who sees them. > > > > I didn't mean opening an MDEV. Only writing a test case and extracting > > it into a separate commit. > > > > If this is a genuine bug fix and you don't create a test case - how can > > you be sure the fix will stay? It may disappear in a merge or a rebase, > > manually or automatically. Or in a refactoring that "looks fine, all > > tests pass". If you can create a test case - please, do. > > It is tested by federated.federated_partition test case. Why do we > have to extract this side-fix to a separate commit? Looks anyway as a > bureaucracy to me. The main bug is critical, the test case tests both > bugs nicely. If the partitioning code is refactored, the test case > still tests that side-fix. > > > > > > > > Field_blob::store_length(copy_length); > > > > > bmove(ptr+packlength,(uchar*) &tmp,sizeof(char*)); > > > > > > > > > > diff --git a/sql/ha_partition.cc b/sql/ha_partition.cc > > > > > index 9d82df0235c7..7c658082d397 100644 > > > > > --- a/sql/ha_partition.cc > > > > > +++ b/sql/ha_partition.cc > > > > > @@ -5125,14 +5128,14 @@ bool > > > > > ha_partition::init_record_priority_queue() > > > > > uint alloc_len; > > > > > uint used_parts= bitmap_bits_set(&m_part_info->read_partitions); > > > > > /* Allocate record buffer for each used partition. */ > > > > > - m_priority_queue_rec_len= m_rec_length + PARTITION_BYTES_IN_POS; > > > > > + m_priority_queue_rec_len= m_rec_length + ORDERED_REC_OFFSET; > > > > > if (!m_using_extended_keys) > > > > > m_priority_queue_rec_len += m_file[0]->ref_length; > > > > > alloc_len= used_parts * m_priority_queue_rec_len; > > > > > /* Allocate a key for temporary use when setting up the scan. */ > > > > > alloc_len+= table_share->max_key_length; > > > > > > > > > > - if (!(m_ordered_rec_buffer= (uchar*)my_malloc(alloc_len, > > > > > MYF(MY_WME)))) > > > > > + if (!(m_ordered_rec_buffer= (uchar*) alloc_root(&m_ordered_root, > > > > > alloc_len))) > > > > > > > > I don't see why you need a memroot here. memroot is needed when you > > > > allocate later at difeerent points in time some initially unpredictable > > > > amount of memory that has the same lifetime and needs to be > > > > freed all at once. > > > > > > I know... This is just another convenient way of allocation with > > > presumably > > > minimized number of mallocs (probably pre_alloc_size should be tweaked to > > > a > > > better value). > > > > memroot wasn't created for that. There are approaches with much less > > overhead > > Using memory pools like that is a nice design pattern for non-critical > sections. At least it is nicer than manual pointer management. I > believe I've seen the similar pattern in InnoDB with its memory pool > mem_heap_t even without preallocation! It would be nice to have a > special class that preallocates and manages pointers, but I do not see > the reason to develop that for this specific use case. > > I don't believe in *much* overhead: mem_root is a simplest algorithm, > a couple of ifs and one loop. Initialization call > init_record_priority_queue() is not a critical section, the major > execution is mostly busy by a data copy in > handle_ordered_index_scan(), handle_ordered_next(). > > Let's see gprof stats collected for 100k SELECTs attached in stats.txt > > init_record_priority_queue() takes 0.3%; handle_ordered_index_scan() > takes 2.3%; handle_ordered_next() takes 12.3%. > > Let's see unpatched stats for the same 100k SELECTs in stats_vanilla.txt > > init_record_priority_queue() takes 0.1%; handle_ordered_index_scan() > takes 2.2%; handle_ordered_next() takes 11.4%. > > My version of init_record_priority_queue() lost 0.2%. From details we > see that mem_root took 6 out of 13 which is 0.15%. > > One should keep in mind this is an artificial usage of the prio queue. > In diversified real-world usage the share of such kind SELECT is say > 0.1%. But even if it is 1% we divide our stats by 100. So, do you > really care about 0.0015%? > > > > > > > > > > > But here it seems that you only allocate once. You could just make > > > > > > > > m_priority_queue_rec_len = m_rec_length + PARTITION_BYTES_IN_POS + > > > > table->s->blob_fields * sizeof(Ordered_blob_storage) > > > > > > I don't wnat this data serialized. It is inconvenient to read the dump of > > > the record. Why we should make it harder to code, read the code and debug > > > while we can make it easier? If it is only for guaranteed 1 malloc, I > > > don't > > > think it's worth it. > > > > What do you mean "serialized"? > > A sort of memory organization when data is placed into a single > contiguous buffer. > > > What do you mean "read the dump"? > > x command in GDB. > > > > > Note that you can also use my_multi_malloc, that allocates many buffers > > together and hides address arithmetcs from the caller. Like > > > > my_multi_malloc(MYF(MY_WME), > > &m_ordered_rec_buffer, m_rec_length + PARTITION_BYTES_IN_POS, > > &blob_storage, table->s->blob_fields * sizeof(Ordered_blob_storage), > > NULL) > > > > (but with error checking). And then, perhaps, with placement new[] > > you could construct all Ordered_blob_storage objects in one line. > > Sorry, I don't understand how my version is much worse. > > > > > > > > @@ -6178,7 +6203,11 @@ int > > > > > ha_partition::handle_ordered_index_scan(uchar *buf, bool > > > > > reverse_order) > > > > > */ > > > > > error= file->read_range_first(m_start_key.key? &m_start_key: > > > > > NULL, > > > > > end_range, eq_range, TRUE); > > > > > - memcpy(rec_buf_ptr, table->record[0], m_rec_length); > > > > > + if (!error) > > > > > + { > > > > > + memcpy(rec_buf_ptr, table->record[0], m_rec_length); > > > > > + } > > > > > > > > did you have a bug because of that? something didn't work? > > > > > > No, I just saw worthless memcpy and avoided it. > > > > You're right that memcopy is unnecessary in case of an error. > > But most of the time there is no error. You made an exceptional code path > > faster by making a common code path slower. I think it's generally > > better to do the opposite, making the common, no-error code path > > faster, fewer conditionals, in particular. At the cost of making the error > > path slower, if needed. > > I'm usually aware of such things in my decisions. The slowdown is > close to 0, it is not even 0.0001%, it is much lower. OTOH memcpy() > slowdown is much higher and this is not a fully exceptional code path: > HA_ERR_KEY_NOT_FOUND continues the loop. > > > > > > > > @@ -6310,6 +6349,43 @@ int > > > > > ha_partition::handle_ordered_index_scan_key_not_found() > > > > > + if (cached) > > > > > + { > > > > > + cached->swap(s.blob); > > > > > + s.set_read_value= set_read_value; > > > > > > > > is it indeed possible here for a value to be either in `value` or in > > > > `read_value` ? > > > > > > > > When happens what? > > > > > > Yes, it uses `value` for plain fields and `read_value` for virtual fields. > > > > Oh, I see. > > > > This read_value was introduced in ea1b25046c81db8fdf to solve the case > > of UPDATE, where a row is moved with store_record(table, record[1]) and > > blob pointers aren't updated. This is very similar to what you're > > fixing, perhaps read_value should go away and both bugs should have a > > one unified solution that allows to "park" a record somewhere out of > > record[0] and restore later, all with proper blob handling? > > Do you think this is necessary now? I believe there are a lot of > *much* more important tasks. > > > > > > > > > > > > + } > > > > > + } > > > > > + } > > > > > + table->move_fields(table->field, table->record[0], rec_buf); > > > > > +} > > > > Regards, > > Sergei > > VP of MariaDB Server Engineering > > and [email protected] > > > -- > All the best, > > Aleksey Midenkov > @midenok -- All the best, Aleksey Midenkov @midenok _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : [email protected] Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp

