Eager assignment is certainly the simplest fix. I'm not sure if lazy initialization is based off measurements or whether it was a "premature optimization". In any case, I agree with Antoine that the simple move assignment will not be thread safe, not only for the reasons he describes but also because both the compiler and hardware can reorder stores without explicit synchronization/memory fencing so that the "publication" store of the payload pointer (e.g. the pointer to the backbone array inside the unordered_map) may occur before the stores that put the underlying data structures into a consistent state. Such is especially the case on e.g. ARM with much more relaxed memory models than x86-64.
Antoine, your approach with std::atomic_load looks correct to me: this approach was what I was intending to suggest with "last-writer-wins". I was unaware of std::atomic_load/store for std::shared_ptr in c++11 even though I had searched around for alternatives to std::atomic_shared_ptr (not until c++20 I believe) and wasn't sure there was a portable c++11 solution. I'm unlikely to be able to send a PR until Monday and certainly won't feel put upon if you fix it before then. Thanks for the feedback/input! -----Original Message----- From: Antoine Pitrou <anto...@python.org> Sent: Friday, January 4, 2019 6:00 PM To: dev@arrow.apache.org Subject: (EXT) Re: Thread safety of C++ Struct/Schema::GetFieldIndex However, eager initialization could probably work too. I'm not sure why we used lazy initialization like this. Perhaps someone worried about the cost of incremental schema construction using repeated Schema::AddField() calls (but that's gonna be wasteful anyway)? Le 04/01/2019 à 23:42, Antoine Pitrou a écrit : > > In other words, something like: > > class StructType { > mutable std::shared_ptr<std::unordered_map<std::string, int>> > name_to_index_; > > std::shared_ptr<unordered_map<std::string, int>> GetNameToIndex() const { > if (!std:atomic_load(&name_to_index_)) { > name_to_index = std::make_shared<std::unordered_map<std::string, > int>>(); > // ... initialize name_to_index ... > std::atomic_store(&name_to_index_, name_to_index); > } > return std:atomic_load(&name_to_index_); > } > }; > > > > > Le 04/01/2019 à 23:32, Antoine Pitrou a écrit : >> >> The move-assigning would definitely not be thread-safe. >> >> One possibility would be to wrap the std::unordered_map in a >> std::shared_ptr, and use the atomic functions for shared_ptr: >> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fen.c >> ppreference.com%2Fw%2Fcpp%2Fmemory%2Fshared_ptr%2Fatomic&data=02% >> 7C01%7C%7Ceaf7a2aa1bda4f6688d408d6729853a5%7C17f8a405aa9b46a29dad1086 >> 2de90a7e%7C0%7C0%7C636822395907105928&sdata=51xhBvgavPcUV3fe1E1Mh >> 4r1QaNDDTc3lhG4nMGrXRY%3D&reserved=0 >> >> Regards >> >> Antoine. >> >> >> Le 04/01/2019 à 23:17, Wes McKinney a écrit : >>> hi Gene, >>> >>> Yes, feel free to submit a PR to fix this. I would suggest >>> populating function-local std::unordered_map and then move-assigning >>> it into name_to_index_ -- I think this should not have race >>> conditions. If you do want to add a mutex, it could be a static one >>> rather than creating a new mutex for each StructType >>> >>> Thanks >>> Wes >>> >>> On Fri, Jan 4, 2019 at 4:10 PM Gene Novark <g...@pdtpartners.com.invalid> >>> wrote: >>>> >>>> These are both effectively-immutable accessors with lazy initialization. >>>> However, when accessed from multiple threads a race can occur initializing >>>> the name_to_index_ map. This seems like a bug rather than a purposeful >>>> design choice based off the cpp/conventions.rst section on Immutability. >>>> I'm happy to send a PR if this is agreed to be a bug and what an >>>> acceptable fix is (e.g. just eagerly initialize, always use a mutex, >>>> last-writer-wins with some sort of atomic ptr, etc.). >>>> >>>> Thanks, >>>> Gene >>>> >>>> >>>> ________________________________ >>>> This communication is intended only for the addressee(s), may contain >>>> confidential, privileged or proprietary information, and may be protected >>>> by US and other laws. Your acceptance of this communication constitutes >>>> your agreement to keep confidential all the confidential information >>>> contained in this communication, as well as any information derived by you >>>> from the confidential information contained in this communication. We do >>>> not waive any confidentiality by misdelivery. >>>> >>>> If you receive this communication in error, any use, dissemination, >>>> printing or copying is strictly prohibited; please destroy all electronic >>>> and paper copies and notify the sender immediately. Nothing in this email >>>> is intended to constitute (1) investment or trading advice or >>>> recommendations or any advertisement or (2) a solicitation of an >>>> investment in any jurisdiction in which such a solicitation would be >>>> unlawful. >>>> >>>> Please note that PDT Partners, LLC, including its affiliates, reserves the >>>> right to intercept, archive, monitor and review all communications to and >>>> from its network. ________________________________ This communication is intended only for the addressee(s), may contain confidential, privileged or proprietary information, and may be protected by US and other laws. Your acceptance of this communication constitutes your agreement to keep confidential all the confidential information contained in this communication, as well as any information derived by you from the confidential information contained in this communication. We do not waive any confidentiality by misdelivery. If you receive this communication in error, any use, dissemination, printing or copying is strictly prohibited; please destroy all electronic and paper copies and notify the sender immediately. Nothing in this email is intended to constitute (1) investment or trading advice or recommendations or any advertisement or (2) a solicitation of an investment in any jurisdiction in which such a solicitation would be unlawful. Please note that PDT Partners, LLC, including its affiliates, reserves the right to intercept, archive, monitor and review all communications to and from its network.