paleolimbot commented on code in PR #3876:
URL: https://github.com/apache/arrow-adbc/pull/3876#discussion_r2774963821


##########
c/driver_manager/adbc_driver_manager.cc:
##########
@@ -398,6 +416,81 @@ AdbcStatusCode LoadDriverFromRegistry(HKEY root, const 
std::wstring& driver_name
 }
 #endif  // _WIN32
 
+#define CHECK_STATUS(EXPR)                                \
+  if (auto _status = (EXPR); _status != ADBC_STATUS_OK) { \
+    return _status;                                       \
+  }
+
+AdbcStatusCode ProcessProfileValue(std::string_view value, std::string& out,
+                                   struct AdbcError* error) {
+  if (value.empty()) {
+    SetError(error, "Profile value is null");
+    return ADBC_STATUS_INVALID_ARGUMENT;
+  }
+
+  size_t pos = 0;
+  size_t prev_pos = 0;
+  out.resize(0);
+  while ((pos = value.find("env_var(", prev_pos)) != std::string_view::npos) {
+    const auto closing_paren = value.find_first_of(')', pos);
+    if (closing_paren == std::string_view::npos) {
+      SetError(error, "Malformed env_var() profile value: missing closing 
parenthesis");
+      return ADBC_STATUS_INVALID_ARGUMENT;
+    }
+
+    const auto env_var_start = pos + 8;
+    const auto env_var_len = closing_paren - env_var_start;
+    if (env_var_len == 0) {
+      SetError(error,
+               "Malformed env_var() profile value: missing environment 
variable name");
+      return ADBC_STATUS_INVALID_ARGUMENT;
+    }
+
+    out.append(value.substr(prev_pos, pos - prev_pos));
+    prev_pos = closing_paren + 1;
+
+    // Extract the environment variable name from the value
+    // which should be formatted as env_var(VAR_NAME) as we confirmed
+    // above.
+    const auto env_var_name = value.substr(env_var_start, env_var_len);
+#ifdef _WIN32
+    auto local_env_var = Utf8Decode(std::string(env_var_name));
+    DWORD required_size = GetEnvironmentVariableW(local_env_var.c_str(), NULL, 
0);
+    if (required_size == 0) {
+      out = "";
+      return ADBC_STATUS_OK;
+    }

Review Comment:
   I see why this is important (to avoid putting secrets in .toml files), but 
there is now quite a bit of environment variable state that is required to do 
certain things (like prepend the driver search path with a directory of 
preferred drivers if I remember the order correctly), which means the chance of 
an application concurrently setting and the driver manager fetching an 
environment variable is non-trivial in the presence of something like a 
connection pool. Not in this PR, but it may be worth documenting how to create 
connections in a thread-safe way and/or ensuring there is a thread safe 
alternative to accomplish some of these things.



##########
c/include/arrow-adbc/adbc_driver_manager.h:
##########
@@ -175,6 +175,171 @@ AdbcStatusCode 
AdbcDriverManagerDatabaseSetAdditionalSearchPathList(
 ADBC_EXPORT
 const char* AdbcStatusCodeMessage(AdbcStatusCode code);
 
+/// \defgroup adbc-driver-manager-connection-profile Connection Profiles
+/// Similar to odbc.ini, the ADBC driver manager can support "connection 
profiles"
+/// that specify a driver and options to use when connecting. This allows 
users to
+/// specify connection information in a file or environment variable, and have 
the
+/// driver manager load the appropriate driver and set options accordingly.
+///
+/// This allows creating reusable connection configurations for sharing and 
distribution
+/// without needing to hardcode driver names and options in application code. 
Profiles
+/// will be loaded during DatabaseInit before attempting to initialize the 
driver. Any
+/// options specified by the profile will be applied but will not override 
options
+/// that have already been set using DatabaseSetOption.
+///
+/// To facilitate customization, we define an interface for implementing a 
Connection
+/// Profile object along with a provider function definition which can be set 
into
+/// the driver manager to allow for customized profile loading.
+///
+/// A profile can be specified to the Driver Manager in one of two ways,
+/// which will invoke the profile provider during the call to DatabaseInit:
+///
+/// 1. The "profile" option can be set using DatabaseSetOption with the name 
of the
+/// profile to load.
+/// 2. The "uri" being used can have the form "profile://<profile>"
+///
+/// @{
+
+/// \brief Abstract interface for connection profile providers
+struct ADBC_EXPORT AdbcConnectionProfile {
+  /// \brief Opaque implementation-defined state.
+  /// This field is NULL if the profile is uninitialized/freed (but
+  /// it need not have a value even if the profile is initialized).
+  void* private_data;
+
+  /// \brief Release the profile and perform any cleanup.
+  void (*release)(struct AdbcConnectionProfile* profile);
+
+  /// \brief Get the driver to use as specified by this profile.
+  ///
+  /// It is not required that a profile specify a driver. If the options
+  // can be reusable across drivers, then the profile does not need to specify
+  /// a driver (if this provides an empty string or nullptr then the driver
+  /// must be defined by other means, e.g. by the driver / uri options).
+  ///
+  /// \param[in] profile The profile to query.
+  /// \param[out] driver_name The name of the driver to use, or NULL if not 
specified.
+  /// \param[out] error An optional location to return an error message
+  AdbcStatusCode (*GetDriverName)(struct AdbcConnectionProfile* profile,
+                                  const char** driver_name, struct AdbcError* 
error);

Review Comment:
   I think you do want this while you're here and I think it is fairly easy to 
do. Providing an init function is a more predictable/secure way to precisely 
specify which driver will actually get loaded. There are now quite a lot of 
ways that a driver can be resolved based on name that change based on working 
directory/environment and if I were embedding ADBC in an application I would 
personally want the ability to be absolutely sure about what driver/version was 
loaded. (Alternatively, as an application I could invent my own way to manage 
Profiles and opt out of the driver manager).



##########
c/driver_manager/adbc_driver_manager.cc:
##########
@@ -1042,6 +1137,261 @@ struct ManagedLibrary {
 #endif  // defined(_WIN32)
 };
 
+struct FilesystemProfile {
+  std::filesystem::path path;
+  std::string driver;
+  std::unordered_map<std::string, std::string> options;
+  std::unordered_map<std::string, int64_t> int_options;
+  std::unordered_map<std::string, double> double_options;
+
+  std::vector<const char*> options_keys;
+  std::vector<const char*> options_values;
+
+  std::vector<const char*> int_option_keys;
+  std::vector<int64_t> int_option_values;
+
+  std::vector<const char*> double_option_keys;
+  std::vector<double> double_option_values;
+
+  void PopulateConnectionProfile(struct AdbcConnectionProfile* out) {
+    options_keys.reserve(options.size());
+    options_values.reserve(options.size());
+    for (const auto& [key, value] : options) {
+      options_keys.push_back(key.c_str());
+      options_values.push_back(value.c_str());
+    }
+
+    int_option_keys.reserve(int_options.size());
+    int_option_values.reserve(int_options.size());
+    for (const auto& [key, value] : int_options) {
+      int_option_keys.push_back(key.c_str());
+      int_option_values.push_back(value);
+    }
+
+    double_option_keys.reserve(double_options.size());
+    double_option_values.reserve(double_options.size());
+    for (const auto& [key, value] : double_options) {
+      double_option_keys.push_back(key.c_str());
+      double_option_values.push_back(value);
+    }
+
+    out->private_data = new FilesystemProfile(std::move(*this));
+    out->release = [](AdbcConnectionProfile* profile) {
+      if (!profile || !profile->private_data) {
+        return;
+      }
+
+      delete static_cast<FilesystemProfile*>(profile->private_data);
+      profile->private_data = nullptr;
+      profile->release = nullptr;
+    };
+
+    out->GetDriverName = [](AdbcConnectionProfile* profile, const char** out,
+                            struct AdbcError* error) -> AdbcStatusCode {
+      if (!profile || !profile->private_data) {
+        SetError(error, "Invalid connection profile");
+        return ADBC_STATUS_INVALID_ARGUMENT;
+      }
+
+      auto* fs_profile = 
static_cast<FilesystemProfile*>(profile->private_data);
+      *out = fs_profile->driver.c_str();
+      return ADBC_STATUS_OK;
+    };
+
+    out->GetOptions = [](AdbcConnectionProfile* profile, const char*** keys,
+                         const char*** values, size_t* num_options,
+                         struct AdbcError* error) -> AdbcStatusCode {
+      if (!profile || !profile->private_data) {
+        SetError(error, "Invalid connection profile");
+        return ADBC_STATUS_INVALID_ARGUMENT;
+      }
+
+      if (!keys || !values || !num_options) {
+        SetError(error, "Output parameters cannot be null");
+        return ADBC_STATUS_INVALID_ARGUMENT;
+      }
+
+      auto* fs_profile = 
static_cast<FilesystemProfile*>(profile->private_data);
+      *num_options = fs_profile->options.size();
+      *keys = fs_profile->options_keys.data();
+      *values = fs_profile->options_values.data();
+      return ADBC_STATUS_OK;
+    };
+
+    out->GetIntOptions = [](AdbcConnectionProfile* profile, const char*** keys,
+                            const int64_t** values, size_t* num_options,
+                            struct AdbcError* error) -> AdbcStatusCode {
+      if (!profile || !profile->private_data) {
+        SetError(error, "Invalid connection profile");
+        return ADBC_STATUS_INVALID_ARGUMENT;
+      }
+
+      if (!keys || !values || !num_options) {
+        SetError(error, "Output parameters cannot be null");
+        return ADBC_STATUS_INVALID_ARGUMENT;
+      }
+
+      auto* fs_profile = 
static_cast<FilesystemProfile*>(profile->private_data);
+      *num_options = fs_profile->int_options.size();
+      *keys = fs_profile->int_option_keys.data();
+      *values = fs_profile->int_option_values.data();
+      return ADBC_STATUS_OK;
+    };
+
+    out->GetDoubleOptions = [](AdbcConnectionProfile* profile, const char*** 
keys,
+                               const double** values, size_t* num_options,
+                               struct AdbcError* error) -> AdbcStatusCode {
+      if (!profile || !profile->private_data) {
+        SetError(error, "Invalid connection profile");
+        return ADBC_STATUS_INVALID_ARGUMENT;
+      }
+
+      if (!keys || !values || !num_options) {
+        SetError(error, "Output parameters cannot be null");
+        return ADBC_STATUS_INVALID_ARGUMENT;
+      }
+
+      auto* fs_profile = 
static_cast<FilesystemProfile*>(profile->private_data);
+      *num_options = fs_profile->double_options.size();
+      *keys = fs_profile->double_option_keys.data();
+      *values = fs_profile->double_option_values.data();
+      return ADBC_STATUS_OK;
+    };
+  }
+};
+
+struct ProfileVisitor {
+  FilesystemProfile& profile;
+  const std::filesystem::path& profile_path;
+  struct AdbcError* error;
+
+  bool VisitTable(const std::string& prefix, toml::table& table) {
+    for (const auto& [key, value] : table) {
+      if (auto* str = value.as_string()) {
+        profile.options[prefix + key.data()] = str->get();
+      } else if (auto* int_val = value.as_integer()) {
+        profile.int_options[prefix + key.data()] = int_val->get();
+      } else if (auto* double_val = value.as_floating_point()) {
+        profile.double_options[prefix + key.data()] = double_val->get();
+      } else if (auto* bool_val = value.as_boolean()) {
+        profile.options[prefix + key.data()] = bool_val->get() ? "true" : 
"false";
+      } else if (value.is_table()) {
+        if (!VisitTable(prefix + key.data() + ".", *value.as_table())) {
+          return false;
+        }
+      } else {
+        std::string message = "Unsupported value type for key '" +
+                              std::string(key.str()) + "' in profile '" +
+                              profile_path.string() + "'";
+        SetError(error, std::move(message));
+        return false;
+      }
+    }
+    return !error->message;
+  }
+};
+
+AdbcStatusCode LoadProfileFile(const std::filesystem::path& profile_path,
+                               FilesystemProfile& profile, struct AdbcError* 
error) {
+  toml::table config;
+  try {
+    config = toml::parse_file(profile_path.native());
+  } catch (const toml::parse_error& err) {
+    std::string message = "Could not open profile. ";
+    message += err.what();
+    message += ". Profile: ";
+    message += profile_path.string();
+    SetError(error, std::move(message));
+    return ADBC_STATUS_INVALID_ARGUMENT;
+  }
+
+  profile.path = profile_path;
+  if (!config["version"].is_integer()) {
+    std::string message =
+        "Profile version is not an integer in profile '" + 
profile_path.string() + "'";
+    SetError(error, std::move(message));
+    return ADBC_STATUS_INVALID_ARGUMENT;
+  }
+
+  const auto version = config["version"].value_or(int64_t(1));
+  switch (version) {
+    case 1:
+      break;
+    default: {
+      std::string message =
+          "Profile version '" + std::to_string(version) +
+          "' is not supported by this driver manager. Profile: " + 
profile_path.string();
+      SetError(error, std::move(message));
+      return ADBC_STATUS_INVALID_ARGUMENT;
+    }
+  }
+
+  profile.driver = config["driver"].value_or(""s);
+
+  auto options = config.at_path("options");
+  if (!options.is_table()) {
+    std::string message =
+        "Profile options is not a table in profile '" + profile_path.string() 
+ "'";
+    SetError(error, std::move(message));
+    return ADBC_STATUS_INVALID_ARGUMENT;
+  }
+
+  auto* options_table = options.as_table();
+  ProfileVisitor v{profile, profile_path, error};
+  if (!v.VisitTable("", *options_table)) {
+    return ADBC_STATUS_INVALID_ARGUMENT;
+  }
+
+  return ADBC_STATUS_OK;
+}
+
+SearchPaths GetProfileSearchPaths(const char* additional_search_path_list) {
+  SearchPaths search_paths;
+  {
+    std::vector<std::filesystem::path> additional_paths;
+    if (additional_search_path_list) {
+      additional_paths = InternalAdbcParsePath(additional_search_path_list);
+    }
+
+    for (const auto& path : additional_paths) {
+      search_paths.emplace_back(SearchPathSource::kAdditional, path);
+    }
+  }
+
+  {
+    auto env_paths = GetEnvPaths(kAdbcProfilePath);
+    search_paths.insert(search_paths.end(), env_paths.begin(), 
env_paths.end());
+  }
+
+#if ADBC_CONDA_BUILD
+#ifdef _WIN32
+  const wchar_t* conda_name = L"CONDA_PREFIX";
+#else
+  const char* conda_name = "CONDA_PREFIX";
+#endif  // _WIN32
+
+  auto venv = GetEnvPaths(conda_name);
+  for (const auto& [_, venv_path] : venv) {
+    search_paths.emplace_back(SearchPathSource::kConda,
+                              venv_path / "etc" / "adbc" / "profiles");
+  }
+#else
+  search_paths.emplace_back(SearchPathSource::kDisabledAtCompileTime, "Conda 
prefix");
+#endif  // ADBC_CONDA_BUILD
+
+#ifdef _WIN32
+  const wchar_t* profiles_dir = L"Profiles";
+#elif defined(__APPLE__)

Review Comment:
   I forget whether the Windows configuration directory from 
`InternalAdbcUserConfigDir()` uses the "Roaming" or local version but it might 
be worth checking to make sure the Profiles are stored in the right one (i.e., 
should Profiles follow a user around when logging in to the same account on 
multiple computers, or should they be local to one computer?).



##########
c/include/arrow-adbc/adbc_driver_manager.h:
##########
@@ -175,6 +175,171 @@ AdbcStatusCode 
AdbcDriverManagerDatabaseSetAdditionalSearchPathList(
 ADBC_EXPORT
 const char* AdbcStatusCodeMessage(AdbcStatusCode code);
 
+/// \defgroup adbc-driver-manager-connection-profile Connection Profiles
+/// Similar to odbc.ini, the ADBC driver manager can support "connection 
profiles"
+/// that specify a driver and options to use when connecting. This allows 
users to
+/// specify connection information in a file or environment variable, and have 
the
+/// driver manager load the appropriate driver and set options accordingly.
+///
+/// This allows creating reusable connection configurations for sharing and 
distribution
+/// without needing to hardcode driver names and options in application code. 
Profiles
+/// will be loaded during DatabaseInit before attempting to initialize the 
driver. Any
+/// options specified by the profile will be applied but will not override 
options
+/// that have already been set using DatabaseSetOption.
+///
+/// To facilitate customization, we define an interface for implementing a 
Connection
+/// Profile object along with a provider function definition which can be set 
into
+/// the driver manager to allow for customized profile loading.
+///
+/// A profile can be specified to the Driver Manager in one of two ways,
+/// which will invoke the profile provider during the call to DatabaseInit:
+///
+/// 1. The "profile" option can be set using DatabaseSetOption with the name 
of the
+/// profile to load.
+/// 2. The "uri" being used can have the form "profile://<profile>"
+///
+/// @{
+
+/// \brief Abstract interface for connection profile providers
+struct ADBC_EXPORT AdbcConnectionProfile {
+  /// \brief Opaque implementation-defined state.
+  /// This field is NULL if the profile is uninitialized/freed (but
+  /// it need not have a value even if the profile is initialized).
+  void* private_data;
+
+  /// \brief Release the profile and perform any cleanup.
+  void (*release)(struct AdbcConnectionProfile* profile);
+
+  /// \brief Get the driver to use as specified by this profile.
+  ///
+  /// It is not required that a profile specify a driver. If the options
+  // can be reusable across drivers, then the profile does not need to specify
+  /// a driver (if this provides an empty string or nullptr then the driver
+  /// must be defined by other means, e.g. by the driver / uri options).
+  ///
+  /// \param[in] profile The profile to query.
+  /// \param[out] driver_name The name of the driver to use, or NULL if not 
specified.
+  /// \param[out] error An optional location to return an error message
+  AdbcStatusCode (*GetDriverName)(struct AdbcConnectionProfile* profile,
+                                  const char** driver_name, struct AdbcError* 
error);
+
+  /// \brief Get the string options specified by the profile
+  ///
+  /// The keys and values returned by this function are owned by the profile
+  /// object itself and do not need to be freed or managed by the caller.
+  /// They must not be accessed after calling release on the profile.
+  ///
+  /// The profile can also indicate that a value should be pulled from the 
environment
+  /// by having a value in the form `env_var(ENV_VAR_NAME)`. If the driver
+  /// manager encounters a value of this form, it will replace it with the 
actual value
+  /// of the environment variable `ENV_VAR_NAME` before setting the option. 
This
+  /// is only valid for option *values* not *keys*.
+  ///
+  /// \param[in] profile The profile to query.
+  /// \param[out] keys The keys of the options specified by the profile.
+  /// \param[out] values The values of the options specified by the profile.
+  /// \param[out] num_options The number of options specified by the profile,
+  ///   consumers must not access keys or values beyond this count.
+  /// \param[out] error An optional location to return an error message
+  AdbcStatusCode (*GetOptions)(struct AdbcConnectionProfile* profile, const 
char*** keys,
+                               const char*** values, size_t* num_options,
+                               struct AdbcError* error);
+
+  /// \brief Get the integer options specified by the profile
+  ///
+  /// The keys and values returned by this function are owned by the profile
+  /// object itself and do not need to be freed or managed by the caller. They 
must not be
+  /// accessed after calling release on the profile.
+  ///
+  /// Values returned by this function will be set using the 
DatabaseSetOptionInt function
+  /// on the database object being initialized. If the driver does not support 
the
+  /// DatabaseSetOptionInt function, then options should only be returned as 
strings.
+  ///
+  /// \param[in] profile The profile to query.
+  /// \param[out] keys The keys of the options specified by the profile.
+  /// \param[out] values The values of the options specified by the profile.
+  /// \param[out] num_options The number of options specified by the profile,
+  ///   consumers must not access keys or values beyond this count.
+  /// \param[out] error An optional location to return an error message
+  AdbcStatusCode (*GetIntOptions)(struct AdbcConnectionProfile* profile,
+                                  const char*** keys, const int64_t** values,
+                                  size_t* num_options, struct AdbcError* 
error);
+
+  /// \brief Get the double options specified by the profile
+  ///
+  /// The keys and values returned by this function are owned by the profile
+  /// object itself and do not need to be freed or managed by the caller. They 
must not be
+  /// accessed after calling release on the profile.
+  ///
+  /// Values returned by this function will be set using the 
DatabaseSetOptionDouble
+  /// function on the database object being initialized. If the driver does 
not support
+  /// the DatabaseSetOptionDouble function, then options should only be 
returned as
+  /// strings.
+  ///
+  /// \param[in] profile The profile to query.
+  /// \param[out] keys The keys of the options specified by the profile.
+  /// \param[out] values The values of the options specified by the profile.
+  /// \param[out] num_options The number of options specified by the profile,
+  ///   consumers must not access keys or values beyond this count.
+  /// \param[out] error An optional location to return an error message
+  AdbcStatusCode (*GetDoubleOptions)(struct AdbcConnectionProfile* profile,
+                                     const char*** keys, const double** values,
+                                     size_t* num_options, struct AdbcError* 
error);
+};
+
+/// \brief Common definition for a connection profile provider
+///
+/// \param[in] profile_name The name of the profile to load. This is the value 
of the
+///   "profile" option or the profile specified in the URI.
+/// \param[in] additional_search_path_list A list of additional paths to 
search for
+///   profiles, delimited by the OS specific path list separator.
+/// \param[out] out The profile to return. The caller will take ownership of 
the profile
+///   and is responsible for calling release on it when finished.
+/// \param[out] error An optional location to return an error message if 
necessary.
+typedef AdbcStatusCode (*AdbcConnectionProfileProvider)(
+    const char* profile_name, const char* additional_search_path_list,
+    struct AdbcConnectionProfile* out, struct AdbcError* error);
+
+/// \brief Set a custom connection profile provider for the driver manager.
+///
+/// If no provider is set, the driver manager will use a default, 
filesystem-based
+/// provider which will look for profiles in the following locations if not 
given an
+/// absolute path to a file:
+///
+/// 1. The environment variable ADBC_PROFILE_PATH, which is a list of paths to 
search for
+/// profiles.
+/// 2. The user-level configuration directory (e.g. ~/.config/adbc/profiles on 
Linux).
+///
+/// The filesystem-based profile looks for a file named <profile_name>.toml if 
there is
+/// no extension provided, attempting to parse the toml file for the profile 
information.
+/// If the file is found and parsed successfully, the options specified in the 
profile
+/// which have not already been set will be set as if by DatabaseSetOption 
just before
+/// initialization as part of DatabaseInit.
+///
+/// For file-based profiles the expected format is as follows:
+/// ```toml
+/// version = 1
+/// driver = "driver_name"
+///
+/// [options]
+/// option1 = "value1"
+/// option2 = 42
+/// option3 = 3.14

Review Comment:
   I think my point was just that you have some future flexibility if you nest 
the configuration and if you put `adbc` and `profile` in there somewhere 
there's a better chance somebody who runs across this file without context will 
know what it is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to