kou commented on code in PR #38267:
URL: https://github.com/apache/arrow/pull/38267#discussion_r1359641487


##########
cpp/src/arrow/filesystem/s3fs.h:
##########
@@ -378,5 +387,13 @@ Status EnsureS3Finalized();
 ARROW_EXPORT
 Result<std::string> ResolveS3BucketRegion(const std::string& bucket);
 
+/// \brief Try to extract the S3LogLevel from environment variable
+///
+/// Tries to get a valid log level name from the environment variable
+/// ARROW_S3_LOG_LEVEL and returns a matching value of S3LogLevel. If it fails
+/// returns S3LogLevel::Fatal.
+ARROW_EXPORT
+S3LogLevel GetS3LogLevelFromEnvOrDefault();

Review Comment:
   Can we hide this?
   If we want to use this function in test, we can use 
`cpp/src/arrow/filesystem/s3_internal.h`.



##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -3016,5 +3023,35 @@ Result<std::string> ResolveS3BucketRegion(const 
std::string& bucket) {
   return resolver->ResolveRegion(bucket);
 }
 
+S3LogLevel GetS3LogLevelFromEnvOrDefault() {

Review Comment:
   I like `arrow::fs::S3GlobalOptions::Defaults()`.



##########
cpp/src/arrow/filesystem/s3fs.h:
##########
@@ -334,13 +334,22 @@ struct ARROW_EXPORT S3GlobalOptions {
   int num_event_loop_threads = 1;
 };
 
-/// \brief Initialize the S3 APIs.
+/// \brief Initialize the S3 APIs with default options.
 ///
 /// It is required to call this function at least once before using 
S3FileSystem.
 ///
 /// Once this function is called you MUST call FinalizeS3 before the end of the
 /// application in order to avoid a segmentation fault at shutdown.
 ARROW_EXPORT
+Status InitializeS3();
+
+/// \brief Initialize the S3 APIs with the specified set of options.
+///
+/// It is required to call this function at least once before using 
S3FileSystem.
+///
+/// Once this function is called you MUST call FinalizeS3 before the end of the
+/// application in order to avoid a segmentation fault at shutdown.ARROW_EXPORT

Review Comment:
   ```suggestion
   /// application in order to avoid a segmentation fault at shutdown.
   ```



##########
cpp/src/arrow/filesystem/s3fs_test.cc:
##########
@@ -1380,5 +1380,33 @@ class TestS3FSGeneric : public S3TestMixin, public 
GenericFileSystemTest {
 
 GENERIC_FS_TEST_FUNCTIONS(TestS3FSGeneric);
 
+////////////////////////////////////////////////////////////////////////////
+// GetS3LogLevelFromEnvOrDefault tests
+
+class GetS3LogLevelFromEnvOrDefault : public ::testing::Test {};
+
+TEST_F(GetS3LogLevelFromEnvOrDefault, GetS3LogLevelFromEnvOrDefaultBehavior) {
+  // Verify we get the default value of Fatal
+  ASSERT_EQ(S3LogLevel::Fatal, arrow::fs::GetS3LogLevelFromEnvOrDefault());
+
+  // Verify we get the value specified by env var and not the default
+  {
+    EnvVarGuard log_level_guard("ARROW_S3_LOG_LEVEL", "Error");
+    ASSERT_EQ(arrow::fs::GetS3LogLevelFromEnvOrDefault(), S3LogLevel::Error);

Review Comment:
   How about swapping the expected/actual order like the first `ASSERT_EQ()`?
   
   ```suggestion
       ASSERT_EQ(S3LogLevel::Error, arrow::fs::GetS3LogLevelFromEnvOrDefault());
   ```



##########
docs/source/cpp/env_vars.rst:
##########
@@ -85,6 +85,28 @@ that changing their value later will have an effect.
    ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
    Alternatively, one can set :envvar:`HADOOP_HOME`.
 
+.. envvar:: ARROW_S3_LOG_LEVEL

Review Comment:
   I think that it's OK. It seems there is no common environment variable for 
log level used in AWS related tools.



##########
docs/source/cpp/env_vars.rst:
##########
@@ -85,6 +85,28 @@ that changing their value later will have an effect.
    ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
    Alternatively, one can set :envvar:`HADOOP_HOME`.
 
+.. envvar:: ARROW_S3_LOG_LEVEL
+
+   Controls the verbosity of logging produced by S3 calls. Defaults to Fatal

Review Comment:
   ```suggestion
      Controls the verbosity of logging produced by S3 calls. Defaults to 
``Fatal``
   ```



##########
docs/source/python/filesystems.rst:
##########
@@ -207,6 +207,14 @@ Here are a couple examples in code::
 
    :func:`pyarrow.fs.resolve_s3_region` for resolving region from a bucket 
name.
 
+Troubleshooting
+~~~~~~~~~~~~~~~
+
+When using :class:`S3FileSystem`, output is only produced for fatal errors or
+when printing return values. For troubleshooting, the log level can be set 
using
+the environment variable ``ARROW_S3_LOG_LEVEL``. The log level must be set 
prior
+to running any code that interacts with S3. Possible values include 'Fatal' 
(the
+default), 'Error', 'Warn', 'Info', 'Debug' (recommended), 'Trace', and 'Off'.

Review Comment:
   ```suggestion
   to running any code that interacts with S3. Possible values include 
``Fatal`` (the
   default), ``Error``, ``Warn``, ``Info``, ``Debug`` (recommended), ``Trace``, 
and ``Off``.
   ```



##########
docs/source/cpp/env_vars.rst:
##########
@@ -85,6 +85,28 @@ that changing their value later will have an effect.
    ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
    Alternatively, one can set :envvar:`HADOOP_HOME`.
 
+.. envvar:: ARROW_S3_LOG_LEVEL
+
+   Controls the verbosity of logging produced by S3 calls. Defaults to Fatal
+   which only produces output in the case of fatal errors. Debug is recommended
+   when you're trying to troubleshoot issues.
+
+   Possible values include:
+
+   - ``Fatal`` (the default)
+   - ``Error``
+   - ``Warn``
+   - ``Info``
+   - ``Debug``
+   - ``Trace``
+   - ``Off``

Review Comment:
   How about using all lower case (`fatal`) or all upper case (`FATAL`) instead 
of capitalized (`Fatal`)?
   Because other our environment variables use all lower case or all upper case.



##########
docs/source/cpp/env_vars.rst:
##########
@@ -85,6 +85,28 @@ that changing their value later will have an effect.
    ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
    Alternatively, one can set :envvar:`HADOOP_HOME`.
 
+.. envvar:: ARROW_S3_LOG_LEVEL
+
+   Controls the verbosity of logging produced by S3 calls. Defaults to Fatal
+   which only produces output in the case of fatal errors. Debug is recommended

Review Comment:
   ```suggestion
      which only produces output in the case of fatal errors. ``Debug`` is 
recommended
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to