tooptoop4 opened a new issue #1906:
URL: https://github.com/apache/hudi/issues/1906


   For one simple spark-submit (COW data source), 
org.apache.hudi.common.util.FSUtils logs this message on 21 lines, each line 
being 2700+ bytes.
   
   Example of one line:
   
   2020-08-03 14:19:52,892 [main] INFO  org.apache.hudi.common.util.FSUtils - 
Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: 
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, 
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, 
__spark_hadoop_conf__.xml, file:/home/ec2-user/spark_home/conf/hive-site.xml], 
FileSystem: [S3AFileSystem{uri=s3a://mybucketo, 
workingDir=s3a://mybucketo/user/ec2-user, inputPolicy=normal, 
partSize=104857600, enableMultiObjectsDelete=true, maxKeys=5000, 
readAhead=65536, blockSize=33554432, multiPartThreshold=2147483647, 
serverSideEncryptionAlgorithm='AES256', 
blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@c247b02, 
boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=2405,
 available=2405, waiting=0}, activeCount=0}, 
unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@78f1d29[Running, pool 
size = 1, active threads = 0, queued tasks = 0, completed tasks = 1], 
statistics {11286 bytes read, 4304 bytes written, 160 read 
ops, 0 large read ops, 19 write ops}, metrics {{Context=S3AFileSystem} 
{FileSystemId=8624b3cf-eff7-4713-8405-95fc9bd35d90-mybucketo} 
{fsURI=s3a://mybucketo/private/sparkevents} {files_created=6} {files_copied=0} 
{files_copied_bytes=0} {files_deleted=1} {fake_directories_deleted=0} 
{directories_created=1} {directories_deleted=0} {ignored_errors=5} 
{op_copy_from_local_file=0} {op_exists=44} {op_get_file_status=138} 
{op_glob_status=0} {op_is_directory=36} {op_is_file=0} {op_list_files=1} 
{op_list_located_status=0} {op_list_status=21} {op_mkdirs=0} {op_rename=0} 
{object_copy_requests=0} {object_delete_requests=6} {object_list_requests=124} 
{object_continue_list_requests=0} {object_metadata_requests=240} 
{object_multipart_aborted=0} {object_put_bytes=4304} {object_put_requests=6} 
{object_put_requests_completed=6} {stream_write_failures=0} 
{stream_write_block_uploads=0} {stream_write_block_uploads_committed=0} 
{stream_write_block_uploads_aborted=0} {stream_write_total_time=0} 
{stream_write_total_data=4304} {object_put_requests_active=0} 
{object_put_bytes_pending=0} {stream_write_block_uploads_active=0} 
{stream_write_block_uploads_pending=5} 
{stream_write_block_uploads_data_pending=0} {stream_read_fully_operations=0} 
{stream_opened=22} {stream_bytes_skipped_on_seek=0} {stream_closed=22} 
{stream_bytes_backwards_on_seek=0} {stream_bytes_read=11286} 
{stream_read_operations_incomplete=22} {stream_bytes_discarded_in_abort=0} 
{stream_close_operations=22} {stream_read_operations=29} {stream_aborted=0} 
{stream_forward_seek_operations=0} {stream_backward_seek_operations=0} 
{stream_seek_operations=0} {stream_bytes_read_in_close=0} 
{stream_read_exceptions=0} }}]
   
   Some ideas:
   1. Remove the "Hadoop Configuration" block, or split it out to a separate logger class
   2. Make it DEBUG level?
   3. Only print metrics that are non-zero?
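   In the meantime, a possible user-side workaround (a sketch, assuming the 
deployment uses log4j 1.x, as Spark ships by default) is to raise the log 
threshold for this class so the block is suppressed unless explicitly wanted. 
The logger name matches the class shown in the example above:

```
# log4j.properties: suppress the verbose "Hadoop Configuration" dump
# emitted at INFO by FSUtils (adjust syntax for log4j2 or logback setups)
log4j.logger.org.apache.hudi.common.util.FSUtils=WARN
```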
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

