tooptoop4 opened a new issue #1906:
URL: https://github.com/apache/hudi/issues/1906
For one simple spark-submit (COW data source), org.apache.hudi.common.util.FSUtils logs this message on 21 separate lines, each line being 2700+ bytes.

Example of one such line:
2020-08-03 14:19:52,892 [main] INFO org.apache.hudi.common.util.FSUtils -
Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration:
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml,
__spark_hadoop_conf__.xml, file:/home/ec2-user/spark_home/conf/hive-site.xml],
FileSystem: [S3AFileSystem{uri=s3a://mybucketo,
workingDir=s3a://mybucketo/user/ec2-user, inputPolicy=normal,
partSize=104857600, enableMultiObjectsDelete=true, maxKeys=5000,
readAhead=65536, blockSize=33554432, multiPartThreshold=2147483647,
serverSideEncryptionAlgorithm='AES256',
blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@c247b02,
boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=2405,
available=2405, waiting=0}, activeCount=0},
unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@78f1d29[Running, pool
size = 1, active threads = 0, queued tasks = 0, completed tasks = 1],
statistics {11286 bytes read, 4304 bytes written, 160 read
ops, 0 large read ops, 19 write ops}, metrics {{Context=S3AFileSystem}
{FileSystemId=8624b3cf-eff7-4713-8405-95fc9bd35d90-mybucketo}
{fsURI=s3a://mybucketo/private/sparkevents} {files_created=6} {files_copied=0}
{files_copied_bytes=0} {files_deleted=1} {fake_directories_deleted=0}
{directories_created=1} {directories_deleted=0} {ignored_errors=5}
{op_copy_from_local_file=0} {op_exists=44} {op_get_file_status=138}
{op_glob_status=0} {op_is_directory=36} {op_is_file=0} {op_list_files=1}
{op_list_located_status=0} {op_list_status=21} {op_mkdirs=0} {op_rename=0}
{object_copy_requests=0} {object_delete_requests=6} {object_list_requests=124}
{object_continue_list_requests=0} {object_metadata_requests=240}
{object_multipart_aborted=0} {object_put_bytes=4304} {object_put_requests=6}
{object_put_requests_completed=6} {stream_write_failures=0}
{stream_write_block_uploads=0} {stream_write_block_uploads_committed=0}
{stream_write_block_uploads_aborted=0} {stream_write_total_time=0}
{stream_write_total_data=4304} {object_put_requests_active=0}
{object_put_bytes_pending=0} {stream_write_block_uploads_active=0}
{stream_write_block_uploads_pending=5}
{stream_write_block_uploads_data_pending=0} {stream_read_fully_operations=0}
{stream_opened=22} {stream_bytes_skipped_on_seek=0} {stream_closed=22}
{stream_bytes_backwards_on_seek=0} {stream_bytes_read=11286}
{stream_read_operations_incomplete=22} {stream_bytes_discarded_in_abort=0}
{stream_close_operations=22} {stream_read_operations=29} {stream_aborted=0}
{stream_forward_seek_operations=0} {stream_backward_seek_operations=0}
{stream_seek_operations=0} {stream_bytes_read_in_close=0}
{stream_read_exceptions=0} }}]
Some ideas:
1. remove the "Hadoop Configuration" block, or split it out into a separate log class
2. make it DEBUG level?
3. only print metrics that are non-zero?
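For scale, idea 3 could be sketched roughly as below. This is illustrative only; `formatNonZero` and its map input are hypothetical, not part of the Hudi or Hadoop APIs:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class MetricsLogDemo {
    // Hypothetical helper: render only the metrics whose value is non-zero,
    // in the same "{name=value}" style as the FSUtils log line above.
    static String formatNonZero(Map<String, Long> metrics) {
        return metrics.entrySet().stream()
                .filter(e -> e.getValue() != 0L)
                .map(e -> "{" + e.getKey() + "=" + e.getValue() + "}")
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Sample values taken from the log excerpt above.
        Map<String, Long> metrics = new LinkedHashMap<>();
        metrics.put("files_created", 6L);
        metrics.put("files_copied", 0L);
        metrics.put("files_copied_bytes", 0L);
        metrics.put("object_put_bytes", 4304L);
        System.out.println(formatNonZero(metrics));
        // prints: {files_created=6} {object_put_bytes=4304}
    }
}
```

In the meantime, the noise can be suppressed per application by lowering this one logger in `log4j.properties`, e.g. `log4j.logger.org.apache.hudi.common.util.FSUtils=WARN`.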
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]