harishchanderramesh commented on issue #1728: URL: https://github.com/apache/hudi/issues/1728#issuecomment-646432947
[spark-2020-06-18-20-47-55.txt](https://github.com/apache/hudi/files/4802488/spark-2020-06-18-20-47-55.txt)
[spark-2020-06-18-20-50-10.txt](https://github.com/apache/hudi/files/4802489/spark-2020-06-18-20-50-10.txt)
[spark-2020-06-18-20-49-53.txt](https://github.com/apache/hudi/files/4802490/spark-2020-06-18-20-49-53.txt)
[spark-2020-06-18-20-49-52.txt](https://github.com/apache/hudi/files/4802491/spark-2020-06-18-20-49-52.txt)
[spark-2020-06-18-20-49-49.txt](https://github.com/apache/hudi/files/4802492/spark-2020-06-18-20-49-49.txt)
[spark-2020-06-18-20-50-17.txt](https://github.com/apache/hudi/files/4802493/spark-2020-06-18-20-50-17.txt)

I have attached driver logs for your review.

1. How do I turn on Hudi metrics, and how can I check that each commit is writing data?
2. The high-level use case is to write Kafka streaming data to an Apache Hudi table in S3 using PySpark streaming:
   1. Batch time should be 1 minute and processing time should be around 20 seconds.
   2. Data volume per batch varies over time, from 4,000 records up to 100k records at peak times.
   3. I am not worried about memory allocation; I need the processing to be as fast as possible.
   4. While merging into the Hudi table, I need to ignore updates on columns that receive a null value from the source.
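For context, a minimal sketch of what such a streaming upsert into Hudi could look like. This is not taken from the issue: the record key, precombine field, table name, and S3 path below are illustrative assumptions. The `hoodie.metrics.*` options relate to question 1 (metrics are off by default and must be enabled explicitly), and `OverwriteNonDefaultsWithLatestAvroPayload` is one payload class intended to keep existing column values when the incoming record carries a null/default, which is close to question 4 (exact semantics may still require a custom payload class).

```python
# Sketch only: option names follow the Hudi configuration docs; the key
# column, precombine column, table name, and paths are assumptions.
def hudi_write_options(table_name):
    return {
        # Record identity and merge ordering (assumed column names).
        "hoodie.datasource.write.recordkey.field": "event_id",
        "hoodie.datasource.write.precombine.field": "event_ts",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.table.name": table_name,
        # Question 4: prefer existing values over incoming null/default
        # values at merge time.
        "hoodie.datasource.write.payload.class":
            "org.apache.hudi.common.model."
            "OverwriteNonDefaultsWithLatestAvroPayload",
        # Question 1: metrics are disabled by default.
        "hoodie.metrics.on": "true",
        "hoodie.metrics.reporter.type": "GRAPHITE",  # or JMX, per setup
    }

# The streaming write itself would look roughly like the following
# (requires a SparkSession with the Hudi bundle on the classpath):
#
# (df.writeStream
#    .foreachBatch(lambda batch_df, batch_id:
#        batch_df.write.format("hudi")
#                .options(**hudi_write_options("events"))
#                .mode("append")
#                .save("s3://your-bucket/events"))   # assumed path
#    .trigger(processingTime="1 minute")
#    .start())
```

Per-commit write statistics can also be inspected after the fact in each commit's metadata file under `.hoodie/`, independently of the metrics reporter.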
