harishchanderramesh commented on issue #1728:
URL: https://github.com/apache/hudi/issues/1728#issuecomment-646432947


   
[spark-2020-06-18-20-47-55.txt](https://github.com/apache/hudi/files/4802488/spark-2020-06-18-20-47-55.txt)
   
[spark-2020-06-18-20-50-10.txt](https://github.com/apache/hudi/files/4802489/spark-2020-06-18-20-50-10.txt)
   
[spark-2020-06-18-20-49-53.txt](https://github.com/apache/hudi/files/4802490/spark-2020-06-18-20-49-53.txt)
   
[spark-2020-06-18-20-49-52.txt](https://github.com/apache/hudi/files/4802491/spark-2020-06-18-20-49-52.txt)
   
[spark-2020-06-18-20-49-49.txt](https://github.com/apache/hudi/files/4802492/spark-2020-06-18-20-49-49.txt)
   
[spark-2020-06-18-20-50-17.txt](https://github.com/apache/hudi/files/4802493/spark-2020-06-18-20-50-17.txt)
   
   
   I have attached the driver logs for your review.
   
   1. How do I turn on the Hudi metrics, and how can I check that each commit is actually writing data?
   2. The high-level use case: I want to write Kafka streaming data to an Apache Hudi table in S3 using PySpark streaming.
      1. Batch time should be 1 minute, and processing time should be around 20 seconds.
      2. Data volume per batch varies from time to time, from 4,000 records up to 100k records at peak times.
      3. I am not worried about memory allocation; I need the processing to be as fast as possible.
      4. While merging into the Hudi table, I need to ignore updates on columns that receive a null value from the source.
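   On question 1, Hudi's metrics system is switched on through writer configs. A minimal sketch, assuming a Graphite reporter (JMX is another reporter type; check the property names against the Hudi version you are running, and note the host/port values below are placeholders):

   ```properties
   # Enable Hudi metrics and point them at a Graphite endpoint (hypothetical host/port)
   hoodie.metrics.on=true
   hoodie.metrics.reporter.type=GRAPHITE
   hoodie.metrics.graphite.host=localhost
   hoodie.metrics.graphite.port=4756
   ```

   To verify that each commit wrote data, you can also inspect the table's timeline with `hudi-cli` (`connect`, then `commits show`), which lists per-commit record and file counts.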
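   For questions 2 and 4 together, a rough PySpark sketch of the writer options is below. This is only an illustration: the table name, record key, precombine field, partition field, and S3 paths are hypothetical, and the payload class `OverwriteNonDefaultsWithLatestAvroPayload` (which keeps the existing column value when the incoming field is null/default) should be verified against your Hudi version.

   ```python
   # Sketch only: option keys follow Hudi's datasource writer configs;
   # all table/field/bucket names here are hypothetical placeholders.

   def build_hudi_options(table_name, record_key, precombine_field, partition_field):
       """Assemble Hudi writer options for an upsert stream."""
       return {
           "hoodie.table.name": table_name,
           "hoodie.datasource.write.operation": "upsert",
           "hoodie.datasource.write.recordkey.field": record_key,
           "hoodie.datasource.write.precombine.field": precombine_field,
           "hoodie.datasource.write.partitionpath.field": partition_field,
           # Keep existing column values when the incoming record carries
           # null/default for a field (question 4); confirm this payload
           # class is available in your Hudi release.
           "hoodie.datasource.write.payload.class":
               "org.apache.hudi.common.model."
               "OverwriteNonDefaultsWithLatestAvroPayload",
       }

   opts = build_hudi_options("events", "event_id", "event_ts", "event_date")

   # The streaming write itself needs a live SparkSession and a parsed Kafka
   # DataFrame; shown as a comment for shape only:
   #
   # (parsed_kafka_df.writeStream
   #     .format("hudi")
   #     .options(**opts)
   #     .option("checkpointLocation", "s3://my-bucket/checkpoints/events")
   #     .outputMode("append")
   #     .trigger(processingTime="60 seconds")   # 1-minute batches (item 2.1)
   #     .start("s3://my-bucket/hudi/events"))
   ```

   The `trigger(processingTime="60 seconds")` line is how Structured Streaming expresses the 1-minute batch interval; whether the 20-second processing target is met then depends on parallelism and table layout rather than on these options.
   
   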


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]