harishchanderramesh edited a comment on issue #1728:
URL: https://github.com/apache/hudi/issues/1728#issuecomment-644680815


   @bvaradar - I would like to know more if you are working along similar lines.
   
   @vinothchandar - I was able to fix that by running on EMR 5.30 with Hudi 0.5.2, and I no longer see the error.
   The streaming batches complete and everything works, but the processing time slowly increases.
   
   As you suggested, I changed `.option("hoodie.index.type","GLOBAL_BLOOM")` to `.option("hoodie.index.type","BLOOM")`.
   
   But there was no improvement.
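
For reference, here is a minimal sketch of the only change between the two runs, written as a plain dict of Hudi write options. The option keys are standard Hudi datasource configs, but the table name, the field names, and the `upsert` operation are hypothetical placeholders for illustration.

```python
def hudi_write_options(index_type: str) -> dict:
    """Return Hudi write options; only `hoodie.index.type` varies.

    Table name, field names, and operation are hypothetical placeholders.
    """
    return {
        "hoodie.table.name": "my_table",                      # hypothetical
        "hoodie.datasource.write.recordkey.field": "id",      # hypothetical
        "hoodie.datasource.write.precombine.field": "ts",     # hypothetical
        "hoodie.datasource.write.partitionpath.field": "dt",  # hypothetical
        "hoodie.datasource.write.operation": "upsert",        # assumed
        "hoodie.index.type": index_type,  # "GLOBAL_BLOOM" before, "BLOOM" now
    }


before = hudi_write_options("GLOBAL_BLOOM")
after = hudi_write_options("BLOOM")

# Collect the keys whose values differ between the two runs.
changed = {k for k in before if before[k] != after[k]}
print(changed)  # {'hoodie.index.type'}
```

In a PySpark job, a dict like this would typically be passed to the DataFrame writer with `.options(**after)`, e.g. `batch_df.write.format("org.apache.hudi").options(**after).mode("append").save(base_path)`, where `batch_df` and `base_path` are placeholders.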
   
   Looking at the stages for each batch, the `count at HoodieSparkSqlWriter.scala:256` stage is the one causing the issue. The time taken by this stage has increased from 15 seconds to 36 seconds in just 3 hours.
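
To put the slowdown in numbers, a quick back-of-the-envelope calculation using only the two durations above. It gives the average growth over the 3-hour window and says nothing about whether the growth is linear.

```python
# Observed duration of the `count at HoodieSparkSqlWriter.scala:256` stage.
start_s = 15.0    # seconds, at the beginning
end_s = 36.0      # seconds, after ~3 hours
elapsed_h = 3.0   # hours between the two observations

increase_s = end_s - start_s           # absolute growth in seconds
rate_s_per_h = increase_s / elapsed_h  # average growth per hour
ratio = end_s / start_s                # relative slowdown

print(f"+{increase_s:.0f} s over {elapsed_h:.0f} h, "
      f"{rate_s_per_h:.1f} s/h on average, {ratio:.1f}x slower")
# prints: +21 s over 3 h, 7.0 s/h on average, 2.4x slower
```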
   
   First, I would like to understand what is happening in this stage and why it is so costly. The second-longest stage is `toRdd at AvroConversionUtils.scala:43`, which takes 3 seconds at most.
   <img width="1161" alt="Screenshot 2020-06-16 at 3 52 25 PM" src="https://user-images.githubusercontent.com/46951911/84763639-29664e00-afea-11ea-8262-601e059a3b3d.png">
   <img width="1680" alt="Screenshot 2020-06-16 at 3 56 33 PM" src="https://user-images.githubusercontent.com/46951911/84763656-32571f80-afea-11ea-9e12-551afb646611.png">
   <img width="1618" alt="Screenshot 2020-06-16 at 3 55 14 PM" src="https://user-images.githubusercontent.com/46951911/84763665-34b97980-afea-11ea-947e-65222b7a8a43.png">
   <img width="1680" alt="Screenshot 2020-06-16 at 3 52 34 PM" src="https://user-images.githubusercontent.com/46951911/84763669-371bd380-afea-11ea-89c0-bba0fce96913.png">