Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-647325340
Sure. I am trying to achieve near-real-time data (like a Read Optimized view) by updating records on S3. For example, say I have the records (a1, b1, t1), (a1, b2, t2), (a1, b3, t3), where t1, t2, t3 are incremental timestamps; the final record I want is (a1, b3, t3).

Data pipeline: reading data from Kafka through Spark Structured Streaming and performing upserts into a Hudi table on S3.
- Data read from Kafka: 100-300 MB/minute
- Kafka partitions: 15
- Upsert:Insert ratio: 7:3
- Number of columns: 550

Please let me know if you need more info.
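To make the intended merge semantics concrete, here is a minimal plain-Python sketch (not Hudi itself) of what Hudi's record key plus precombine timestamp achieves on upsert: for each key, only the row with the greatest timestamp survives. The field names `a` (key), `b` (payload), and `t` (timestamp) mirror the example records above and are illustrative only.

```python
def upsert_latest(table, incoming, key="a", ts="t"):
    """Merge incoming rows into table, keeping the latest row per key.

    Mimics upsert-with-precombine semantics: a row replaces an existing
    one only if its timestamp is greater than or equal to the current one.
    """
    merged = {row[key]: row for row in table}
    for row in incoming:
        existing = merged.get(row[key])
        if existing is None or row[ts] >= existing[ts]:
            merged[row[key]] = row
    return list(merged.values())

# The three incremental versions of record a1 from the example above:
records = [
    {"a": "a1", "b": "b1", "t": 1},
    {"a": "a1", "b": "b2", "t": 2},
    {"a": "a1", "b": "b3", "t": 3},
]
result = upsert_latest([], records)
# result == [{"a": "a1", "b": "b3", "t": 3}]
```

In Hudi terms, the key would be `hoodie.datasource.write.recordkey.field` and the timestamp `hoodie.datasource.write.precombine.field`; the actual merge is performed by Hudi during the upsert, not by user code like this.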
