Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-647325340
Sure. I am trying to achieve near-real-time data (like a Read Optimized view) by updating records on S3. For example, say I have the records (a1, b1, t1), (a1, b2, t2), (a1, b3, t3), where t1, t2, t3 are incremental timestamps; the final record I want is (a1, b3, t3).

Data pipeline: reading data from Kafka through Spark Structured Streaming and performing upserts into a Hudi table on S3.
- Data read from Kafka: 100-300 MB/minute
- Kafka partitions: 15
- Upsert:Insert ratio: 7:3
- Number of columns: 550

Please let me know if you need more info.
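To make the intended merge semantics concrete, here is a minimal plain-Python sketch (not Hudi itself) of what Hudi's record key plus precombine timestamp achieves on upsert: for each key, only the row with the greatest timestamp survives. The field names `a` (key), `b` (payload), and `t` (timestamp) mirror the example records above and are illustrative only.

```python
def upsert_latest(table, incoming, key="a", ts="t"):
    """Merge incoming rows into table, keeping the latest row per key.

    Mimics upsert-with-precombine semantics: a row replaces an existing
    one only if its timestamp is greater than or equal to the current one.
    """
    merged = {row[key]: row for row in table}
    for row in incoming:
        existing = merged.get(row[key])
        if existing is None or row[ts] >= existing[ts]:
            merged[row[key]] = row
    return list(merged.values())

# The three incremental versions of record a1 from the example above:
records = [
    {"a": "a1", "b": "b1", "t": 1},
    {"a": "a1", "b": "b2", "t": 2},
    {"a": "a1", "b": "b3", "t": 3},
]
result = upsert_latest([], records)
# result == [{"a": "a1", "b": "b3", "t": 3}]
```

In Hudi terms, the key would be `hoodie.datasource.write.recordkey.field` and the timestamp `hoodie.datasource.write.precombine.field`; the actual merge is performed by Hudi during the upsert, not by user code like this.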
