Hi,
IMO, when upserting 150K records with 100 columns each, the records that do not
fit in the merge memory budget have to be serialized to disk and then
deserialized from disk again, which is slow.
You can try adding option("hoodie.memory.merge.max.size", "2004857600000") to
the write.
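For reference, here is a minimal sketch of where that option could go in a
Spark datasource write, written in PySpark. The table name, record key field,
precombine field and the helper name write_upsert are hypothetical placeholders
and not taken from your job; only the hoodie.memory.merge.max.size entry comes
from the suggestion above. The value is in bytes, so the number quoted is
roughly 2 TB and should probably be sized to what the executors actually have.

```python
# Hudi write options collected in one dict so they are easy to review and tune.
# Everything except "hoodie.memory.merge.max.size" is an illustrative assumption.
hudi_options = {
    "hoodie.table.name": "my_table",                      # hypothetical table name
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "id",      # hypothetical key column
    "hoodie.datasource.write.precombine.field": "ts",     # hypothetical ordering column
    # Memory (in bytes) the merge may use before records spill to local disk.
    "hoodie.memory.merge.max.size": "2004857600000",
}


def write_upsert(df, base_path, options=hudi_options):
    """Upsert a Spark DataFrame into a Hudi dataset at base_path.

    `df` is assumed to be a pyspark.sql.DataFrame; nothing here is executed
    until the function is called from a Spark job with the Hudi jar available.
    """
    (
        df.write.format("org.apache.hudi")
        .options(**options)
        .mode("append")
        .save(base_path)
    )
```

Calling write_upsert(df, "/path/to/table") from the job would then pick up the
larger merge budget; the path here is a placeholder as well.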
best,
lamber-ken
At 2020-03-10 17:07:58, "selvaraj periyasamy"
<[email protected]> wrote:
Sorry for the partial emails. My company portal doesn't allow me to add test
code. I am using the 0.5.0 version of the Hudi jars, built locally. While
running an upsert, it takes more than 6 or 7 minutes to process 150k records.
Is there any tuning that could reduce the processing time from 6 or 7 minutes?
An overwrite takes less than a minute. Each row has 100 columns.
Thanks,
Selva
On Tue, Mar 10, 2020 at 1:51 AM selvaraj periyasamy
<[email protected]> wrote:
Team,
I am using the 0.5.0 version of the Hudi jars, built locally. While running an
upsert, it takes more than 6 or 7 minutes to process 150k records. Below are
the code and logs.
20/03/10 07:26:09 INFO IteratorBasedQueueProducer: starting to buffer records
20/03/10 07:26:09 INFO BoundedInMemoryExecutor: starting consumer thread
20/03/10 07:33:59 INFO IteratorBasedQueueProducer: finished buffering records
20/03/10 07:34:00 INFO BoundedInMemoryExecutor: Queue Consumption is done; notifying producer threads
20/03/10 07:26:08 INFO IteratorBasedQueueProducer: starting to buffer records
20/03/10 07:26:08 INFO BoundedInMemoryExecutor: starting consumer thread
20/03/10 07:33:31 INFO IteratorBasedQueueProducer: finished buffering records
20/03/10 07:33:31 INFO BoundedInMemoryExecutor: Queue Consumption is done; notifying producer threads
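To put a number on the time spent between "starting to buffer records" and
"finished buffering records", the timestamps above can be subtracted directly.
A small sketch follows; the function names are made up for illustration, and
it assumes each "starting" line is followed by its own "finished" line, as in
the excerpt above (interleaved logs from many threads would need a thread or
task id to pair on).

```python
from datetime import datetime

# The four producer lines copied from the log excerpt above.
LOG_LINES = [
    "20/03/10 07:26:09 INFO IteratorBasedQueueProducer: starting to buffer records",
    "20/03/10 07:33:59 INFO IteratorBasedQueueProducer: finished buffering records",
    "20/03/10 07:26:08 INFO IteratorBasedQueueProducer: starting to buffer records",
    "20/03/10 07:33:31 INFO IteratorBasedQueueProducer: finished buffering records",
]


def parse_ts(line):
    # The first 17 characters hold the timestamp in yy/MM/dd HH:mm:ss form.
    return datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")


def buffering_durations(lines):
    """Return seconds elapsed for each start/finish pair, in order of appearance."""
    durations = []
    start = None
    for line in lines:
        if "starting to buffer records" in line:
            start = parse_ts(line)
        elif "finished buffering records" in line and start is not None:
            durations.append((parse_ts(line) - start).total_seconds())
            start = None
    return durations


print(buffering_durations(LOG_LINES))  # [470.0, 443.0]
```

That is 7 min 50 s and 7 min 23 s for the two producers, so nearly all of the
upsert time falls inside the buffering phase.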
While running insert
On Tue, Mar 10, 2020 at 1:45 AM selvaraj periyasamy
<[email protected]> wrote:
Team,
I am using the 0.5.0 version of the Hudi jars, built locally. While running upsert
20/03/10 07:26:09 INFO IteratorBasedQueueProducer: starting to buffer records
20/03/10 07:26:09 INFO BoundedInMemoryExecutor: starting consumer thread
20/03/10 07:33:59 INFO IteratorBasedQueueProducer: finished buffering records
20/03/10 07:34:00 INFO BoundedInMemoryExecutor: Queue Consumption is done; notifying producer threads
20/03/10 07:26:08 INFO IteratorBasedQueueProducer: starting to buffer records
20/03/10 07:26:08 INFO BoundedInMemoryExecutor: starting consumer thread
20/03/10 07:33:31 INFO IteratorBasedQueueProducer: finished buffering records
20/03/10 07:33:31 INFO BoundedInMemoryExecutor: Queue Consumption is done; notifying producer threads