Hi Balaji,

Thank you, tuning these parameters helped me.

Best Regards
Maksim

On Fri, May 31, 2019 at 19:16, [email protected] <[email protected]> wrote:

>
> Hi Maksim,
> It looks like the spark and hudi memory settings ( spark.executor.memory,
> spark.memory.fraction, hudi.memory.merge.fraction)  may not have been
> configured correctly to let Hudi use memory for merging. With the current
> settings you have, Hudi has no memory to use for Merge process and is
> resorting to disk based merging which will be slow but would progress
> without OOM. You would need to check your configs.
> The logic for calculating memory is in:
>
> https://github.com/apache/incubator-hudi/blob/acd74129cd97f24c0dde9bf032a4048f2ce27b5f/hoodie-client/src/main/java/com/uber/hoodie/config/HoodieMemoryConfig.java#L117
>
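As a rough illustration of the budget computed there (a sketch, not Hudi's actual code), Hudi takes the part of the executor heap left outside Spark's unified memory pool, i.e. (1 - spark.memory.fraction) of spark.executor.memory, and applies hoodie.memory.merge.fraction to it. The function name and the 0.6 default below are illustrative assumptions:

```python
# Illustrative sketch (not Hudi's actual code) of the merge-memory budget:
# the executor heap left outside Spark's unified pool, scaled by the
# fraction Hudi is allowed to use for merging.

def max_memory_for_merge(executor_memory_bytes: int,
                         spark_memory_fraction: float,
                         merge_fraction: float = 0.6) -> int:
    """Approximate bytes Hudi may use for in-memory merging."""
    user_memory = executor_memory_bytes * (1.0 - spark_memory_fraction)
    return int(user_memory * merge_fraction)

# An 8g executor with the default spark.memory.fraction leaves a real budget:
print(max_memory_for_merge(8 * 1024**3, 0.6))
# If spark.memory.fraction is pushed to ~1.0, the budget collapses to 0,
# matching the "MaxMemoryPerPartitionMerge => 0" line in the logs below,
# which forces the slow DiskBasedMap spill path:
print(max_memory_for_merge(8 * 1024**3, 1.0))
```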
> Balaji.V
>
>
>     On Friday, May 31, 2019, 2:53:56 AM PDT, Maksim Radionov <
> [email protected]> wrote:
>
>  Hi,
>
> I think part of the update process is working slowly and I'm doing
> something wrong. An example MOR run is at
> https://drive.google.com/open?id=17YP_V5k-g3Rp6-jaWaTWvKUSBTOPHg4g
> It seems to spend a long time updating the existing records, and I don't
> understand why most of the time is spent in "BoundedInMemoryQueue" (more
> than 1.5 hours to overwrite 15 GB across 625 files).
> I use Spark version 2.3.0.cloudera3 and am trying to apply 100 million
> records (15 GB) to a snapshot of 1 billion records (1 TB, ~8k files).
> I would appreciate it if anyone can help me locate this problem.
>
> Executor logs:
>
> 19/05/29 12:47:09 INFO storage.ShuffleBlockFetcherIterator: Started
> 269 remote fetches in 109 ms
>
> 19/05/29 12:47:09 INFO util.FSUtils: Hadoop Configuration:
> fs.defaultFS: [hdfs://nameservice1], Config:[Configuration: ],
> FileSystem:
> [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1394640145_127,
> ugi=mradionov (auth:SIMPLE)]]]
>
> 19/05/29 12:47:09 INFO io.HoodieMergeHandle: MaxMemoryPerPartitionMerge =>
> 0
>
> 19/05/29 12:47:09 INFO collection.DiskBasedMap: Spilling to file
> location /tmp/91e5578f-6e25-476d-8c63-15834c7588f9 in host (1.1.1.1)
> with hostname (host)
>
> 19/05/29 12:47:23 INFO io.HoodieMergeHandle: Number of entries in
> MemoryBasedMap => 0, Total size in bytes of MemoryBasedMap => 0, Number
> of entries in DiskBasedMap => 125849, Size of file spilled to disk =>
> 75107012
>
> 19/05/29 12:47:25 INFO io.HoodieMergeHandle: Merging new data into
> oldPath
> /init_1000mln/default/a46c60f1-63bf-4b8d-b24e-6a6bdb36dad9_2593_20190527161512.parquet,
> as newPath
> /init_1000mln/default/a46c60f1-63bf-4b8d-b24e-6a6bdb36dad9_461_20190529115632.parquet
>
> 19/05/29 12:47:25 INFO util.FSUtils: Hadoop Configuration:
> fs.defaultFS: [hdfs://nameservice1], Config:[Configuration: ],
> FileSystem:
> [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1394640145_127,
> ugi=(auth:SIMPLE)]]]
>
> 19/05/29 12:47:25 INFO util.FSUtils: Hadoop Configuration:
> fs.defaultFS: [hdfs://nameservice1], Config:[Configuration: ],
> FileSystem:
> [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1394640145_127,
> ugi=(auth:SIMPLE)]]]
>
> 19/05/29 12:47:25 INFO compress.CodecPool: Got brand-new compressor [.gz]
>
> 19/05/29 12:47:25 INFO util.FSUtils: Hadoop Configuration:
> fs.defaultFS: [hdfs://nameservice1], Config:[Configuration: ],
> FileSystem:
> [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1394640145_127,
> ugi=(auth:SIMPLE)]]]
>
> 19/05/29 12:47:25 INFO util.FSUtils: Hadoop Configuration:
> fs.defaultFS: [hdfs://nameservice1], Config:[Configuration: ],
> FileSystem:
> [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1394640145_127,
> ugi=(auth:SIMPLE)]]]
>
> 19/05/29 12:47:25 INFO queue.BoundedInMemoryExecutor: starting consumer
> thread
>
> 19/05/29 *12:47:25* INFO queue.IteratorBasedQueueProducer: starting to
> buffer records
>
> 19/05/29 *13:28:44* INFO queue.IteratorBasedQueueProducer: finished
> buffering records
>
> 19/05/29 13:28:44 INFO queue.BoundedInMemoryExecutor: Queue
> Consumption is done; notifying producer threads
>
>
>
> Best Regards
> Maksim Radionov
