Hi Balaji,

Thank you, tuning these parameters helped me.
Best Regards,
Maksim

On Fri, May 31, 2019 at 19:16, [email protected] <[email protected]> wrote:
>
> Hi Maksim,
> It looks like the Spark and Hudi memory settings (spark.executor.memory,
> spark.memory.fraction, hoodie.memory.merge.fraction) may not have been
> configured correctly to let Hudi use memory for merging. With your current
> settings, Hudi has no memory to use for the merge process and is resorting
> to disk-based merging, which is slow but progresses without OOM. You would
> need to check your configs.
> The logic for calculating memory is in:
> https://github.com/apache/incubator-hudi/blob/acd74129cd97f24c0dde9bf032a4048f2ce27b5f/hoodie-client/src/main/java/com/uber/hoodie/config/HoodieMemoryConfig.java#L117
>
> Balaji.V
>
>
> On Friday, May 31, 2019, 2:53:56 AM PDT, Maksim Radionov <
> [email protected]> wrote:
>
> Hi,
>
> I think part of the update process is working slowly and I'm doing
> something wrong. An example MOR run is at:
> https://drive.google.com/open?id=17YP_V5k-g3Rp6-jaWaTWvKUSBTOPHg4g
> It seems to spend a long time updating the existing records, and I don't
> understand why "BoundedInMemoryQueue" takes most of the time (more than
> 1.5 hours to overwrite 15 GB and 625 files).
> I use Spark version 2.3.0.cloudera3 and am trying to apply 100 million
> records (15 GB) to a snapshot of 1 billion records (1 TB, ~8k files).
> I would appreciate it if anyone could help me locate this problem.
>
> Logs, executor:
>
> 19/05/29 12:47:09 INFO storage.ShuffleBlockFetcherIterator: Started 269 remote fetches in 109 ms
> 19/05/29 12:47:09 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://nameservice1], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1394640145_127, ugi=mradionov (auth:SIMPLE)]]]
> 19/05/29 12:47:09 INFO io.HoodieMergeHandle: MaxMemoryPerPartitionMerge => 0
> 19/05/29 12:47:09 INFO collection.DiskBasedMap: Spilling to file location /tmp/91e5578f-6e25-476d-8c63-15834c7588f9 in host (1.1.1.1) with hostname (host)
> 19/05/29 12:47:23 INFO io.HoodieMergeHandle: Number of entries in MemoryBasedMap => 0, Total size in bytes of MemoryBasedMap => 0, Number of entries in DiskBasedMap => 125849, Size of file spilled to disk => 75107012
> 19/05/29 12:47:25 INFO io.HoodieMergeHandle: Merging new data into oldPath /init_1000mln/default/a46c60f1-63bf-4b8d-b24e-6a6bdb36dad9_2593_20190527161512.parquet, as newPath /init_1000mln/default/a46c60f1-63bf-4b8d-b24e-6a6bdb36dad9_461_20190529115632.parquet
> 19/05/29 12:47:25 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://nameservice1], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1394640145_127, ugi=(auth:SIMPLE)]]]
> 19/05/29 12:47:25 INFO compress.CodecPool: Got brand-new compressor [.gz]
> 19/05/29 12:47:25 INFO queue.BoundedInMemoryExecutor: starting consumer thread
> 19/05/29 *12:47:25* INFO queue.IteratorBasedQueueProducer: starting to buffer records
> 19/05/29 *13:28:44* INFO queue.IteratorBasedQueueProducer: finished buffering records
> 19/05/29 13:28:44 INFO queue.BoundedInMemoryExecutor: Queue Consumption is done; notifying producer threads
>
> Best Regards
> Maksim Radionov
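[Editor's note] For readers hitting the same "MaxMemoryPerPartitionMerge => 0" symptom, the HoodieMemoryConfig logic Balaji links boils down to: only the executor memory *outside* spark.memory.fraction is considered available to user code, and hoodie.memory.merge.fraction of that slice becomes the merge budget. The following Python sketch is an approximation of that calculation (the function name is ours, and we omit the minimum-memory floor the real Java code enforces), not the actual implementation:

```python
import math


def max_memory_for_merge(executor_memory_mb: int,
                         spark_memory_fraction: float,
                         merge_fraction: float) -> int:
    """Approximate Hudi's per-partition merge budget, in bytes.

    Sketch of the logic linked in HoodieMemoryConfig: the memory not
    claimed by spark.memory.fraction is treated as user-available, and
    hoodie.memory.merge.fraction of it is granted to the merge. The
    real code additionally clamps to a minimum floor (omitted here).
    """
    user_available_mb = executor_memory_mb * (1 - spark_memory_fraction)
    return math.floor(user_available_mb * merge_fraction * 1024 * 1024)


# With 4 GB executors, half reserved for Spark and half of the
# remainder granted to merging, the spillable map gets ~1 GB:
print(max_memory_for_merge(4096, 0.5, 0.5))  # -> 1073741824

# With spark.memory.fraction (mis)configured near 1.0, the budget
# collapses to 0 and every record spills to a DiskBasedMap, which
# matches the slow disk-based merge seen in the logs above:
print(max_memory_for_merge(4096, 1.0, 0.6))  # -> 0
```

The second case reproduces the reported symptom: a zero budget forces Hudi into the DiskBasedMap path, which progresses without OOM but dominates the runtime.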
