kimmazhenxin commented on issue #1913:
URL: https://github.com/apache/hudi/issues/1913#issuecomment-674878876
> **_Tips before filing an issue_**
>
> * Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? yes
> * Join the mailing list to engage in conversations and get faster support at [[email protected]](mailto:[email protected]).
> * If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
>
> **Describe the problem you faced**
>
> I have an MOR table with continuous upserts running in a loop, with maxFileLimit set to 2G and smallFileLimit set to 1G.
> I am seeing `java.io.IOException: Too many open files` in the Hudi logs, and the Spark job terminates with a FATAL error.
>
> Every upsert contains 3M new records. This happens after the table reaches about 200M records.
>
> **To Reproduce**
>
> Steps to reproduce the behavior:
>
> ```scala
> val parallelism = options.getOrElse("parallelism", Math.max(2, upsertCount / 100000).toString).toInt
> println("parallelism", parallelism)
>
> var fileSizeOptions = Map[String, String]()
> fileSizeOptions += (HoodieStorageConfig.PARQUET_FILE_MAX_BYTES -> String.valueOf(2013265920))            // ~2GB
> fileSizeOptions += (HoodieStorageConfig.PARQUET_BLOCK_SIZE_BYTES -> String.valueOf(2013265920))          // ~2GB
> fileSizeOptions += (HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT_BYTES -> String.valueOf(1006632960)) // ~1GB
>
> (inputDF
>   .write
>   .format("org.apache.hudi")
>   .option(HoodieWriteConfig.TABLE_NAME, options.getOrElse("tableName", "facestest"))
>   .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
>   .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partitionKey")
>   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "nodeId")
>   .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")
>   .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
>   .option("hoodie.upsert.shuffle.parallelism", parallelism)     // not the ideal way, but works
>   .option("hoodie.bulkinsert.shuffle.parallelism", parallelism) // not the ideal way, but works
>   .option("hoodie.cleaner.commits.retained", "1")
>   .options(fileSizeOptions)
>   .mode(SaveMode.Append)
>   .save(getHudiPath(spark)))
> ```
>
> **Expected behavior**
>
> A clear and concise description of what you expected to happen.
>
> **Environment Description**
>
> * Hudi version : 0.5.3
> * Spark version : 2.4.5
> * Storage (HDFS/S3/GCS..) : S3
> * Running on Docker? (yes/no) : no
>
> **Additional context**
>
> I notice a lot of FileSystemViewHandler activity, like this:
>
> ```
> 20/08/03 03:26:49 INFO FileSystemViewHandler: TimeTakenMillis[Total=101, Refresh=98, handle=3, Check=0], Success=true, Query=basepath=s3%3A%2F%2Fchelan-dev-mock-faces%2FTestFacesUpserForLoop%2F1000P2G&lastinstantts=20200803032307&timelinehash=9ef96ca1f93d48b24891b749ead94dfeb95ac7123fe538eb09733a1494befe77, Host=ip-10-0-1-42.us-west-2.compute.internal:39767, synced=false
> ```
>
> Looking at the host in the logs above, it is the driver that is emitting them. Can you explain why the driver has to access the file system so much?
>
> ```
> [hadoop@ip-10-0-1-42 s-1DFOYHZTX6MAG]$ zgrep FileSystemViewHandler stderr.gz | wc -l
> 506548
> [hadoop@ip-10-0-1-42 s-1DFOYHZTX6MAG]$ zgrep 'Too many open files' stderr.gz | wc -l
> 17054
> ```
>
> **Stacktrace**
>
> ```
> 20/08/03 10:00:44 INFO FileSystemViewHandler: TimeTakenMillis[Total=1, Refresh=0, handle=0, Check=0], Success=false, Query=partition=670&basepath=s3%3A%2F%2Fchelan-dev-mock-faces%2FTestFacesUpserForLoop%2F1000P2G&lastinstantts=20200803094501&timelinehash=96d2a8ca3e0bb58b382377345ed35eaf99f09f97540b13bf6295682fc42eade0, Host=ip-10-0-1-42.us-west-2.compute.internal:44483, synced=false
> 20/08/03 10:00:44 WARN ExceptionMapper: Uncaught exception
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: chelan-dev-mock-faces.s3.us-west-2.amazonaws.com
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1189)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1135)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:784)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:752)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5052)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4998)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1335)
>     at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:22)
>     at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:8)
>     at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:114)
>     at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:189)
>     at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:184)
>     at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.getObjectMetadata(AmazonS3LiteClient.java:96)
>     at com.amazon.ws.emr.hadoop.fs.s3.lite.AbstractAmazonS3Lite.getObjectMetadata(AbstractAmazonS3Lite.java:43)
>     at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.getFileMetadataFromCacheOrS3(Jets3tNativeFileSystemStore.java:497)
>     at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:223)
>     at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:590)
>     at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1440)
>     at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:357)
>     at org.apache.hudi.common.fs.HoodieWrapperFileSystem.exists(HoodieWrapperFileSystem.java:460)
>     at org.apache.hudi.common.fs.FSUtils.createPathIfNotExists(FSUtils.java:517)
>     at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$5(AbstractTableFileSystemView.java:223)
>     at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
>     at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:214)
>     at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFileGroups(AbstractTableFileSystemView.java:545)
>     at org.apache.hudi.timeline.service.handlers.FileSliceHandler.getAllFileGroups(FileSliceHandler.java:88)
>     at org.apache.hudi.timeline.service.FileSystemViewHandler.lambda$registerFileSlicesAPI$17(FileSystemViewHandler.java:281)
>     at org.apache.hudi.timeline.service.FileSystemViewHandler$ViewHandler.handle(FileSystemViewHandler.java:329)
>     at io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
>     at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
>     at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
>     at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
>     at io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
>     at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
>     at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
>     at io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>     at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
>     at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>     at org.eclipse.jetty.server.Server.handle(Server.java:502)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>     at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>     at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.UnknownHostException: chelan-dev-mock-faces.s3.us-west-2.amazonaws.com
>     at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)
>     at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
>     at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:374)
>     at sun.reflect.GeneratedMethodAccessor294.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.conn.$Proxy27.connect(Unknown Source)
>     at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
>     at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
>     at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>     at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>     at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>     at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1311)
>     at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1127)
>     ... 62 more
> 20/08/03 10:00:44 WARN AbstractConnector:
> java.io.IOException: Too many open files
>     at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:419)
>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:247)
>     at org.spark_project.jetty.server.ServerConnector.accept(ServerConnector.java:397)
>     at org.spark_project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:601)
>     at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>     at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>     at java.lang.Thread.run(Thread.java:748)
> ```

@luffyd Did you solve this problem? I am hitting the same issue.
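In case it helps whoever lands here: the `FileSystemViewHandler` traffic in the logs above comes from Hudi's embedded timeline service. The stacktrace shows it being served by Javalin/Jetty inside the driver (`org.apache.hudi.timeline.service.FileSystemViewHandler`), answering file-system-view queries from the executors over HTTP, and each query can in turn hit S3 (`FSUtils.createPathIfNotExists` → `FileSystem.exists`). That would explain why every request logs the driver host, and why a storm of such requests can exhaust the driver's file descriptors. One experiment (a sketch, not a confirmed fix) is to disable that server so executors build their views locally. The snippet below assumes the 0.5.x config key `hoodie.embed.timeline.server` and reuses `inputDF`, `getHudiPath`, and `spark` from the reproduction above:

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig

// Same upsert as in the reproduction, but with the embedded timeline server
// disabled so that no FileSystemViewHandler endpoint runs on the driver.
// ASSUMPTION: "hoodie.embed.timeline.server" is the 0.5.x key for this switch.
(inputDF
  .write
  .format("org.apache.hudi")
  .option(HoodieWriteConfig.TABLE_NAME, "facestest")
  .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partitionKey")
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "nodeId")
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .option("hoodie.embed.timeline.server", "false") // the one change vs. the snippet above
  .mode(SaveMode.Append)
  .save(getHudiPath(spark)))
```

The trade-off, as I understand it, is that each executor then lists the timeline and partitions itself, which can mean more S3 listing calls overall, but it removes the single HTTP endpoint on the driver.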
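Separately, the final `java.io.IOException: Too many open files` is thrown from Jetty's accept loop, which means the driver process has run out of file descriptors; the `UnknownHostException` for the S3 endpoint in the same trace is consistent with that, since DNS resolution also needs a socket. Checking `ulimit -n` for the user running the driver (and raising it via `/etc/security/limits.conf`, or the cluster's bootstrap configuration on EMR) seems worth doing in parallel, though on its own it may only delay the error if the request volume is the real cause.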
