[
https://issues.apache.org/jira/browse/HDDS-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835619#comment-17835619
]
Mohammad Arafat Khan commented on HDDS-10614:
---------------------------------------------
The issue you are encountering is related to the *CachingSpaceUsageSource* and
its interaction with the underlying {*}SpaceUsageSource{*}. The exception
{*}java.lang.IllegalArgumentException{*}: _used space cannot be negative_
indicates that somewhere in the process, the calculation of used space has
resulted in a negative number, which is not a valid state.
The root cause seems to be an error during the refresh operation of the
{*}CachingSpaceUsageSource{*}. Specifically, the underlying *SpaceUsageSource*
({_}likely an instance of DU, which uses the Unix du command to calculate disk
usage{_}) is failing due to a permission issue when trying to read the
*/data3/lost+found* directory. This failure might cause the getUsedSpace method
to return an incorrect value ({*}possibly zero{*}), which, when decremented,
results in a negative value. Here are the steps that are likely leading to the
issue:
# The CachingSpaceUsageSource attempts to refresh the used space metric by
calling the refresh method.
# The DU class, which is the actual SpaceUsageSource, tries to execute the du
command to calculate the used space.
# The du command fails with a permission error when trying to access the
/data3/lost+found directory, which is a system directory typically accessible
only by the root user.
# Due to the error, the DU class might return an incorrect value (such as
zero) for the used space.
# The CachingSpaceUsageSource receives this incorrect value and updates its
cached value accordingly. Subsequent calls to decrementUsedSpace then cause the
cached value to go negative, which is not allowed and results in the
{*}IllegalArgumentException{*}.
Fix for this from the recon side should be straightforward, we can Improve
error handling in the CachingSpaceUsageSource to ensure that a failure in the
DU class does not lead to an invalid state and if it does than we have a
fallback value for it.
cc: [~deveshsingh] [~zitadombi]
> Recon fails to start with "used space cannot be negative"
> ---------------------------------------------------------
>
> Key: HDDS-10614
> URL: https://issues.apache.org/jira/browse/HDDS-10614
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Recon
> Affects Versions: 1.4.0
> Reporter: Kirill Sizov
> Assignee: Mohammad Arafat Khan
> Priority: Blocker
>
> Recon fails to start with the following exception. A restart does not help.
> {noformat}
> [qtp2008821270-42945] WARN org.eclipse.jetty.server.HttpChannelState:
> unhandled due to prior sendError
> javax.servlet.ServletException: javax.servlet.ServletException:
> java.lang.IllegalArgumentException: used space cannot be negative.
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:162)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.Server.handle(Server.java:516)
> at
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
> at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
> at
> org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
> at
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: javax.servlet.ServletException:
> java.lang.IllegalArgumentException: used space cannot be negative.
> at
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
> at
> org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
> at
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
> at
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
> at
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
> at
> com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:290)
> at
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:280)
> at
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:184)
> at
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:89)
> at
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:121)
> at
> com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:133)
> at
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)
> at
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
> at
> org.apache.hadoop.hdds.server.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1648)
> at
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
> at
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
> at
> org.apache.hadoop.hdds.server.http.NoCacheFilter.doFilter(NoCacheFilter.java:48)
> at
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
> at
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
> ... 17 more
> Caused by: java.lang.IllegalArgumentException: used space cannot be negative.
> at
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
> at
> org.apache.hadoop.hdds.scm.container.placement.metrics.SCMNodeStat.<init>(SCMNodeStat.java:43)
> at
> org.apache.hadoop.hdds.scm.node.SCMNodeManager.getNodeStatInternal(SCMNodeManager.java:973)
> at
> org.apache.hadoop.hdds.scm.node.SCMNodeManager.getNodeStats(SCMNodeManager.java:882)
> at
> org.apache.hadoop.hdds.scm.node.SCMNodeManager.getStats(SCMNodeManager.java:853)
> at
> org.apache.hadoop.ozone.recon.api.ClusterStateEndpoint.getClusterState(ClusterStateEndpoint.java:120)
> at sun.reflect.GeneratedMethodAccessor228.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
> at
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
> at
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
> at
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)
> at
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
> at
> org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:475)
> at
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:397)
> at
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
> at
> org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255)
> at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
> at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
> at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
> at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
> at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
> at
> org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
> at
> org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234)
> at
> org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680)
> at
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
> ... 50 more
> {noformat}
> *Possible reason:*
> Datanodes send reports with negative used space.
> On the datanode side {{CachingSpaceUsageSource}} has a method
> {{decrementUsedSpace}} which decrements total used size by a particular
> value. The total used size is in turn taken from a delegate
> {{SpaceUsageSource}}. If the latter fails to return a value,
> {{decrementUsedSpace}} will subtract from 0, hence a negative value.
> A failure to get used size might be caused by smth like:
> {noformat}
> 2024-03-21 13:30:11,613 [DiskUsage-/data3-
> ] WARN org.apache.hadoop.hdds.fs.CachingSpaceUsageSource: Error refreshing
> space usage for /data3
> java.io.UncheckedIOException: ExitCodeException exitCode=1: du: cannot read
> directory '/data3/lost+found': Permission denied
> at org.apache.hadoop.hdds.fs.DU$DUShell.getUsed(DU.java:94)
> at
> org.apache.hadoop.hdds.fs.AbstractSpaceUsageSource.time(AbstractSpaceUsageSource.java:56)
> at org.apache.hadoop.hdds.fs.DU.getUsedSpace(DU.java:63)
> at
> org.apache.hadoop.hdds.fs.CachingSpaceUsageSource.refresh(CachingSpaceUsageSource.java:140)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: ExitCodeException exitCode=1: du: cannot read directory
> '/data3/lost+found': Permission denied
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1068)
> at org.apache.hadoop.util.Shell.run(Shell.java:957)
> at org.apache.hadoop.hdds.fs.DU$DUShell.getUsed(DU.java:91)
> ... 10 more
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]