cshannon commented on issue #3348: URL: https://github.com/apache/accumulo/issues/3348#issuecomment-1537170960
> One thing to keep in mind, that may make this easier to address: SplitUtils is not public API and is not intended for direct consumption. It is used internally to help us approximate relative split sizes when calculating InputSplits. So, it doesn't matter if its method returns a negative number, as long as the places where it's used check its sign to ensure that they handle that situation appropriately.
>
> We could also try to come up with better approximation methods for split sizes, but I think addressing the negative in the places where it's used is the quickest and easiest way to fix this, without completely rewriting the approximation algorithm.

So do you think it's appropriate to just use Long.MAX_VALUE as the range length if a negative is detected in the spots where the method is used? (Currently that looks like [BatchInputSplit](https://github.com/apache/accumulo/blob/56d49f15a05db9a46dbceb845918497760601c11/hadoop-mapreduce/src/main/java/org/apache/accumulo/hadoopImpl/mapreduce/BatchInputSplit.java#L113) and RangeInputSplit.)

We could of course just handle a returned negative in both places, but it still seems like the simplest thing is to have that method return Long.MAX_VALUE as the computed range length if it detects an overflow/negative. Otherwise, every time we call the method we have to handle a potentially negative return value, which could lead to inconsistencies between call sites.
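
To make the second option concrete, here is a minimal sketch of clamping in one place instead of at every call site. The class and method names (`RangeLengthClamp`, `clampRangeLength`) are hypothetical and are not the actual SplitUtils code; the sketch only assumes that an overflowed length shows up as a negative `long`.

```java
// Hypothetical illustration only; not part of Accumulo.
class RangeLengthClamp {

  // Treat a negative computed length as an overflow and report the
  // largest representable length instead, so callers never see a negative.
  static long clampRangeLength(long computedLength) {
    return computedLength < 0 ? Long.MAX_VALUE : computedLength;
  }

  public static void main(String[] args) {
    // Long arithmetic wraps silently: this becomes Long.MIN_VALUE.
    long overflowed = Long.MAX_VALUE + 1;

    System.out.println(clampRangeLength(overflowed) == Long.MAX_VALUE);
    System.out.println(clampRangeLength(42L));
  }
}
```

With the clamp applied where the length is computed, BatchInputSplit and RangeInputSplit would both see the same non-negative value without needing their own sign checks.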
