elistevens commented on issue #544: "Reduce output must shrink more rapidly" misrepresents which view is problematic
URL: https://github.com/apache/couchdb/issues/544#issuecomment-304426100

Ah, I hadn't expected that implementation. Thanks for the link.

I'm not 100% certain what `State.line_length` is, but I'm assuming it's the character length of all of the functions and data combined. If that's true, this heuristic seems like it would trigger on literally every output if you had enough views in the design doc.

Please check my logic:

Give a design doc 2k views, each with a reduce that is just `return values[0]`, and have all values be `0`. Then `reduce_line == "[" + "0," * 2047 + "0]"`, which is 4097 characters long. If all of the keys are unique and we've got `group=exact`, then `input_length = State.line_length - code_size` should be $HUGE - $HUGE, leaving an input of roughly 4k characters. So `reduce_length` is 4097, `input_length` is about the same, and the check triggers every time.

Now, I realize this is a pathological setup (2k views, etc.), but I think it points to a weakness in the approach. What if the heuristic were implemented this way (I've removed the error checking for brevity):

```
for (var i = 0; i < reduceFuns.length; i++) {
  reductions[i] = reduceFuns[i](keys, values, rereduce);
  var reduction_len = JSON.stringify(reductions[i]).length;
  var values_len = JSON.stringify(values).length;
  // Flag a function whose output is larger than its input shrunk by sqrt(n).
  if (values.length > 3 && reduction_len > values_len / Math.sqrt(values.length)) {
    // error out here
  }
}
```

I realize this approach wastes the output of the `JSON.stringify` calls, but A) I don't know if that's relevant to performance, and B) it could probably be fixed.
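Here is a minimal sketch of the per-function check pulled out into a standalone function so it can be run outside couchjs. `findNonShrinkingReductions` and the demo data are hypothetical (not CouchDB code), and it assumes reduce functions are called as `f(keys, values, rereduce)` as in the loop above. It also stringifies `values` once instead of once per function, which is one way B) might be handled. The demo replays the 2k-view case: the combined output is still 4097 characters, but no single `return values[0]` reduce trips the check, while a reduce that just echoes its input does.

```javascript
// Hypothetical helper, not part of CouchDB: applies the proposed
// per-function shrink check to one batch of keys/values and returns
// the indices of the reduce functions whose output is "too big".
function findNonShrinkingReductions(reduceFuns, keys, values, rereduce) {
  // Stringify the input once and reuse its length for every function.
  var valuesLen = JSON.stringify(values).length;
  var offenders = [];
  for (var i = 0; i < reduceFuns.length; i++) {
    var reduction = reduceFuns[i](keys, values, rereduce);
    var reductionLen = JSON.stringify(reduction).length;
    // Only judge batches of more than 3 values; the allowed output size
    // is the input size divided by sqrt(number of values).
    if (values.length > 3 &&
        reductionLen > valuesLen / Math.sqrt(values.length)) {
      offenders.push(i);
    }
  }
  return offenders;
}

// Demo (assumes a modern engine such as Node.js, for Array.prototype.fill).
// The pathological design doc: 2048 views whose reduce returns values[0].
var firstValue = function (keys, values, rereduce) { return values[0]; };
var manyFuns = new Array(2048).fill(firstValue);

// One hypothetical batch of ten values, all 0; the keys are never inspected.
var batchValues = new Array(10).fill(0);
var batchKeys = batchValues.map(function (_, i) { return [i, "doc" + i]; });

// All 2048 reductions combined: "[0,0,...,0]" is 4097 characters...
var combined = manyFuns.map(function (f) { return f(batchKeys, batchValues, false); });
console.log(JSON.stringify(combined).length); // 4097

// ...but no individual reduce function fails the per-function check.
console.log(findNonShrinkingReductions(manyFuns, batchKeys, batchValues, false).length); // 0

// A reduce that just echoes its input is still flagged.
var echo = function (keys, values, rereduce) { return values; };
console.log(findNonShrinkingReductions([firstValue, echo], batchKeys, batchValues, false)); // [ 1 ]
```

One caveat with this sketch: a reduce that returns `undefined` makes `JSON.stringify` return `undefined`, so the `.length` lookup would throw. Any real version would still need the error checking left out here for brevity.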
