[GitHub] elistevens commented on issue #544: "Reduce output must shrink more rapidly" misrepresents which view is problematic

git Fri, 26 May 2017 21:14:06 -0700

elistevens commented on issue #544: "Reduce output must shrink more rapidly" 
misrepresents which view is problematic
URL: https://github.com/apache/couchdb/issues/544#issuecomment-304426100
 
 
   Ah, I hadn't expected that implementation. Thanks for the link. I'm not 100% 
certain what `State.line_length` is, but I'm assuming it's the character length 
of all of the functions and data combined.
   
   If that's true, this heuristic seems like it would trigger on literally 
every output if you had enough views in the design doc. Please check my logic:
   
   Give a design doc 2k views, each with a reduce that is just `return 
values[0]`. Have all values just be `0`.
   
   `reduce_line == "[" + "0," * 2047 + "0]"` which is 4097 characters long.
   
   If all of the keys are unique, and we've got group=exact, then `input_length 
=  State.line_length - code_size` should be $HUGE - $HUGE leaving something 
about 4k in length as the input.
   
   So `reduce_length` is 4097, and `input_length` is about the same, and it 
triggers every time.
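   
   A minimal sketch of the arithmetic above, assuming "2k views" means 2048 and that the combined reduce output is serialized as one JSON array. The helper name `reduceLineFor` is hypothetical and is not taken from the CouchDB source:
   
   ```javascript
   // Hypothetical helper: builds the combined reduce output line for
   // `viewCount` views whose reduce functions each return 0.
   function reduceLineFor(viewCount) {
       var reductions = [];
       for (var i = 0; i < viewCount; i++) {
           reductions.push(0);
       }
       // Serializes to "[0,0,...,0]", the same string as
       // "[" + "0," * 2047 + "0]" when viewCount is 2048.
       return JSON.stringify(reductions);
   }
   
   console.log(reduceLineFor(2048).length); // 4097
   ```
   
   The length grows with the number of views (one digit plus one separator per view), not with the size of any single reduction.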
   
   Now I realize this is a pathological setup (2k views, etc.) but I think that 
it points to a weakness in the approach. What if the heuristic were implemented 
this way (I've removed the error checking for brevity):
   
   ```
   for(var i = 0; i < reduceFuns.length; i++) {
       reductions[i] = reduceFuns[i](keys, values, rereduce);
       var reduction_len = JSON.stringify(reductions[i]).length;
       var values_len = JSON.stringify(values).length;
       if(values.length > 3 && reduction_len > values_len / Math.sqrt(values.length)) {
           // error out here
       }
   }
   ```
   I realize this approach wastes the output of the `JSON.stringify` calls, 
but A) I don't know if that's relevant to performance, and B) it could probably 
be fixed.
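   
   To make the proposed check easier to try in isolation, here is a standalone sketch of the same comparison. The function name `reductionTooLarge` is hypothetical, and it assumes the reduction is JSON-serializable (error checking is omitted, as in the loop above):
   
   ```javascript
   // Hypothetical helper: returns true when the proposed per-function
   // heuristic would reject `reduction` as not shrinking enough
   // relative to `values`.
   function reductionTooLarge(values, reduction) {
       var reductionLen = JSON.stringify(reduction).length;
       var valuesLen = JSON.stringify(values).length;
       return values.length > 3 &&
           reductionLen > valuesLen / Math.sqrt(values.length);
   }
   
   var values = [0, 0, 0, 0, 0, 0, 0, 0, 0];
   // `return values[0]` shrinks the input, so it passes.
   console.log(reductionTooLarge(values, values[0])); // false
   // Returning the input unchanged does not shrink, so it is rejected.
   console.log(reductionTooLarge(values, values));    // true
   ```
   
   Because each reduction is compared only against its own input, the result does not depend on how many views the design doc contains.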
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]



With regards,
Apache Git Services
