ztzg commented on pull request #1519: URL: https://github.com/apache/zookeeper/pull/1519#issuecomment-725535273
> > I was considering adding a "fourth commit," making sure field values written to audit log entries are systematically escaped, but am not sure which encoding to use. Is there a precedent in the code base? In any case, a subset of URL encoding may be good enough; e.g.: % → %25, \t → %09, \n → %0A, and everything non-ASCII to %-encoded UTF-8 bytes. WDYT?
>
> I don't have a strong opinion about this. I don't know about any precedent for this... do you think this would be necessary? Do we expect that user names / schema ids would contain any "dangerous" characters? If the logs are processed by some scripts, then maybe escaping \n (or even \r) might be good. On the other hand the log processing tools are usually more robust and can handle multiline logs too (e.g. stacktraces). Also you can configure log4j to produce UTF-8 log files I guess.

I am not adding that "fourth commit" for now, and have also "disabled" the third one, which does per-scheme filtering. (I have kept it in the individual commits on this PR in case somebody wants to fish it out, but it will "disappear" once everything is squashed by the committer.)

I think we should be careful not to inject unsanitized user data into logs in general. But the above seems overkill because authentication IDs are normally not under user control… except when the `digest` provider is enabled, and we now have a flag to block that vector.
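
For reference, the URL-encoding subset floated in the quoted comment could look roughly like the sketch below. This is not code from the PR: `AuditFieldEscaper` and `escape` are hypothetical names, and the sketch assumes that all ASCII control characters (including `\r`, which the reply mentions) and DEL should be escaped, not only `\t` and `\n`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ZooKeeper: percent-encodes the characters
// that could break a one-entry-per-line, tab-separated audit log.
final class AuditFieldEscaper {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private AuditFieldEscaper() {
    }

    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        // Work on UTF-8 bytes so that non-ASCII characters become
        // %-encoded UTF-8 byte sequences, as suggested in the comment.
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            // Assumption: escape '%' itself, control characters, DEL,
            // and every byte outside the ASCII range.
            if (c == '%' || c < 0x20 || c >= 0x7F) {
                sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            } else {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}
```

With such a helper, a `digest` ID like `evil\nuser` would be written as `evil%0Auser`, so each audit entry stays on a single line and the original value remains recoverable by a standard percent-decoder.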
