uranusjr commented on PR #34621: URL: https://github.com/apache/airflow/pull/34621#issuecomment-1736797913
There are two directions to the problem: a. whether we want a Windows user’s _checkout_ to use CRLF, or just LF. b. whether we want a file created by a Windows user with CRLF to be _checked in_ to the repository as-is, or automatically converted to use LF. It used to be quite important to consider a. because many (older) Windows tools do not support the LF line ending well. This is however not a practical issue anymore; nowadays b. is the main consideration; since we (should) only want LF in the canonical repository on GitHub, a new file created on Windows must only be checked in with LF, not CRLF. Now consider the possible configurations: 1. `core.autocrlf = false` would allow CRLF to be committed. There are other ways to prevent CRLF from being present though, so this is not entirely out of consideration. I think there’s a pre-commit hook for this, for example. 2. `core.autocrlf = true` would always convert line endings to CRLF on checkout, and convert line endings (back) to LF on checkin. 3. `core.autocrlf = input` is a combination of the two; it does not convert on checkout (so the local copy in the filesystem would use LF), but would convert to LF on checkin. From a quick glance it’s easy to see option 2 and 3 seem more reasonable. The problem is however which files the conversions apply to. By default, Git would use a heuristic to figure out whether a file is text (and should be converted) or binary (and should not), but any heuristic is not perfect. Git therefore allows the repository to explicitly specify whether a file is text or not in `.gitattributes`, but still maintainers must be careful to review commits so no accidental checkins are made with wrongly detected file types. This is fortunately easy for Airflow specifically since we have a pretty solid review procedure, and GitHub’s interface makes the detection pretty easy. So what is the best solution anyway? There isn’t one. In an ideal world I’d say it’s best to use `core.autocrlf = false`, use pre-commit to ensure no CRLF is committed in, and tell all contributors using Windows to be careful with their editor settings. But that’s not very practical for a public project like Airflow (more so for private projects, where you can just tell everyone on the team to set things up correctly), so it’s fine to use either `true` or `input` in practice and rely on maintainers to review and detect any errors Git’s algorithm may make. A pre-commit hook can still work in that case, although probably not be worthwhile since explaining what that lint error means to a novice contributor is probably harder than just telling the contributor to add and entry in `.gitattributes` anyway. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
