Hi Sean,

Thanks a lot for your careful review. I have answered your questions below, and 
I have few concerns. Please let me know if I am missing something:

This code is on the critical path for commands with very large lengths and this 
is a non-negligible time. For UNIX, for example, this would be triggered for 
commands spanning more than 2MB of arguments and Clang spends a good amount of 
time to parse even trivial arguments, because it is a huge number of arguments. 

Another important point is that, currently, the Command class stores arguments 
as a list of char *.

This is the reason for this design with lots of naked char *. It avoids 
creating StringRefs, that would measure the length of char* in their 
constructor and cause more overhead. It also avoids using high-level classes 
such as streams and bump allocators because, since this is likely to be called 
only when the number of arguments is very large, would cause them to repeatedly 
perform heap allocations until they finally settle to support the final size of 
the string. Either that or we would need to work with very large slabs, wasting 
memory. In this design, the length is calculated once and a single allocation 
is made, similar to what is done in Execute() in Support/Windows/Program.inc.

Therefore, by calculating the size of the final command line beforehand, even 
though it looks like we are losing by iterating over all args twice (one for 
length calculation and other for parsing), we win by eliminating inefficiencies 
in the allocation mechanism in runtime or in memory consumption.

Regards,
Rafael

================
Comment at: lib/Driver/Job.cpp:122
@@ +121,3 @@
+/// will escape the regular backslashes (used in Windows paths) and quotes.
+static std::unique_ptr<char[]> FlattenArgs(const char **args) {
+  // First, determine the length of the command line.
----------------
silvas wrote:
> This would be a lot nicer with ArrayRef<const char *>; that way, you won't 
> lose the length.
That's a good point. However, I am afraid that I would still need to 
recalculate the length by traversing each argument, calculating its length and 
assessing whether it needs quoting or not.

================
Comment at: lib/Driver/Job.cpp:169
@@ +168,3 @@
+  llvm::SmallVector<const char *, 128> NewArgv;
+  unsigned len = 0;
+  for (auto Input : Inputs) {
----------------
silvas wrote:
> Check your naming here and in other places.
Thanks for pointing out, I fixed the naming. Will submit the patch soon.

================
Comment at: lib/Driver/Tools.h:546-550
@@ +545,7 @@
+    llvm::sys::EncodingMethod getResponseFileEncoding() const override {
+#if defined(LLVM_ON_WIN32)
+      return llvm::sys::EM_CurrentCodePage;
+#else
+      return llvm::sys::EM_UTF8;
+#endif
+    }
----------------
silvas wrote:
> Ew... let's not have a bunch of random #ifdef's all over the code outside 
> libSupport.
I agree that this is inadequate. My previous patch version delegated this 
reasoning to libSupport. However, it was more difficult to understand and I 
changed it for clarity. Now, it is obvious that the response file encoding for 
this tool depends on whether it is running on Windows or not. In my previous 
libSupport version, I had enum members that meant "this is UTF on UNIX but 
UTF16 on Windows", and so on. I think this is not a good design because if you 
need to support a new tool, you may potentially need to add a new enum member 
in libSupport to encode your tuple <MyUNIXEncoding, MyWindowsEncoding>.

Since the encoding depends on the tool and on the operating system, I would 
appreciate if you have any suggestion on how to code this. Meanwhile, I will 
think in something.

http://reviews.llvm.org/D4897



_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Reply via email to