Hi Sean,
Thanks a lot for your careful review. I have answered your questions below, and
I have few concerns. Please let me know if I am missing something:
This code is on the critical path for commands with very large lengths and this
is a non-negligible time. For UNIX, for example, this would be triggered for
commands spanning more than 2MB of arguments and Clang spends a good amount of
time to parse even trivial arguments, because it is a huge number of arguments.
Another important point is that, currently, the Command class stores arguments
as a list of char *.
This is the reason for this design with lots of naked char *. It avoids
creating StringRefs, that would measure the length of char* in their
constructor and cause more overhead. It also avoids using high-level classes
such as streams and bump allocators because, since this is likely to be called
only when the number of arguments is very large, would cause them to repeatedly
perform heap allocations until they finally settle to support the final size of
the string. Either that or we would need to work with very large slabs, wasting
memory. In this design, the length is calculated once and a single allocation
is made, similar to what is done in Execute() in Support/Windows/Program.inc.
Therefore, by calculating the size of the final command line beforehand, even
though it looks like we are losing by iterating over all args twice (one for
length calculation and other for parsing), we win by eliminating inefficiencies
in the allocation mechanism in runtime or in memory consumption.
Regards,
Rafael
================
Comment at: lib/Driver/Job.cpp:122
@@ +121,3 @@
+/// will escape the regular backslashes (used in Windows paths) and quotes.
+static std::unique_ptr<char[]> FlattenArgs(const char **args) {
+ // First, determine the length of the command line.
----------------
silvas wrote:
> This would be a lot nicer with ArrayRef<const char *>; that way, you won't
> lose the length.
That's a good point. However, I am afraid that I would still need to
recalculate the length by traversing each argument, calculating its length and
assessing whether it needs quoting or not.
================
Comment at: lib/Driver/Job.cpp:169
@@ +168,3 @@
+ llvm::SmallVector<const char *, 128> NewArgv;
+ unsigned len = 0;
+ for (auto Input : Inputs) {
----------------
silvas wrote:
> Check your naming here and in other places.
Thanks for pointing out, I fixed the naming. Will submit the patch soon.
================
Comment at: lib/Driver/Tools.h:546-550
@@ +545,7 @@
+ llvm::sys::EncodingMethod getResponseFileEncoding() const override {
+#if defined(LLVM_ON_WIN32)
+ return llvm::sys::EM_CurrentCodePage;
+#else
+ return llvm::sys::EM_UTF8;
+#endif
+ }
----------------
silvas wrote:
> Ew... let's not have a bunch of random #ifdef's all over the code outside
> libSupport.
I agree that this is inadequate. My previous patch version delegated this
reasoning to libSupport. However, it was more difficult to understand and I
changed it for clarity. Now, it is obvious that the response file encoding for
this tool depends on whether it is running on Windows or not. In my previous
libSupport version, I had enum members that meant "this is UTF on UNIX but
UTF16 on Windows", and so on. I think this is not a good design because if you
need to support a new tool, you may potentially need to add a new enum member
in libSupport to encode your tuple <MyUNIXEncoding, MyWindowsEncoding>.
Since the encoding depends on the tool and on the operating system, I would
appreciate if you have any suggestion on how to code this. Meanwhile, I will
think in something.
http://reviews.llvm.org/D4897
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits